Post-mortem of this weekend's NPM incident
blog.npmjs.org
This is a good post-mortem with clear, policy-based remediations. Nicely done.
I wonder why they are only preventing republishing for 24 hours. Is there a good reason to allow a package namespace to be recycled in less than, say, a week? Is it based on the assumption that the only case where it comes up is during an incident, and 24 hours is enough time to assume an incident will be resolved? I'm curious what went into that number.
Why allow namespace recycling at all? The potential harm is high and the potential benefit is some slight convenience.
If npm packages used a Github-style "author/package" format, name collision would never be an issue again.
When your code deployment model is effectively "download stuff from random websites", I feel like namespace recycling is the least of your worries.
(That is to say, trusting that any given named package that `npm install` downloads is what you think it is is really no different than trusting `wget https://example.com/thecode.tgz`. Even if you verify that the domain hasn't switched hands, you have no guarantee that the author's pipeline wasn't compromised, or that the author didn't add malware themselves. There's a reason Debian, Red Hat et al. put a lot of effort into ensuring the integrity of their repositories.)
>If npm packages used a Github-style "author/package" format, name collision would never be an issue again.
They have that, and many are finally starting to take advantage of it (with babel being the most prominent with their latest version)
But this doesn't completely "fix" the problem, since the exact same conflicts can still happen with the "author" name (if someone takes "google/" there are going to be some very upset Californians)
It does fix the problem, although consumers of the package need to first verify they trust the original author.
For example, if Google published "Google/polymer", dependents would be unaffected if a new package was published from "badguy/polymer". I think you're saying it's still not fixed because someone could register the Google author name before the real Google gets it, but that is an understood problem across every website.
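To make the point concrete, here is a minimal sketch of how author-namespaced resolution prevents shadowing. It uses npm's real `@scope/name` shape, but the function names and the strict regex are assumptions for illustration, not npm internals:

```javascript
// Hypothetical sketch: resolving "@author/package" identifiers so that a
// malicious republish under a different author can never shadow the original.
function parseScopedName(spec) {
  // npm's real scoped form is "@scope/name"; this regex is a simplified take.
  const m = /^@([a-z0-9-]+)\/([a-z0-9-]+)$/.exec(spec);
  if (!m) throw new Error(`not a scoped package: ${spec}`);
  return { author: m[1], name: m[2] };
}

function sameOrigin(a, b) {
  // Two specs refer to the same package only if both scope AND name match,
  // so "@badguy/polymer" can never satisfy a dependency on "@google/polymer".
  const pa = parseScopedName(a), pb = parseScopedName(b);
  return pa.author === pb.author && pa.name === pb.name;
}
```

The remaining trust question, as noted above, is whether "@google" really belongs to Google, which scoping alone cannot answer.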
> (if someone takes "google/" there are going to be some very upset californians)
Not a problem at all. They will just get the name by force. It already happened before, google "npm kik".
Yeah, but it's still a "problem" in that it technically doesn't change anything from the current system.
The issue isn't people assuming that every alias on the internet is truthful; very few people (hopefully) do.
The problem is a package URI being reused without me realizing, and that problem would be solved.
>But this doesn't completely "fix" the problem, since the exact same conflicts can still happen with the "author" name (if someone takes "google/" there are going to be some very upset californians)
The idea is that the author name maps to an actual repo and should serve as a second layer of authentication outside of NPM's control. The namespace should belong to the repository owner.
No one should be able to "take" google/ since the actual Google already owns github.com/google.
That would make a connection between Github and NPM. What if I want to use Bitbucket?
I'm suggesting NPM integrate Github's authentication API into their publishing process, to validate that an author owns a namespace, if NPM doesn't already do that (I don't know; I don't use NPM). If Bitbucket allows the same, then that's OK. The point is for authors to control namespaces, while NPM controls package names and only validates namespaces. If namespaces are just another string that NPM controls, and that can be handed out, revoked, or taken over like package names, then using them doesn't solve the intended problem at all.
Since NPM uses a centralized registry, though, some limitations would have to be imposed, and allowing multiple sources would mean allowing for conflicts between github:google/package and bitbucket:google/package. Ideally, namespaces should be unique and canonical, and the simplest way to accomplish that is just to allow publishing to a single source.
Keep a git mirror of your Bitbucket repository on GitHub.
Seems like they're only allowing name reuse in the case of spam packages. Not allowing name reuse in that case might result in lots of names being rendered permanently unusable by automated spambots.
Assuming no actual users are depending on packages which are literally just spam, I don't really see an issue with reusing the names of those packages.
You already sound smarter than whoever leads the Node Package Mess.
I'd like to see a package registry with (1) Github-style author-namespaced packages, and (2) package signing (i.e. if an author starts signing packages with a different key, I'd like to know about it). Maybe integrate the latter with Keybase to help users decide if they should trust a key.
I don't know how you gain any kind of critical mass trying to compete against a well-established registry like npmjs, though.
Package signing by the author helps, but not if (a) the author's key is compromised, or (b) the author acts in bad faith. With npm-style package registries, the risk of such an event increases with the number of dependencies your project has.
Better to (a) ship your code with explicit version and content dependencies (e.g. a SHA-256 of the package you expect to exist), or (b) delegate trust in upstream packages to fewer, more well-vetted, authorities (i.e. distro-style).
(1) has the consequence that names are associated with the original creator long after they may no longer be involved.
(2) Package signing I can't tell if it's a good idea or not. It would require you to accept a new key in this case, but that is going to happen super frequently with NPM, inducing alert fatigue, and it doesn't do anything to prevent the author's publishing credentials getting stolen.
(1) Big projects just get moved over time.
e.g. philipwhiuk/gnome moves to gnome/core, and then a version is published to philipwhiuk/gnome that just pulls in gnome/core
I'd also like a bunch of checks applied to my use of that registry.
1. only packages with > nK downloads
2. only packages with tests
3. only packages with publicly available code that is the same as what I am downloading
4. only packages with > nnn stars, forks, issues, pull requests
All are just proxies for being a valid node in a dependency graph.
And I'd like to know if any transitive dep violates my inclusion rules.
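The rules above, including the transitive check, could look something like this. The metadata fields (`downloads`, `stars`, `hasTests`, `deps`) are assumptions about what a registry could expose, not real npm API fields:

```javascript
// Hypothetical sketch of per-package inclusion rules.
function violatesRules(pkg, rules) {
  if (pkg.downloads < rules.minDownloads) return true;
  if (!pkg.hasTests) return true;
  if (pkg.stars < rules.minStars) return true;
  return false;
}

// Walk the dependency graph and report every transitive dep that fails,
// using a visited set so shared or cyclic deps are only checked once.
function transitiveViolations(name, registry, rules, seen = new Set()) {
  if (seen.has(name)) return [];
  seen.add(name);
  const pkg = registry[name];
  const bad = violatesRules(pkg, rules) ? [name] : [];
  for (const dep of pkg.deps) {
    bad.push(...transitiveViolations(dep, registry, rules, seen));
  }
  return bad;
}
```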
I think package URIs should include a secure hash of their contents.
Although you won't get updates without asking for them - I'm not sure that's a bad thing - you can be assured that you'll either get the package you were expecting or no package at all.
They allow deletion of packages for 24 hours without staff involvement, but is there nothing said about a time limit on republishing after deletion?
From the response steps:
> Our first action, which began immediately after the incident concluded, was to implement a 24-hour cooldown on republication of any deleted package name.
But they also have a policy of replacing deleted package names with placeholders, which was not currently enforced for spam-deletions. The cooldown should give them 24 hours to discover any remaining holes in that policy.
Oops, missed that part. Guess the logic is that for spam packages, there isn't going to be anyone relying on them, and it would be a waste of the namespace to let spammers fill it up?
Yes, and there are dozens of spam packages registered (and deleted) every day.
So, a spammer uploaded something containing copied data from a legitimate user and npm deleted everything from that user. Oy.
Seems like npm might want to review the policy that allows stuff like that to happen.
Even if a user violates the spam policy (which, to be clear, it seems the affected user in this case did NOT do), that hardly seems to be appropriate grounds for deleting everything the user has ever published on npm.
That is a policy that is just begging for griefing.
> Seems like npm might want to review the policy that allows stuff like that to happen.
That's one of the things the post mentions as what they are doing.
Are "joe jobs" still a thing?
https://en.wikipedia.org/wiki/Joe_job
> A joe job is a spamming technique that sends out unsolicited e-mails using spoofed sender data. Early joe jobs aimed at tarnishing the reputation of the apparent sender or inducing the recipients to take action against them [...]
Yep. I had one against me mid last year.
It wasn't a policy; it was a spam heuristic.
I meant the policy which allowed this to happen:
> In the course of reviewing and acting on spam reports, an npm staffer acted on this flag without further investigating the user and removed the user and all of their packages from the registry.
Specifically, a policy that allows removing "all of [a user's] packages" based on something related to the user rather than on the packages themselves.
Feels like there should be a disconnect between decisions made about a 'user' and those made about a 'package'.
Once the package is published, there should be an understanding that the package belongs to npm and npm's users, even if the original publisher retains some authority over it.
And if there is cause to ban a user, it should not automatically mean that packages published by the user are affected (aside from removing whatever authority the user had).
> we have policies against ever running SQL by hand against production databases—but in this case we were forced to do so
Uh... Add in the fact that staff are now trigger-happy, since a single button can do a lot of damage.
Our first action, which began immediately after the incident concluded, was to implement a 24-hour cooldown on republication of any deleted package name.
Why not infinity hours? I don't get it.
If it's a spam package that gets deleted, that would mean you'd quickly run out of available names.
Why can't they allow reuse only when it is apparent the package was harmful (as in, people complain, and staff check the number of downloads and dependent packages), and block the name to disallow reuse in every other case?
Hmm, I'm not sure if I can follow your question, but I'm guessing that they're already planning to do what you want?
> Our first action, which began immediately after the incident concluded, was to implement a 24-hour cooldown on republication of any deleted package name
I don't understand this. Why hard delete packages at all? Soft deleting feels like it would be easier and would stop people republishing with the same name.
They could also bake their warning process for dependent libraries (i.e. "this package is gone!") into the soft delete process.
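A minimal sketch of the soft-delete idea: deletion sets a tombstone instead of freeing the name, so resolution can warn dependents and republication is blocked. All of these names are invented, not npm internals:

```javascript
// Hypothetical registry where delete is a tombstone, not a hard removal.
function makeRegistry() {
  const packages = new Map(); // name -> { deletedAt: Date | null }
  return {
    publish(name) {
      const existing = packages.get(name);
      if (existing) {
        // The tombstone keeps the name reserved forever.
        throw new Error(existing.deletedAt
          ? `${name} was deleted and cannot be reused`
          : `${name} already exists`);
      }
      packages.set(name, { deletedAt: null });
    },
    softDelete(name) {
      packages.get(name).deletedAt = new Date();
    },
    resolve(name) {
      const pkg = packages.get(name);
      if (!pkg) return { status: 'not-found' };
      if (pkg.deletedAt) return { status: 'gone', warning: 'this package is gone!' };
      return { status: 'ok' };
    },
  };
}
```

The "this package is gone!" warning is then just another branch of resolution rather than a separate notification pipeline.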
I feel like a project that could help with this to identify package importance by the dependents and downloads.
I think I might actually try this out.
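An importance metric like the one suggested above might combine dependents and downloads something like this. The weighting is arbitrary and my own assumption; real download counts would have to come from the registry's stats API, but this sketch works on plain numbers so it stays self-contained:

```javascript
// Hypothetical importance score: log-scale both signals so one huge
// number can't dominate entirely, and weight dependents more heavily
// since they imply transitive breakage.
function importance(pkg) {
  return Math.log10(1 + pkg.downloads) + 2 * Math.log10(1 + pkg.dependents);
}

function rankByImportance(pkgs) {
  return [...pkgs].sort((a, b) => importance(b) - importance(a))
                  .map(p => p.name);
}
```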
> At the time of Saturday’s incident, however, we did not have a policy to publish placeholders for packages that were deleted if they were spam.
I see this acknowledgement, but I cannot find where they will remedy this by putting placeholders in place of spam removals. As a concession, maybe only placeholders for spam removals of packages that are older than X days or depended on (explicitly or transitively) by X packages. Did I miss where the remedy for this spam-removed-package-reuse was in the blog post?
They have added a 24-hour republishing cooldown for all package removals regardless of reason. Exceptions are made for the original publisher and npm staff.
Explained somewhere near the bottom of the post; the basic rationale is that it gives them time to notice fuckups and fix them.
This does not alleviate the issue where you can reuse package names. I suppose they believe that what they mark as spam either won't be used enough, or is already bad enough, that reusing the name is harmless. And they probably also believe that they can catch fuckups within a day. I don't think either is necessarily true; both were only true in this case because it hit popular dep trees. But what happens when something less popular is erroneously marked as spam and the downstream dependents don't realize within 24 hours? If the problem is that "placeholders" are too heavy, then they could be made lighter weight, or rules could be put around when to add them and when not to.
My guess is that the cost of the placeholders is indeed what is driving their decision, though perhaps it is a premature optimization.
Maybe they've had situations where a spammer has created a very large number (millions, billions?) of packages. It's possible that the majority of user submissions are automated spam from botnets. I would assume npm has some mitigations in place to prevent this abuse in the first place, such as rate limiting and captchas, though maybe that's not enough to stem the tide.
Though, given that they say they have humans doing the package deletion, that makes me think that the number of spam packages created can't be that high. Certainly not high enough to outweigh the risks of package name reuse. Increase your prices a few pennies a month so you can afford to store the placeholders forever.