Aggressive Attack on PyPI Attempting to Deliver Rust Executable

blog.phylum.io

148 points by iamspoilt 3 years ago · 106 comments

Reader

I understand that this is meant to be an eye-popping press release (and implicitly a product spotlight), but some of these claims make me gag.

It's not an attack "on" PyPI, or even an attack at all: someone is just spamming the index with packages. There's no evidence that these packages are being downloaded by anyone at all, or that the person in question has made any serious effort to conceal their attentions (it's all stuffed in the setup script without any obfuscation, as the post says). The executable in question isn't even served through PyPI (for reasons that are unclear to me): it's downloaded by the dropper script. Ironically, serving the binary directly would probably raise fewer red flags.

Supply chain security is important; we should reserve phrases like "aggressive attack" for things that aren't script kiddie spam.

agolio 3 years ago

The most "aggressive" part is that those sweet package names like "colorslib" are being stolen.
- komali2 3 years ago
  
  My biggest curiosity here is how they generated over a thousand package names ranging from feasible to interesting. I expected gibberish.
  Lol, maybe, "chatgpt, give me a thousand feasible pypi package names"?
  - praash 3 years ago
    
    The names seem to be simple concatenations of random parts like "game", "lib", "vm", "cv", "http".
    They do look surprisingly convincing.
- lelandbatey 3 years ago
  
  Thankfully, they're not actually being stolen because all the packages were already taken down; they're available for legitimate use again: https://pypi.org/project/colorslib/
  - abeyer 3 years ago
    
    While I think that _may_ be the right thing to do here... it's a bit worrying as recycling names like that has it's own share of risks.
asperous 3 years ago

I think it's a serious threat, especially with LLMs now because people can make believable packages at scale. Not everyone vets their packages thoroughly
- codetrotter 3 years ago
  
  Speaking of LLMs. Since LLMs like to hallucinate every now and then, an LLM could also hallucinate names of packages that it tells people to install. And those packages could in turn have been squatted by malware authors.
  And in this way, malicious packages may be unintentionally downloaded by users even when those malicious packages did not yet exist when the LLM was trained. Just because the hallucinated package name was randomly later taken by someone malicious.
  - freeqaz 3 years ago
    
    I've seen this effect get amplified also when somebody puts a "bad" answer in a public place like StackOverflow. It is possible to have quite a large blast radius from something like this!
  - kadoban 3 years ago
    
    An attacker could also try to get a list of packages that the LLMs halucinate, and squat on those.
- woodruffw 3 years ago
  
  You've always been able to make "believable" packages at scale. PyPI doesn't enforce uniqueness: you can crank out malicious near-duplicates of any package you please.
  - zeven7 3 years ago
    
    And, to parent's point, now LLMs will tell people to use them and they will[1].
    [1] https://news.ycombinator.com/item?id=34916682
    
    Groxx 3 years ago
    
    Stack Overflow and Google search results were already doing that though, at massive scale. I agree it changes things somehow, but people not thinking before acting is not a new problem.
- freeqaz 3 years ago
  
  I agree that it is a threat. I don't think this instance is (it's too noisy).
  I wrote a comment on the NPM thread earlier (https://news.ycombinator.com/threads?id=freeqaz) that I'll quote here:
  > "While being flooded with spam is never good, it gets immediately noticed and mitigated. It's harder for open source projects to spot and stop rare one-offs"
  This is the real problem that NPM and other ecosystems face. A determined attacker that is trying to "poison" a popular Open Source package just has to feign as a maintainer long enough to succeed[0]. Defeating these types of attacks will require rethinking how we think about trust of packages.
  Projects like Deno are one approach (fork the ecosystem) while projects like Packj (mentioned elsewhere here), Socket.dev, and LunaTrace[1] are taking the other angle (make it harder to install malware).
  It's hard to say which approach is better right away. (Probably a hybrid of both, realistically) It's just non-trivial to fix this in one clean swoop. It's messy.
  0: https://www.trendmicro.com/vinfo/us/security/news/cybercrime...
  1: https://github.com/lunasec-io/lunasec
- wheelerof4te 3 years ago
  
  Me, I just use the stdlib and my local packages.
  There's something beautiful in knowing you're using pure, clean Python. Much easier to install, also.
worik 3 years ago

No. This is very concerning.
Attacking a popular repository like this does not have to have a high hit rate.
"Script kiddie spam" is now computers get compromised. Unsophisticated mass attack.
This sport of thing, combined with woeful security and fragile systems are causing havoc the world over.

ashishbijlani 3 years ago

We’ve built Packj [1] to detect packages with install hooks, embedded binary blobs, and other such malicious/risky packages. It performs static/dynamic/metadata analysis to look for "suspicious” attributes.

1. https://github.com/ossillate-inc/packj

lrem 3 years ago

Why are these things riskier than the plain Python code you likely don't read, but go ahead and execute?
- ashishbijlani 3 years ago
  
  A number of academic researchers (including us) have studied malware samples from past open-source supply chain attacks and identified code/metadata attributes that make packages vulnerable to such attacks. Packj scans for several such attributes to identify insecure or "weak links" in your software supply chain (e.g., missing or incorrect GitHub repo, very high version number, use of decode+exec, etc.). Full list here: https://github.com/ossillate-inc/packj/blob/main/packj/audit...
- twodave 3 years ago
  
  It's relative, but I assume it's flagging for certain class of known malicious patterns. There's nothing stopping you from writing malicious python code, but essentially that script will only run while you expect it to in most cases unless it interacts with the OS in some way.
  It doesn't make plain Python code you blindly execute any safer, but at least you've explicitly given those packages your trust. I believe this is more geared toward detecting compromises of those packages you have given that trust.
  - colatkinson 3 years ago
    
    Packages can do weird things like auto-loading into the interpreter (example: [0]). So in a scenario where a malicious package has ended up on your machine, you're a bit screwed whether it's a .so or a .py. I believe that was the point OP was making -- a pure-Python wheel is not really any safer than a wheel with embedded binaries.
    [0]: https://github.com/pyston/pyston/blob/1d65d4831912179c26bb27...
    
    Godel_unicode 3 years ago
    
    It’s like tor though; everyone that’s malicious doesn’t do this but the ratio is much higher so be much more suspicious. Security isn’t about silver bullets, it’s about a compilation of tricks that make malicious things more obvious.
    
    ikekkdcjkfke 3 years ago
    
    I like how we assume that programs running with the full rights of the user is normal.

throwaway81523 3 years ago

Wonder if that is related to the malware spamming of NPM that I saw something about last night.

Python used to have a "batteries included" philosophy which tried to put most important stuff into the distro, reducing the number of external dependencies any given app needed. They seem to have abandoned that now, leaving us to fend for ourselves against the malware.

NPM spam: https://www.scmagazine.com/analysis/devops/npm-repository-15...

wheelerof4te 3 years ago

"They seem to have abandoned that now, leaving us to fend for ourselves against the malware."
Yes, along with reducing the stdlib and directing us to PyPI for "alternatives".
- throwaway81523 3 years ago
  
  Dumpster diving anyone? Npm always felt that way and PyPI is catching up.

belinder 3 years ago

How is the rust part relevant?

throwthere 3 years ago

Chatgpt recommended it for the upvotes.
- wheelerof4te 3 years ago
  
  "The most beloved programming language used to build and ship malware (PHOTO/VIDEO/NSFW)"
yabones 3 years ago

We like when malware is written in a memory-safe language.
- dtgriscom 3 years ago
  
  Encrypt my files, but please don't waste my RAM while you do so.
  - lost_tourist 3 years ago
    
    "it's amazing what they were able to do with that old pentium linux server that someone forgot in the server room, so much damage! rust is truly the future"
mftb 3 years ago

If a payload is native it's potentially more of a problem than a script. If the payload had been c or c++, I wouldn't have been surprised if they noted that either.
sidlls 3 years ago

We should only refer to Rust when it's included in positive events? How is it not relevant here? It was used to build executables to inject, likely for malicious purposes. Given its newness and all the other hype around it, I'd say it's very relevant.
- puffoflogic 3 years ago
  
  Why not also be sure to mention what OS was used for the build, and what linker, and what file format, and what model of computer was used, and what its default ui language was set to? What is so special about programming language used that sets it apart from those other factors that could be mentioned?
  - muraiki 3 years ago
    
    The use of Rust has particular implications for malware analysis: https://c3rb3ru5d3d53c.github.io/2022/08/malware-reversing-r...
    So yes, it's relevant.
    
    timeon 3 years ago
    
    I hope you do not often feed your knowledge with shallow and manipulative content like this one.
    
    muraiki 3 years ago
    
    Ok, here's a better article from CMU's SEI. See "Binary Analysis Without Source Code".
    > In general, the layout used by the Rust compiler depends on other factors in memory, so even having two different structs with the exact same size fields does not guarantee that the two will use the same memory layout in the final executable. This could cause difficulty for automated tools that make assumptions about layout and sizes in memory based on the constraints imposed by C. To work around these differences and allow interoperability with C via a foreign function interface, Rust does allow a compiler macro, #[repr(C)] to be placed before a struct to tell the compiler to use the typical C layout. While this is useful, it means that any given program might mix and match representations for memory layout, causing further analysis difficulty. Rust also supports a few other types of layouts including a packed representation that ignores alignment.
    > We can see some effects of the above discussion in simple binary-code analysis tools, including the Ghidra software reverse engineering tool suite... Loading the resulting executable into Ghidra 10.2 results in Ghidra incorrectly identifying it as gcc-produced code (instead of rustc, which is based on LLVM). Running Ghidra’s standard analysis and decompilation routine takes an uncharacteristically long time for such a small program, and reports errors in p-code analysis, indicating some error in representing the program in Ghidra’s intermediate representation. The built-in C decompiler then incorrectly attempts to decompile the p-code to a function with about a dozen local variables and proceeds to execute a wide range of pointer arithmetic and bit-level operations, all for this function which returns a reference to a string. Strings themselves are often easy to locate in a C-compiled program; Ghidra includes a string search feature, and even POSIX utilities, such as strings, can dump a list of strings from executables. However, in this case, both Ghidra and strings dump both of the "Hello, World" strings in this program as one long run-on string that runs into error message text.
    https://insights.sei.cmu.edu/blog/rust-vulnerability-analysi...
    
    puffoflogic 3 years ago
    
    That article is nonsense and the author could not even complete it. It is buried in a shallow grave of irrelevance.
    
    muraiki 3 years ago
    
    Here's an article from the SEI covering the problems with using Ghidra for malware analysis of Rust code. See "Binary Analysis Without Source Code" https://insights.sei.cmu.edu/blog/rust-vulnerability-analysi...
    
    puffoflogic 3 years ago
    
    You cite yet another article which you clearly don't understand, and whose authors have questionable understanding themselves.
    This article cites CVEs of a certain type, which were especially popular in the 2021 timeframe. These CVEs do not correspond to real vulnerabilities in real executables. Rather, they are reporting instances of rust programs violating the strictest possible interpretation of the rules of the rust language. For comparison, quite literally every single C program ever written would have to receive a CVE if C were judged by the same rules, because it isn't possible to write a C program which conforms to the standard as strictly as these Rust CVEs were requiring. CVEs of this nature are a bit of a meme in the rust community now, and no one takes them seriously as vulnerabilities. They are reporting ordinary, non-vulnerability bugs and should have been reported to issue trackers.
    The whole discussion about layout order is completely irrelevant. When RE'ing unknown code you don't know the corresponding source anyways, so the one-to-many correspondence of source to layout is irrelevant. You are given the layout. You can always write a repr(C) which corresponds to it if you're trying to produce reversed source. This is no different than not knowing the original names of variables and having to supply your own.
    The next objection is literally that rust does not use null-terminates strings, except the authors are so far out of their depth that they don't even identify this obvious root cause of their tools failing. Again, this has absolutely nothing to do with the reversibility of rust programs, except perhaps preventing some apparent script kiddies from figuring it out.
    The authors do manage to struggle to shore, as it were, by the end of the article, and somehow they end up correctly identifying their tools and understanding and the root cause of their troubles, not Rust. I take it you didn't make it that far when you read it?
baguettefurnace 3 years ago

just like if a tesla is involved in a car crash, headline must mention Tesla
- butterNaN 3 years ago
  
  Isn't that because sometimes the tesla software might be at fault?
  - baguettefurnace 3 years ago
    
    The software is rarely at fault if you follow up on the stories. If every time a human driver crashed a car it was the lead story in whatever news you consume, you'd likely never drive.
HL33tibCe7 3 years ago

It’s unusual for malware.
- j-krieger 3 years ago
  
  Yes, but only for the time being. I‘ve recently published a paper on the topic. Rust and Golang are getting immensely popular with malware authors.
  - hoppla 3 years ago
    
    I would love a link so I could read your paper.
    
    arthurcolle 3 years ago
    
    Same! Please share GP
- Godel_unicode 3 years ago
  
  I mean, not really? There’s a lot of legacy stuff written in other languages, but malware authors have realized that people are less skeptical of rust and are actively taking advantage of that fact.
hoppla 3 years ago

Because everywhere it’s used, it has to be explicitly mentioned, like clothing that use GoreTex™.
- abeyer 3 years ago
  
  The first rule of rust club is that you must talk about rust club.
  - anon4242 3 years ago
    
    At least we know the malware wasn't produced in Switzerland, because if it were, it would've told us.

almet 3 years ago

It's still the same story : PyPI still doesn't have a way to automatically detect interactions with the network and the filesystems for the submitted packages. It's a complex thing to do for sure, but that would be a welcome addition, I guess.

woodruffw 3 years ago

PyPI still doesn't have this because no packaging ecosystem does. It's impossible to do in the general case if your packaging schema allows arbitrary code execution, which Python (and Ruby, and NPM, and Cargo, etc.) allow.
The closest thing is pattern/AST matching on the package's source, but trivial obfuscation defeats that. There's also no requirement that a package on PyPI is even uploaded with source (binary wheel-only packages are perfectly acceptable).
- spenczar5 3 years ago
  
  "no packaging ecosystem does."
  This is a little bit too strong, since packaging doesn't require arbitrary code execution. For example, Go doesn't permit arbitrary code execution during `go get`. Now - there have been bugs which permit code execution (like https://github.com/golang/go/issues/22125) but they are treated as security vulnerabilities and bugs.
  Of course, you're right about Python.
  - woodruffw 3 years ago
    
    What I meant by that is that no packaging ecosystem (to my knowledge) runs arbitrary uploaded code to find network activity. Some may do simpler, static analyses, but outright execution for dynamic analysis purposes isn't something I'm aware of any ecosystem doing.
    Python, Ruby, et al. are in an even worse position than that baseline, since they have both arbitrary code in the package itself and arbitrary code in the package's definition. But the problem is a universal one!
    
    spenczar5 3 years ago
    
    Ah, yep, you're right about that as far as I know too.
- eigenvalue 3 years ago
  
  This seems eminently solvable though. Why can’t every package submission cause some minimal sandboxed docker image to install the package and call the various functions and methods and log all network and disk activity? If anything looks suspicious it would be denied and the submitter would have to appeal it, explaining why the submission is valid. The same applies for NPM and Cargo. I know there is a researcher out there who has retrieved and installed every single pip package to do an analysis, which is a good start. This seems like the kind of thing that wouldn’t even cost all that much, and big corporate users of python would stand to benefit.
  - woodruffw 3 years ago
    
    For one, because Docker is not a sandbox, and containers are not a strong security boundary[1]. What you really need here is a strongly isolated VM, at which point you're playing cat-and-mouse games with your target: their new incentive is to detect your (extremely detectable) VM, and your job is to make the VM look as "normal" as possible without actually making it behave normally (because this would mean getting exploited). That kind of work has a long and frustrating tail, and it's not particularly fruitful (relative to the other things packaging ecosystems can do to improve package security).
    > I know there is a researcher out there who has retrieved and installed every single pip package to do an analysis, which is a good start.
    You're probably talking about Moyix, who did indeed downloaded every package on PyPI[2], and unintentionally executed a bunch of arbitrary code on his local machine in the process.
    [1]: https://cloud.google.com/blog/products/gcp/exploring-contain...
    [2]: https://moyix.blogspot.com/2022/09/someones-been-messing-wit...
    
    eigenvalue 3 years ago
    
    You make some good points. But it still seems to me that, if you used the best available sandboxed VMs for each platform (Windows Sandbox for Windows; FireJail for Linux; VirtualBox with no folder permissions for OSX-- I don't know if these are the best or even good, those were the ones I found from a bit a searching), that you could install and run these packages in an automated way (especially with some GPT3-type help to figure out how to explore and call the important functions) and look for the telltale signs in the network and file access behavior that they are malicious. Even if we grant that this is a long-tailed "cat and mouse" game, then so what? We won't get 100% security, especially against super sophisticated threat actors, but if you could catch 98% or whatever of the typical clumsy supply chain attacks, or super egregious stuff like that NPM package that deleted your whole disk if you were Russian, that would be an incredibly vast improvement over the current state of affairs. Why isn't that worth doing? Why isn't Google or Microsoft at least trying this?
    
    woodruffw 3 years ago
    
    It isn't worth doing because the equation you've supplied doesn't include the effect of catastrophic failure: dynamic analysis lowers the barrier for exploit to a single hypervisor or VM exploit. Catching 98% of spam packages that affect nobody is worth very little when the 2% you don't catch are the ones that do the real damage.
    > Why isn't Google or Microsoft at least trying this?
    They are: Google and Microsoft both spend (tens of) millions of dollars on hypervisor and VM isolation research each year. It's a huge field.
    
    com2kid 3 years ago
    
    > What you really need here is a strongly isolated VM,
    Simplify, don't use a VM.
    Create an isolated network, hook your sacrificial machine up to it, have it install the package. Remotely kill it (network controlled power switch if needed). The machine's hard drive should be hooked up through a network controlled switch of some type. After the sacrificial machine is powered down, reroute the HD so it is connected to a machine that does forensics.
    Now you have a clear "before" and "after" situation setup for analysis.
    The sacrificial machine's network activity can be monitored by way of whatever switch/router it uses to connect to the Internet.
    
    woodruffw 3 years ago
    
    This is a VM, but flakier and with more steps! It’s also eminently not sustainable on PyPI’s scale, which is the context we’re talking about. I’m
    
    com2kid 3 years ago
    
    Doesn't it solve VM sandbox escape problems though? Actual physical hardware isolation, along with an isolated network. Code can't detect it is running on a VM if there isn't a VM, and it sure can't escape the sandbox if there isn't a sandbox.
    > It’s also eminently not sustainable on PyPI’s scale, which is the context we’re talking about.
    I started my software engineering career in testing before VMs were a thing, so large, very large, scale test setups like the one I outlined were common place. I wrote about some of my experiences at https://meanderingthoughts.hashnode.dev/how-microsoft-tested... and the physical hardware setup my team was using to run (millions of!) tests was tiny compared to what other teams in Microsoft did at the time.
    Network controlled power and peripherals were exactly how automation was done back in the day. Instead of VM images, you got a bunch of identical(ish) hardware and you wrote fresh images to hard drives to reset your state.
    Are VMs more convenient? Sure, but my reply was in context of ensuring malware can't detect it is running in a VM!
  - nodogoto 3 years ago
    
    Well some calls absolutely should invoke network or disk activity, so you would additionally need to define what constitutes good and bad activity for each. Moreover unless the package is a collection of pure functions it would be easy to hide the malware trigger in state that won't be initialized properly by the automated method calls but would be in the standard usage of the package.
- blibble 3 years ago
  
  > It's impossible to do in the general case if your packaging schema allows arbitrary code execution
  Java's type system: ClassLoaders plus SecurityManager was impossible?
  that's literally how Java applets worked, enforced through the type system
  https://docstore.mik.ua/orelly/java-ent/security/ch03_01.htm
  yes, SecurityManager was a poor implementation for many reasons, but it's definitely not "impossible" to sandbox downloaded code from the network while having it interact with other existing code, you can do it with typing alone
- almet 3 years ago
  
  I'm not sure it's not do-able, actually. What about having an execution sandbox and a way to check the calls made during the execution of the install script for instance?
  I worked a few years back on something like this but it went nowhere, but I still believe it would be doable and useful. The only trace I found back is https://wiki.python.org/moin/Testing%20Infrastructure, which contains almost no info...
photon12 3 years ago

Smart attackers are already/will add `sleep(SOME_NUMBER_LONGER_THAN_SCAN_SANDBOX_LIFETIME)` before anything that does FS or network access. Not to say that this wouldn't be a welcome addition, but the scanning needs to be understood in the context of the inherent limitations of large scale runtime behavior detection of packages when you have a fixed amount of hardware and time for running those scans.

blibble 3 years ago

why does pypi/pip still not have namespacing?

Maven sorted this out 20 years ago

what's a bit sad is the python packaging's authority survey from a few months ago seemed to be mostly interested in vision and mission statements

rather that building a functional set of tools

woodruffw 3 years ago

Namespacing is not a security boundary: it's a usability feature that helps users visually distinguish between packages that share the same name but different owners. I don't think it would meaningfully affect things like package index spam, which this is.
(This is not a reason not to add namespacing; just an observation that it's mostly irrelevant to contexts like this.)
- blibble 3 years ago
  
  obviously, but it allows delegation of trust onto other systems (like the DNS)
  example: the package named "aws" on pypi was created by some random guy and has been abandoned for years
  if pypi/pip supported namespacing that would be info.randomdude.aws instead
  and amazon's packages would be under com.amazon
  not being able to namespace internal packages is another security issue that is substantially improved with proper namespacing
  to be blunt: not supporting it at this point is reckless and irresponsible
  (I note you're part of pypa!)
  - woodruffw 3 years ago
    
    DNS isn't a particularly secure root of trust; Java is somewhat unique among package ecosystems for picking it as their trust anchor.
    It also just kicks the can down the road: Amazon is the the easy case with `com.amazon`, but it isn't clear a priori whether you should trust `net.coolguy.importantpackage` or `net.cooldude.importantpackage`. These kinds of trust relationships require external communication of a kind that package indices are not equipped to supply, and should not attempt to solve haphazardly.
    > (I note you're part of pypa!)
    I am a member of PyPA, but I don't represent anyone's opinions but my own. It's a very loose collection of projects, and it would be incorrect to read a general opinion from mine.
    
    charrondev 3 years ago
    
    I will note even namespaces for package management that don’t use DNS are a big step up over none.
    For example in PHP/composer/packagist and node/npm they just have a vendor name that can be reserved.
    It makes it very easy to distinguish “this package is from the (trusted vendor name here)” and prevents issues with namesquatting.
    
    blibble 3 years ago
    
    > Amazon is the the easy case with `com.amazon`, but it isn't clear a priori whether you should trust `net.coolguy.importantpackage` or `net.cooldude.importantpackage`
    this is a classic example of not letting perfect be the enemy of good
    there is no perfect solution, there never will be
    piggybacking off of DNS works extremely well for Java and Go (and the tooling is a pleasure to work with)
    meanwhile Python continues to be a complete disaster
    
    woodruffw 3 years ago
    
    I agree there is no perfect solution. But I want a good solution, and I disagree that DNS is a good one.
    
    blibble 3 years ago
    
    I look forward to another 20 years of no progress!
    
    woodruffw 3 years ago
    
    Your cynicism isn't warranted: we've made significant improvements to PyPI over the last 4 years[1][2], and I'm currently working on additional features that will make secure publishing to PyPI easier[3]. We're also working on a codesigning implementation for PyPI, based on Sigstore[4].
    Security needs to be evidence and outcome-driven, first and foremost. That takes a while, but improved outcomes make it worth it.
    [1]: https://pyfound.blogspot.com/2019/06/pypi-now-supports-two-f...
    [2]: https://pythoninsider.blogspot.com/2019/07/pypi-now-supports...
    [3]: https://github.com/pypi/warehouse/issues/12465
    [4]: https://www.sigstore.dev/
    
    blibble 3 years ago
    
    > That takes a while, but improved outcomes make it worth it.
    meanwhile the integrity of the supply chain continues to be compromised
    > Your cynicism isn't warranted
    it is: the python packaging situation is worse today than it was when I started writing Python in 2005
    the legions of meetings, grandiose titles, conferences and mountains of unreadable proposals have produced tooling that is objectively worse than what Maven offered close to two decades ago
    
    woodruffw 3 years ago
    
    In 2005, PyPI didn’t even host packages. It was an index that pointed you to the HTTP-only host that served the distribution. As far as I know, even basic hash checking wasn’t added until a decade later.
    I have no opinions about titles, etc. But saying that Python packaging was better in 2005 is incorrect along all axes.
  - georgyo 3 years ago
    
    I like the way golang handled this. Imports are the URL to the resource. No central distribution mechanism at all. In the past few years they implemented a optional catching layer so you a dependencies going offline doesn't necessarily mean that it unavailable anymore.
  - dpedu 3 years ago
    
    Who's to say mr randomdude won't claim com.amazon first?
    
    sophacles 3 years ago
    
    Let's encrypt solved this by doing a proof of control over the domain name, and in an automated way.
    Pypi could do this. Or, they could require that someone demonstrate proof of ownership for a namespace by signing it with a certificate tied to the domain name (so you couldn't claim the com.bigco namespace without having the certs, which you can't get without owning that domain). There could even be signature requirements/proof for each package and/or version uploaded.
    
    dpedu 3 years ago
    
    I would need to spend money to purchase a domain and some kind of server before I can publish a python module? That doesn't seem right. And I presume I would need to keep paying for it as long as I want my modules available and verified. Attaching required monetary purchases to an open source ecosystem is not a good idea.
    
    sophacles 3 years ago
    
    Supporting namespacing does not preclude having the old system too. Or from having a public repo namespace like org.pypi or whatever that allows people to upload packages to the current repo using the system they currently have. Might help sort out some of the other packaging problems too - LWN had this the other day: https://lwn.net/SubscriberLink/923238/d48af5401c04db7d/ . Maybe it would help with the integrator notion org.conda or whatever.
    Depending on how something like this is implemented, maybe com.github could set it up to pull straight from the project repo.
    Just because there's ways it could go poorly, doesn't mean it will go poorly.
    
    natpalmer1776 3 years ago
    
    Well, in theory you could have a namespace schema that differentiates between user-submitted and organization-submitted packages such that randomdude's would appear as 'public.randomdude.aws' and organization-owned namespaces verified by a DNS record would appear as 'com.amazon.aws'
    
    dragonwriter 3 years ago
    
    You could in principle do proof-of-ownership checks like Google does for things like Webmaster Tools, so you’d need to control a domain to have thr corresponding namespace.
    
    pphysch 3 years ago
    
    It's much easier to correct the ownership of a single namespace than N packages in the global namespace
- Riverheart 3 years ago
  
  It can be if you implement it to be so. Just let people create an allowlist of approved vendors for their organization or project from those namespaces. This handles not having to approve individual packages from trusted entities like Google, Microsoft, etc. Update the list when new vendors are needed. Reuse elsewhere as necessary.
  Maybe the list can be hosted on an internal server for other employees to reuse. Hosting all the packages internally is overkill. Trusting the world by default is overkill.
  Now "pip install gooogle/package"
  "Hey User, gooogle/package is not from a trusted namespace. Did you mean google/package which is similar and trusted? Or would you like to add gooogle to your local trust file?"
  The lack of any kind of curated feeds that only lists verified or popular packges is tragedy. There should be a reasonable way of allowing clients to protect themselves from a typo.
- pphysch 3 years ago
  
  Namespacing is a lot more than just a theoretical name collision avoider.
  Good namespacing (e.g. in Go), in practice, provides critical context about the development/publication of a software package.
djbusby 3 years ago

Every lang-ecosystem needs to re-implement CPAN the hard way.
klhanb 3 years ago

That's their calling card. Long discussion threads, mails spanning whole pages, silencing opposition.
But deliver anything more streamlined and secure? Hell, no!

lelandbatey 3 years ago

Interestingly, all the packages, even the ones from today, have all been taken down. So too have all the files that were being hosted on Dropbox.

photochemsyn 3 years ago

Wow this site runs a lot of JavaScript, speaking of aggressive data collection.

https://blog.hubspot.com/website/data-mining

ianai 3 years ago

Even animals in the wild agree to peace around the watering hole.

readthenotes1 3 years ago

Someone forgot to tell the crocodiles...
https://journals.plos.org/plosone/article?id=10.1371/journal...
eternalban 3 years ago

You forgot some of those animals have fangs.
It's like NYC's side walks. Compare pedestrian behavior at say SoHo (daylight) and say LES (nighttime). Amazingly enough, the partying and inebrieted pedestrians at night all file politely in the correct bimodal L|R formation. During the day, it's a rather wild and somewhat uncivilized dynamic slalom formation. My theory: Fangs. The night creatures know someone potentially dangerous maybe in the midst.
MadSudaca 3 years ago

The problem is assuming we’re better than animals.
- ianai 3 years ago
  
  I was rolling the shame or guilt card.
timeon 3 years ago

By animals you mean something like mammals?

steponlego 3 years ago

Yet another attack that requires the biggest malware vector, MS Windows. LOL

puffoflogic 3 years ago

I read TFA three times and I still have no idea what they meant by "Rust stage 1 executables".

In these cases I frankly assume that they don't either.

fortran77 3 years ago

But I thought Rust was supposed to be safe?!

Settings

Aggressive Attack on PyPI Attempting to Deliver Rust Executable

Keyboard Shortcuts