Wiki - ReuseLessSoftware

The title is kinda clickbait, sorry. You’ll see.

Last updated June 2026.

The problem

Basically, supply chain attacks are increasingly becoming a problem, not really because the nature of software or software maintenance has changed (though it has), but because the cost model for sharing and distributing software has changed to make it really cheap. So cheap that we automate the shit out of it even when it’s wasteful, because the automation is useful. And so now every few months we get a new supply chain attack where someone manages to break half the code in the world or something.

How we got here

In the late 1960’s and early 70’s there was the “software crisis”, where people didn’t know how to make software they could reuse. They saw an exponentially growing demand for software, and sub-linear ability to make new software that was as complex as was being demanded. This led to a lot of research into modularity, structured programming, and so on, and like all good research this more or less faded into the background of ambient knowledge such that people don’t really think about it anymore. Pretty much every single programming language module system made since 1990 can trace its lineage back to Modula-2, and that’s just how good module systems work these days.

In the 1990’s and 2000’s the internet as a whole built a more powerful solution to the software crisis though: Building and distributing software was now pretty cheap, and most software you actually want to use is open-source anyway, so we can just give away the ability to distribute source code to whoever wants it and let them build it themselves. You no longer always need to rely on a vendor to do all the integration work of building and distributing software for you, you can just do it yourself and have some volunteers donate a bit of storage and bandwidth as a public service. On the backs of CPAN and CTAN and Linux distros we now have a whole zoo of “package repositories” and “package managers”. These are often language-specific tools that take a manifest file and can automatically search for, fetch and build software given nothing but a name and an (often ad-hoc) version number.

Before then the only good way to build complex software systems was to carefully and laboriously put together working pieces by hand, which is basically what Linux distros do. And as someone who spent days of my life trying to make SDL build with all its bells and whistles in 2003, it sucked, a lot. Do NOT pine for those days. But if you have a Linux distro as a known and not-too-bad base environment, it turns out that lots of custom software just kinda lives in its own world and doesn’t need to care about the rest of the system much. If it talks to other software, it talks over files or network sockets using well-known protocols. And so now we have oceans of software, really good software, that builds from scratch with Rust or Go, or is distributed as a docker container, or otherwise doesn’t really interact with “system libraries” much at all. Instead of bothering to try to talk to some set of software that’s been provided by an OS distribution, if they need a library their build system fetches it for them.

So now we have the opposite crisis from the 1970’s, where people reuse too much software and it makes their programs worse. It turns out that while distributing software is still incredibly cheap, using it still has costs. For a long long time the greatest cost was the complexity of building software and getting it running on a computer at all, but we’ve largely automated that problem away. So now we build, distribute and use orders of magnitude more software than we used to, and the costs of doing that manifest in the form of dependency hell, bloat, long build times, packages or package maintainers vanishing into the ether, etc.

But there’s one really big problem: supply chain attacks.

Let’s be clear, supply chain attacks have been around as long as open-source software has. That one time back in the dark ages when someone tried to sneak in a patch to the Linux kernel with uid = 0 instead of uid == 0, and it was a big deal ’cause it was the first time a malicious kernel patch had been seen in the wild? That’s an attempted supply chain attack. However, they have gotten bigger and more problematic over the last decade because we have automated build systems that fetch and distribute source code. Our CI systems tend to run on every code change¹, and then these code changes automatically become available to everyone who depends on them, whose CI systems pick them up and incorporate whatever exploit was introduced, and so on. A good supply-chain attack spreads like wildfire, as fast as CI runners can execute it.

People have argued that one way to fight supply chain attacks is to slow them down, via dependency cooldowns or something like that. Maybe, but then people argue about policy and whose responsibility it is to be the guinea pigs. What if we can do it more simply?

Proposed solution

Instead of having a build system like npm, cargo, etc automatically pull dependencies from some networked location each time you build your software fresh, just… include all the dependencies for your software, with your software.

That’s it.

Vendor the shit out of your project. Copy-paste upstream source control into your git repo and commit that fucker. Why not? Upstream update happens? Download it and copy-paste it again. Get sick of doing this by hand? Make the build tool automate it, that’s its job. You have a lockfile already, just make it correlate to the full source tree you have in source control. Own every line of source code with the iron fist of an absolute control freak.
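
For the “make the lockfile correlate to the source tree” part, here’s a minimal sketch in Python of what that check could look like. The layout is assumed purely for illustration: a vendor directory with one subdirectory per dependency, and a lock that maps each dependency name to a SHA-256 digest of its tree. The names tree_digest and check_vendor are hypothetical, not something any real build tool ships.

```python
import hashlib
from pathlib import Path


def tree_digest(root: Path) -> str:
    """Hash every file under root (relative path + contents) in a stable order."""
    h = hashlib.sha256()
    # Sort by the POSIX-style relative path so the digest is the same everywhere.
    files = sorted(
        (p.relative_to(root).as_posix(), p) for p in root.rglob("*") if p.is_file()
    )
    for rel, path in files:
        data = path.read_bytes()
        h.update(rel.encode("utf-8") + b"\0")
        # Length prefix keeps file boundaries unambiguous.
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


def check_vendor(vendor_dir: Path, lock: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means vendor_dir matches the lock."""
    problems = []
    on_disk = {p.name for p in vendor_dir.iterdir() if p.is_dir()}
    for name in sorted(on_disk - set(lock)):
        problems.append(f"{name}: vendored but not in lockfile")
    for name, expected in sorted(lock.items()):
        dep = vendor_dir / name
        if not dep.is_dir():
            problems.append(f"{name}: in lockfile but not vendored")
        elif tree_digest(dep) != expected:
            problems.append(f"{name}: contents differ from lockfile")
    return problems
```

Run something like this in CI and a dependency that changes without a matching lockfile change fails the build instead of slipping through.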

Sure, this will bloat your git repo, but disk space is cheap. (Transfer is less cheap, but bear with me.) It will bloat your build times– wait, no it won’t, you were rebuilding all that shit anyway. It will make code reuse harder– well, for some programs this is true, such as clients and servers talking to each other using a shared protocol lib. But those programs already have version mismatch problems and need to handle them. Forcing them to actually pay attention to it won’t make their life worse in the long run.

So just by not updating dependencies automatically, you turn every single package in an ecosystem into a fire-break for supply chain attacks. Sure, it’s also a fire-break for propagation of bug fixes and patches, but let’s be real, if those matter then you’ll be looking for them manually anyway, and if you aren’t looking for them, they usually don’t matter.

(You could also get the same effect by ditching any concept of semver or other “these two different pieces of code should behave the same” in the build system, and treating every version number as unique and unrelated to any other.² But that doesn’t solve the problem of dependencies vanishing or otherwise being subverted, or someone tampering with the contents of a package in other ways. It’s an optimization, and in my mind a premature one; we might get there eventually but shouldn’t start there.)
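
To make that difference concrete, here’s a toy resolver sketch in Python. The range rule is deliberately simplified (same major version, not older than what was asked for, no 0.x special case, no pre-releases) and doesn’t claim to match any particular package manager. The function names are made up for the example.

```python
def parse(version: str) -> tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into a tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def resolve_range(requested: str, available: list[str]) -> str:
    """Semver-ish: newest published version with the same major, >= requested."""
    want = parse(requested)
    candidates = [
        v for v in available if parse(v)[0] == want[0] and parse(v) >= want
    ]
    if not candidates:
        raise LookupError(f"nothing compatible with {requested}")
    return max(candidates, key=parse)


def resolve_exact(requested: str, available: list[str]) -> str:
    """Every version is its own thing: you get what you named or nothing."""
    if requested not in available:
        raise LookupError(f"{requested} is not published")
    return requested


published = ["1.2.0", "1.2.1"]
print(resolve_range("1.2.0", published))  # 1.2.1

# Upstream (or whoever stole their credentials) publishes a new release.
published.append("1.3.0")
print(resolve_range("1.2.0", published))  # 1.3.0
print(resolve_exact("1.2.0", published))  # 1.2.0
```

With the range rule, upstream publishing 1.3.0 changes what you build without you touching anything. With the exact rule, nothing moves until you edit the pin yourself.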

This isn’t just about slowing down automatic changes, though I think that’s enough to make it worthwhile. I believe vendoring all your deps will also have a desirable soft side-effect: It will increase the cost of using dependencies. Not a lot, not in any irredeemable way, it will just require a little more thought when you use something from upstream. A gentle way to make you slow down a little and ask yourself “do you really need this? really?”. In addition to slowing down automatic changes, it increases the visibility of dependencies. It reduces how much bloat is hidden behind them, just ’cause it isn’t hidden. If you add a simple lib to your project that should be like 200 lines of code and discover, oh wait, for some reason it’s 50,000 lines, it’s more obvious that maybe you should stop and ask why. It also reduces the magic of dependencies, you can more easily chase bugs out of your codebase and into someone else’s. Hopefully vendoring everything by default should tend to encourage flatter and wider dependency trees, though probably not to the point of C++ mega-libraries like Boost and Qt. Those things exist precisely because making and using small C/C++ libs is so horrible, so you want to pack more functionality into one big lib. You really, really don’t want to have to figure out how to build those things yourself, you want a system integrator like a Linux distro to do it for you so that it only has to be done once.
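
Once the code is sitting in your repo, the “wait, why is this 50,000 lines” check is a few lines of script. Here’s a rough sketch in Python, assuming (hypothetically) one subdirectory per dependency under a vendor directory. It counts every line in every file, binaries and docs included, so treat the numbers as a smell test rather than a measurement.

```python
from pathlib import Path


def count_lines(dep_dir: Path) -> int:
    """Count newline-delimited lines in every file under dep_dir."""
    total = 0
    for path in dep_dir.rglob("*"):
        if path.is_file():
            with path.open("rb") as f:
                total += sum(1 for _ in f)
    return total


def size_report(vendor_dir: Path) -> list[tuple[str, int]]:
    """List (dependency name, line count) pairs, biggest first."""
    sizes = [(d.name, count_lines(d)) for d in vendor_dir.iterdir() if d.is_dir()]
    # Sort by size descending, then by name so ties come out in a stable order.
    return sorted(sizes, key=lambda item: (-item[1], item[0]))
```

Printing size_report(Path("vendor")) after adding a dependency shows straight away whether the “simple lib” is actually simple.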

One real downside of this though is that transitive dependencies won’t get shared. If lib A depends on Z and lib B also depends on Z and you want to dedupe them, it gets harder. Not impossible, but harder, you need to do it by hand or need more sophisticated tooling to do it for you. This can be a problem. But it’s also a problem when transitive dependencies do get shared… or when you have them at all, really. That’s what this is all about. Letting a lib specify transitive dependencies is handing control over your program to someone else.
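
The “more sophisticated tooling” doesn’t have to be very sophisticated to at least tell you where the duplicates are. Here’s a minimal sketch in Python that works on manifests rather than files. The input shape (lib name mapped to its own dependencies and their pinned versions) is an assumption made for the example, and shared_deps is a hypothetical name.

```python
from collections import defaultdict


def shared_deps(
    manifests: dict[str, dict[str, str]],
) -> dict[str, dict[str, list[str]]]:
    """Map each dependency used by more than one lib to {version: [libs using it]}."""
    seen = defaultdict(lambda: defaultdict(list))
    for lib, deps in manifests.items():
        for dep, version in deps.items():
            seen[dep][version].append(lib)
    return {
        dep: {version: sorted(libs) for version, libs in versions.items()}
        for dep, versions in seen.items()
        if sum(len(libs) for libs in versions.values()) > 1
    }


manifests = {
    "A": {"Z": "1.4.0"},
    "B": {"Z": "1.4.0", "Y": "0.3.1"},
    "C": {"Y": "0.4.0"},
}
report = shared_deps(manifests)
# Z appears once under a single version: A and B can share one copy.
# Y appears under two versions: somebody has to decide what to do about it.
```

A dependency that shows up under a single version can be deduped mechanically. One that shows up under several versions is exactly the by-hand case, and the decision is yours instead of the resolver’s.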

Analysis

Can all software do this? No, not really. I don’t think it’s especially reasonable to vendor all of Redis and build it as part of your webapp backend deployment.³ There’s a ceiling to the complexity this will tolerate, though companies like Google and Facebook(?) with giant monorepos demonstrate that this ceiling is probably a lot higher than you think. Additionally, at some point dependencies meet operating systems, and that’s a big ol’ fat dependency that has plenty of problems of its own.⁴ But hopefully most software can end up having at most 2-3 external dependencies that can change out from under them by surprise, instead of 200-300.

Additionally this isn’t a proposal for how to build a full interactive system like a Linux distro or BSD. That’s a different problem because you have lots of programs and libraries that need to work together. But, you know… you could do it that way if you tried. Taking this principle alllll the way gets you Nix/Guix or something like it, and I think that’s probably a good thing. The way we have the concept of a “build environment” that must be put together correctly is a very lazy and under-specified way of solving the problem of “how do we build software”, a relic of the days when software was built once on some minicomputer somewhere and then shared widely as binaries. We do a lot more software-building on the fly these days than we did in the 1970’s.

So it’s not a one-size-fits-all solution. But I still think lots of software can do this and benefit from it. Remember, most software is small, and large projects already have to solve a lot of these problems. There’s tons of libs out there that actually do pure computation, or which only really touch the world through very basic and portable I/O like files and network sockets. Just vendor ’em. Compression lib? Copy-paste that sucker. libcurl? Copy-paste that sucker. TUI lib? Copy-paste that sucker. Django? Copy-paste that sucker, why not. You’ll never⁵ get bitten by version conflicts or sudden patches introducing bugs that you only find when you deploy/build the software on a new system and have it mysteriously break.

Conclusions

I’m just an idiot with an opinion. Discuss.

¹ Or at least every big one, like a PR getting merged.
² The problem with semver in the end is that it expresses human intent, not actual reality, and then only when it’s used semi-correctly.
³ But if your deployment is automated with Ansible or docker images or something then you’re probably effectively doing something like this already.
⁴ I really like the idea of a unikernel for web backend stuff, but there’s some real tooling issues around it and we just don’t seem to be there yet.
⁵ Okay, never-ish.