All code is tech debt. All of it. It is baffling to me that this even needs to be said in 2023, but it does. In this essay I will explore the nature of tech debt and why all modern software is tech debt.
By the end, I will hopefully have convinced you that all code is tech debt and that the easiest way to obliterate your organization’s long-term productivity is to write lots of low-ROI code.
The topic of how to choose high ROI features is one for another day (and it’s covered very well in the book Escaping the Build Trap).
This article is about the damage that low ROI features do to your codebase and your company’s future.
Different Kinds of Tech Debt
When it comes to financial debt, there are really only two kinds:
- Unproductive debt, where the money is invested into an asset that yields less than the debt’s interest rate. An extreme example of this is taking out credit card debt to go to a fancy nightclub in Miami (or doing something even dumber, like spending $20k on having a dev agency build you a new social media MVP). But unproductive debt need not be so extreme. Taking on $50k of credit card debt at 29% to buy 8% corporate bonds is unproductive too.
- Productive debt, where the borrowed money is invested into an asset whose expected, risk-adjusted rate of return (through appreciation or cash flow) is higher than the cost of the debt. If that’s too financey for you, here it is in plain words: borrowing at 12% to invest in a stock market index fund is a bad investment even if, after the fact, you ended up making 13%. The expected return of an index fund pegged to the S&P 500 is less than 11.88%, so when you take on debt to make this investment, the expected return needs to be higher to make up for the risk of the debt AND the risk of the investment (a quick numeric sketch follows this list).
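To make the expected-versus-realized distinction concrete, here’s a quick numeric sketch. The amounts and rates are illustrative assumptions, not real market figures.

```typescript
// Illustrative numbers only: a leveraged bet is judged on its expected value,
// not on the outcome you happened to get.
const principal = 50_000;      // amount borrowed
const borrowRate = 0.12;       // annual cost of the debt
const expectedReturn = 0.10;   // assumed long-run expected return of the index fund
const realizedReturn = 0.13;   // the lucky return you actually earned one year

// The decision is made before the outcome is known, so judge it on expectations:
const expectedProfit = principal * (expectedReturn - borrowRate); // roughly -1,000: a bad bet
const realizedProfit = principal * (realizedReturn - borrowRate); // roughly +500: a bad bet that got lucky

console.log({ expectedProfit, realizedProfit });
```

Ending up ahead doesn’t retroactively make the decision a good one; the expected value was negative the moment you took it.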
In software terms, productive debt is debt that has a positive ROI in helping you meet some OKR, like number of users or actual money you’re making. Unproductive debt is debt that does not. Both situations presume that you’re stepping into the debt with prior knowledge that you’re doing so, as is the case with financial debt.
When it comes to technical debt, there’s an additional kind: unavoidable debt. The surface area of code you have is directly proportional to the magnitude of unavoidable tech debt you will introduce in the future. This is the main thesis of this article.
Unavoidable Tech Debt
There’s tech debt that arises merely due to the passage of time. Below is a list of examples of how this unavoidable tech debt might arise; it is nowhere near exhaustive. In fact, the number of possible examples is essentially infinite. What follows is merely a small taste.
- Architecture decisions that were good then are no longer good now. I’ve seen this dozens of times in the past. A monolith is almost always the right choice until it isn’t — when you have 120 people working on it at the same time, clogging each other in CI and generating merge conflicts with one another. Or when PWC tells you your billing database doesn’t meet SOC2 and you can’t IPO until you split your billing subsystem out of your main monolith. Well guess what, the more code you have, the more code you need to migrate, the harder it is going to be to change your architecture.
- Breaking updates for packages. These happen super often in immature ecosystems (e.g. Rust) and in un-pragmatic ecosystems (e.g. anything related to frontend development, JavaScript, and anything Dan Abramov touches). Have you ever tried to update smithy-rs? React Router across major versions? Puppeteer? Django? Cloudscape? Your favorite charting library? React itself? Think about when they finally deprecate class components (a minimal before/after sketch follows this list). Well guess what, the more code you have, the more code you need to migrate, the harder it’s going to be to update your dependencies when they ship breaking changes.
- Abandoned/deprecated libraries. Open source is a brutal slog and many people get into it not knowing wtf they’re signing up for. Libraries get abandoned… all. the. time. Formik was once the most widely used forms library for React — it’s now abandoned. Enzyme was once a leader in React testing, and it’s now abandoned too. Guess what? The more code you have that uses a deprecated library, the harder it’s going to be to update your codebase and rip out an abandoned library in favor of a new alternative.
- Changing paradigms. Your company might change from serving frontends with a server fleet to serverless, or vice-versa. The testing paradigm for your framework of choice might change (as has happened with React, multiple times, leaving us devs literally needing to rewrite every single one of our unit tests; see the test-rewrite sketch after this list). This last one is a big risk because frontend testing still sucks. Because it still sucks, it’s still in flux. Every couple of years someone figures out something better to do, leading to every app needing to rewrite every unit test. I would be surprised if this didn’t continue for the foreseeable future. Guess what? The more code you have that uses an outdated paradigm, the harder it’s going to be to update your codebase to the new paradigm. Changes in testing paradigms require rewriting most of your tests or living with outdated tools that get less and less support as time goes by.
- Latency/performance. Of course, with more features comes more code, more translation strings, and often more dependencies. All of these run counter to latency goals you might have. If you or your business stakeholders at some point decide latency is important, then you’ll have to make code changes all over the place to improve latency. The more code you have, the harder it will be to meet latency goals.
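To ground the class-component point from the list above: below is a minimal before/after sketch of the mechanical rewrite that one breaking React change can force across an entire codebase. The `Counter` components are hypothetical stand-ins; multiply the change by every stateful component you own.

```tsx
import { Component, useState } from 'react';

// Before: a hypothetical class component, the style React has been steering away from.
class CounterClass extends Component<{}, { count: number }> {
  state = { count: 0 };
  increment = () => this.setState({ count: this.state.count + 1 });
  render() {
    return <button onClick={this.increment}>Clicked {this.state.count} times</button>;
  }
}

// After: the same behavior rewritten with hooks. Trivial here, but every lifecycle
// method, setState call, and higher-order component in your codebase needs the same treatment.
function CounterFunction() {
  const [count, setCount] = useState(0);
  return <button onClick={() => setCount(count + 1)}>Clicked {count} times</button>;
}

export { CounterClass, CounterFunction };
```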
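And to ground the “rewrite every single unit test” point: here is a sketch of what one test migration looks like when the testing paradigm shifts from Enzyme-style shallow rendering to React Testing Library-style queries. The `Greeting` component is made up, and the block assumes a Jest setup with the @testing-library/jest-dom matchers installed.

```tsx
import { render, screen } from '@testing-library/react';
import { Greeting } from './Greeting'; // hypothetical component under test

// Before (Enzyme-style, asserting against component internals; Enzyme is now abandoned):
//   const wrapper = shallow(<Greeting name="Ada" />);
//   expect(wrapper.find('h1').text()).toEqual('Hello, Ada');

// After (React Testing Library-style, asserting against what the user actually sees):
test('greets the user by name', () => {
  render(<Greeting name="Ada" />);
  expect(screen.getByRole('heading')).toHaveTextContent('Hello, Ada');
});
```

Neither version is hard to write; the pain is that the migration buys you zero new product functionality and has to be repeated across hundreds of files.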
Unavoidable Tech Debt Is Non-Linear Over Time
For all the above situations, there is an aggravating factor: for large projects, the calendar length of a project scales faster than the theoretical dev days required for it.
When you have a project that you think will take 3–5 days, it’ll usually take 3–5 days. But if your project is something like upgrading a 40k-line codebase from React 15 to 18, it is not only difficult to estimate but impossible to execute in a predictable time frame. The number of calendar days it’ll take will far exceed the raw number of dev days, due to constant interruptions that result in stops and starts.
Additionally, there’s all the following risks:
- Riskier releases: upgrading to a new major version of your main client side library or your server side framework will come with tons of code changes. You’ll have to thoroughly test everything.
- Version control: many projects will not offer a partial update path such that you can incrementally perform the update. Thus, you’ll have to create a long-lived feature branch and keep rebasing it over months, which will add a lot of work to the already long and difficult project.
- Dependency complexity: almost invariably, updating a core component of your codebase forces a cascade of updates to other libraries so that they stay compatible with the one you actually wanted to update. This adds to the version control risk and makes it almost a certainty.
- Code bloat and poor performance: in the cases where the tool you’re upgrading does offer a path to partial updates (for example, you can keep using version 2.2 of library x while you upgrade components one by one to version 3.0), this leads to a temporary state with significant code bloat (you’re shipping both version 2.2 and 3.0 of the library, along with their respective dependency trees). This can damage performance and user experience and introduce bugs that are difficult to detect (a sketch of this dual-version state follows the list).
- Resource allocation: every time the main engineer in charge of one of these multi-month projects leaves your team or your company, the project basically resets. One of the longest and most painful projects I’ve ever been a part of was deprecating a thick proxy layer in favor of a thin one, and this project went through something like 4 management changes. It lasted well over three years.
- Timing risk: it’s possible that your company wants to have a hard push for performance and lowering latencies, but your core library also deprecated a bunch of features and needs to be updated to a new version. Now you can’t meet your org’s latency goals and you’re gonna look like crap, all because you kept saying yes to low-ROI features from PMs every 2 weeks. They added up to the point where they bloated your codebase by 50%.
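As a sketch of the dual-version state described in the code-bloat item above, here is roughly how it tends to look in a JavaScript project that uses npm’s package aliasing. `libx`, its version numbers, and its exports are all placeholders.

```typescript
// package.json (excerpt): npm aliases let both majors coexist during the migration.
//   "dependencies": {
//     "libx-v2": "npm:libx@2.2.0",
//     "libx-v3": "npm:libx@3.0.0"
//   }

// Un-migrated modules keep importing the old major...
import { legacyWidget } from 'libx-v2';
// ...while migrated modules pull in the new one. Until the migration finishes, both
// dependency trees ship to the user, which is exactly the bloat described above.
import { newWidget } from 'libx-v3';

export { legacyWidget, newWidget };
```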
The Goal
The goal of whatever product you’re building is to make money. Plain and simple. Do not forget this.
This is a difficult task. It’s especially difficult if you’re operating in a new space. Although the task of finding product market fit is difficult, it is simple to conceptualize: you are looking to create a product that people want to “hire” to do a “job” — as in Clayton Christensen’s jobs to be done framework.
Although I can’t possibly give you a bottled prescription for how to identify a “job to be done,” I can sure give you a prescription to create failure:
- Build an initial set of features to launch an MVP
- Invest all your effort into refining and adding more details to those features
The Consequences of the Build Trap on the Tech Side
Everyone knows the consequences of the Build Trap (aka the product death cycle) on your product — you will end up with a very bloated product that does many things, but customers don’t want to hire your product because it doesn’t really do any “job” well, and it’s cluttered and hard to use. It’s just a haphazard collection of features. Once this happens to your company, you’re basically screwed, because even if you do add a new “job”, it’ll be buried under a mountain of mish-mash features, so the product won’t offer a good experience.
But the effects of the product death cycle on the engineering side of things are seldom explored or discussed. The lesson to understand is this: Code is like physical infrastructure (roads, bridges, etc.). It wears out over time and requires maintenance. In a greenfield project, it’s extremely easy to build, build, and build way past the point that is sustainable for the dev team’s size. This doesn’t catch up to you until much later, and when it does, it puts the business in a very precarious position.
What will eventually happen if you consistently build low ROI features before discovering an effective “job to be done” is that you will box yourself out of ever finding product market fit. You will write enough code to keep your staff busy maintaining it in perpetuity, never being able to add any features without taking on ever more tech debt and risk. You’ll have to grow your staff to even be able to ship anything, but you won’t be able to grow your staff because you’re not making any money.
So Why Does Nobody Talk About This?
That’s probably a subject for an article in its own right, but I think there are a couple of reasons:
- Over-specialization. Nobody can see the forest because we’re all looking at trees. Scrum masters are looking at tickets and forecasts, engineers are looking at scalability and code quality, designers are looking at UX, and nobody is looking at whether the entire ship is sailing into a pit of tar. Very few people even have the multifaceted knowledge and breadth of experience it takes to think this way; most teams and companies don’t have a single person who does. Engineers with many years of experience are the only ones who have lived and breathed all the aspects of unavoidable tech debt that I discuss in this article, but most engineers are too focused on the technical side of things to be looking out for the business.
- Mental anchoring. Our world of memetic thinking tends to have certain ideas catch on like wildfire, and those ideas anchor how everybody thinks about all adjacent topics. In the world of engineering delivery, the paradigm for how people think about improving delivery is Agile Software Development. The term means so many things it has basically lost all meaning, but “Escaping the Build Trap” does an excellent job of summarizing the problem: rituals like Scrum, tools like Jira, and processes like sprints essentially act as a speed magnifier (if they’re well executed, anyway). Executing these processes well means you’ve upgraded from a compact sedan to a sports car. But if you drive your car in the wrong direction, the speed doesn’t matter. Yet, because of the Agile paradigm, everyone’s mind is anchored on improving delivery speed, and precious few focus on improving their ability to choose the right thing to work on.
The best way to lock yourself out of ever delivering a money-making product is to spend lots of time delivering complex, non-needle-moving features.