It’s the System, Stupid!


Since this “Age of A.I.” arrived in late 2022, something’s been nagging at me. As more and more data rolls in, we see an apparent paradox emerging where “A.I.” coding assistants are concerned.

Individual developers report productivity gains using these tools (though many also report significant frustrations with, for example, “hallucinations”).

And at the same time, data clearly shows that the more teams use them, the bigger the negative impact on team outcomes like delivery throughput and release stability.

How can both these things be true?

We have one very plausible candidate for a causal mechanism, and it’s an age-old story in our industry.

When programmers feel like they’re getting things done faster, they’re often only considering the part where they write the code – particularly when that’s their part of the process.

What they’re not considering is the whole software development process, and especially downstream activities like testing, code review, merging, deployment and operations.

More code faster can mean bigger change sets – more to test (and more bugs to fix), more code to review (and more refactorings to get it through review), more changes to merge (and more conflicts to resolve), and so on.

“A.I.” code generation’s a local optimisation that can come at the expense of the development system as a whole, especially if that system is more batch-oriented, with design, coding, testing, review, merging and release operating like sequential phases in the delivery of a new feature. In such a system, more code faster means bigger bottlenecks later. So there’s no paradox at all: one causes the other.
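To make that concrete, here’s a toy sketch – not data from any real team, just the arithmetic of the argument, assuming downstream effort (testing, review, merging) grows faster than linearly with change-set size:

```python
# Toy illustration only: assumes super-linear downstream cost per change set.
def downstream_cost(changeset_size, exponent=1.3):
    """Review/test/merge effort, assumed to grow super-linearly with size."""
    return changeset_size ** exponent

TOTAL_LINES = 1000  # the same amount of code gets shipped either way

for batch in (10, 100, 500):
    n_batches = TOTAL_LINES / batch
    total = n_batches * downstream_cost(batch)
    print(f"change sets of {batch:>3} lines: downstream cost ~{total:,.0f}")
```

With those made-up numbers, shipping the same 1,000 lines in 500-line change sets costs roughly three times the downstream effort of shipping them ten lines at a time. The exponent is invented; the point is only that any super-linear growth punishes bigger batches.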

When teams work in much smaller cycles – make one change, test it, review the code, refactor, commit that and maybe push it to the trunk – they may experience far fewer downstream bottlenecks, with or without “A.I.” coding assistance. Arguably, coding assistants might make little noticeable difference in such a workflow.

The DORA data strongly indicates that the teams with the shortest lead times and the highest release stability tend to work this way, with continuous testing, code review and merging as the code’s being written.

And all this got me thinking: maybe we’re targeting machine learning and “A.I.” at the wrong problem. Instead of focusing on individual developer productivity with things like code generation, perhaps this technology would yield more fruit if it were focused on systemic issues and on reducing bottlenecks.

What if, for example, instead of using ML models to generate code, we applied them to reviewing it? Could a “smart” linter reduce the need for after-the-fact code review?

Of course, many of us already enjoy the benefits of “smart” linters. We call it “pair programming” or “ensemble programming”. And, having used static code analysis tools that incorporate statistical models or neural networks, I wasn’t all that impressed by the results. It’s hard to see such a tool significantly outperforming a classic linter plus a second pair of experienced eyes (if such eyes are available to you, of course – and maybe that’s the use case).
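For concreteness, here’s the kind of purely static rule a classic linter applies – the rule and threshold are arbitrary choices for this sketch, using Python’s standard ast module:

```python
import ast

MAX_STATEMENTS = 15  # arbitrary threshold, chosen for the sketch

def long_functions(source: str):
    """Yield (name, line) for each function whose body exceeds the limit."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if len(node.body) > MAX_STATEMENTS:
                yield node.name, node.lineno
```

A statistical model bolted onto checks like this has to beat both the check itself and the human pair reading the same code – a high bar.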

Perhaps the real value might be found in widening our view. What if a model (or models) could be trained on data collected across the entire cycle, from product strategy through to operational telemetry, support and beyond?

Imagine a model that, given, say, a Figma UI wireframe, could predict how many support calls you’d be likely to get about it. Or a model that, given a source file, could predict its mean time to failure in production.
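As a sketch of that second idea – with entirely synthetic data and invented features (churn, complexity, coupling), so it shows the shape of the problem rather than a working predictor – it might look something like this:

```python
# Hedged sketch: predict a source file’s mean time to failure (MTTF) from
# code metrics. Features, data and the target relationship are all invented.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical per-file metrics: churn, cyclomatic complexity, coupling.
X = rng.uniform(0, 100, size=(500, 3))
# Synthetic target: MTTF (days) falls as churn/complexity/coupling rise.
y = 365 / (1 + 0.02 * X.sum(axis=1)) + rng.normal(0, 5, size=500)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

risky_file = [[80, 45, 30]]  # high churn, complex, tightly coupled
print(f"predicted MTTF: {model.predict(risky_file)[0]:.0f} days")
```

The hard part, of course, isn’t the regressor – it’s getting an honest, labelled history of failures linked back to source files, which is exactly the systemic data we mostly don’t collect.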

More generally, imagine a model that could, with reasonable accuracy, predict the downstream impact of upstream activities – so that, as SuperDuperAgenticAI spits out its slop, alarm bells start going off about where it’s likely to lead if things go any further.

A pipe dream, you might think. But in fact such predictive technologies already exist in other disciplines like electronic engineering, where statistical and ML models are used to predict the reliability and probable lifetimes of printed circuit boards, for example.

There would be some major hurdles to overcome to apply similar techniques to software development, though – not the least of which is the jungle of higgledy-piggledy data formats our many proprietary tools and platforms produce. Electronics has established data interchange standards. We, for the most part, do not – probably because that would require enough of us to agree on some stuff, and that isn’t really our strong suit.

But, if these challenges could be overcome, or worked around (e.g., with a translation/encoding layer), I’m pretty sure there are patterns hidden in our complex and multi-dimensional workflow data that maybe nobody’s spotted yet. I mean, we’ve barely scratched the surface in the last 70+ years.
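Such a translation layer might start as nothing grander than a normalised event record that per-tool adapters map onto. The schema, field names and payload below are invented purely for illustration:

```python
# Invented schema: one normalised record for events from many tools, so
# downstream models see a single format instead of N proprietary ones.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class DeliveryEvent:
    source_tool: str    # e.g. "github", "jira", "pagerduty"
    kind: str           # "commit", "review", "deploy", "incident", ...
    artefact_id: str    # the file, PR, ticket or service concerned
    timestamp: datetime
    attributes: dict    # tool-specific detail, kept as model features

def from_pull_request(payload: dict) -> DeliveryEvent:
    """One adapter per tool; this one assumes a GitHub-like PR payload."""
    return DeliveryEvent(
        source_tool="github",
        kind="review",
        artefact_id=str(payload["number"]),
        # assumes an ISO-8601 timestamp string in the payload
        timestamp=datetime.fromisoformat(payload["merged_at"]),
        attributes={"changed_files": payload["changed_files"]},
    )
```

Adapters are dull to write, but they’re a translation problem, not a standards negotiation – which, as noted, isn’t really our strong suit.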

In a very handwavy sense, though, I now feel quite sure that “A.I.” is being targeted at the wrong problem in software: an exclusive focus on individual developer productivity, when the focus should be on the system as a whole.

In the meantime, we’re pretty sure at this point that things like continuous design, continuous testing, continuous code review and continuous integration do have a positive systemic impact, so focusing on those is probably the most productive thing I can do for the foreseeable future.


If your team would like training and mentoring in the technical practices that we know speed up delivery cycles, shorten lead times and improve product and system reliability, with or without “A.I.”, pay us a visit.
