Verification Debt Is Your Next Headache

The bottleneck didn’t disappear when AI started writing code. It moved.

I used to deal with complaints about slow delivery. Tickets sat too long in development. Engineers were stuck, blocked, waiting on review. The work was the bottleneck. Fix the work, fix the speed.

Now I sit in planning sessions where we talk about how many agents we’re spinning up per engineer. Our output has visibly accelerated. PRs are flying in. Tickets close faster. And yet I have a quiet, growing feeling that something is off.

It’s not that the code is bad, exactly. It’s that I’m not sure anymore.

The Trap Inside the Green Build

There’s a term for what’s accumulating: verification debt. Lars Janssen put it well in a recent post: it’s the gap between how fast you can generate output and how fast you can validate it. Every time someone approves a diff they haven’t fully understood, they’re borrowing against the future.

What makes this sneaky is that it doesn’t feel like debt. Technical debt announces itself. Slow builds, tangled modules, that one service nobody wants to touch. You feel it. Verification debt is quiet. The tests are green. The PR looks clean. The commit message is actually better than what your average human writes. It looks like progress.

And six months later you discover you built exactly what the spec said, and nothing the user actually needed.

I’ve been there without AI. We’ve all shipped something “correct” that was wrong. AI just made it faster and easier to get there at scale.

Everyone’s 50% More Productive, And That’s The Problem

Here’s the math nobody wants to talk about. If AI makes every engineer 50% more productive, you don’t get 50% more shipped work. You get 50% more pull requests. 50% more documentation. 50% more design proposals. And someone, somewhere, still has to review all of it.

When two or three early adopters start generating more PRs than before, the team absorbs it. No big deal. When everyone does it, review becomes the constraint. The bottleneck doesn’t vanish. It moves upstream, to the parts of the job that are irreducibly human: deciding what to build, defining “done,” understanding the domain, making judgment calls about risk.
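A back-of-the-envelope sketch makes the shape of it obvious. The numbers here are made up, not pulled from any real team, but the point holds: shipped work is capped by review capacity, not generation capacity, and everything generated beyond that cap either waits in a queue or gets waved through.

```python
# Toy model, not a measurement. Assumption: each engineer can generate
# `gen_per_eng` PRs/week and carefully review `rev_per_eng` PRs/week;
# shipped work is capped by review capacity.

def weekly_flow(engineers: int, gen_per_eng: float, rev_per_eng: float) -> dict:
    generated = engineers * gen_per_eng
    review_capacity = engineers * rev_per_eng
    shipped = min(generated, review_capacity)
    # Anything generated beyond review capacity either sits in the queue
    # or gets rubber-stamped -- that gap is the verification debt per week.
    unverified_backlog = max(0.0, generated - review_capacity)
    return {"generated": generated, "shipped": shipped, "backlog": unverified_backlog}

before = weekly_flow(engineers=10, gen_per_eng=4.0, rev_per_eng=5.0)
after = weekly_flow(engineers=10, gen_per_eng=6.0, rev_per_eng=5.0)  # +50% generation
print(before)  # {'generated': 40.0, 'shipped': 40.0, 'backlog': 0.0}
print(after)   # {'generated': 60.0, 'shipped': 50.0, 'backlog': 10.0}
```

Before the productivity boost, review capacity quietly covered everything. After it, the same team generates more than it can honestly verify, and the difference accumulates week over week.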

I’ve written about this pattern before: the work didn’t disappear, it moved. What’s new here is that it moved specifically into verification - and most teams haven’t consciously staffed or structured for that yet.

As an engineering leader, this is the thing I keep turning over. I’ve been thinking about team capacity in terms of writing capacity for years. Story points, cycle time, sprint velocity. All of those metrics were proxies for human time spent building. They’re not good proxies for a world where building is cheap.

The question isn’t “how do we produce more code?” anymore. The question is “how do we verify more code?” And I don’t think most teams have a real answer to that yet.

What I’m Actually Watching For

I’ve started framing verification as a first-class engineering skill, not a phase in the process.

The checklist matters. Before anything ships: Is the agent implementing the right logic, or faithfully coding a flawed spec? What assumptions did it make about the domain? What permissions or side effects did this introduce? Would you stake your name on this doing what the user actually needs, not just what the ticket says?

That last question is the one. “Probably” is not good enough.

(I’ve started asking this in code review discussions and the silence is instructive. Nobody wants to be the one to say “I pressed approve and moved on.”)

The agents won’t question your intentions unless you explicitly ask them to. That asymmetry is the thing most people are still underestimating. The model is remarkably good at implementing what you described. It is not trying to tell you that you described the wrong thing.

The Skill Hasn’t Shrunk, It’s Shifted

People keep saying AI is making engineers dumber. I don’t think that’s the right frame.

We stopped memorizing API signatures when search engines appeared. The cognitive load didn’t disappear. It shifted toward synthesis, architecture, judgment. The skill changed, but it didn’t shrink.

What I see with AI is the same pattern, one level up. The skill isn’t “write code.” It’s “understand what the code should do and verify that it does it.” That requires knowing the domain. Reading the diff and actually following the logic. Catching the places where the agent made a confident wrong assumption.

That’s harder, not easier. The volume of output to understand has gone up. The quality bar for review has gone up. The penalty for rubber-stamping has gone up.

Most of my day is asking agents questions and then validating their answers. It sounds efficient. In practice it’s a different kind of exhausting.

What This Means for How You Run Your Team

If you’re managing engineers right now, I’d argue the most valuable thing you can do is resist the instinct to measure AI impact through output metrics. PRs per engineer, velocity, closed tickets. Those numbers will go up. They should go up. They’re not the thing to watch.

Watch the things that tell you whether the output is right. Review latency. Defect escape rate. Incidents that trace back to “we shipped something that looked fine.” Those will tell you whether your team is accumulating verification debt or actually managing it.
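If you want something concrete to compute, here’s a rough sketch of those two signals. The PullRequest fields and helper names are mine, not from any particular tool; the point is only that both numbers are derivable from data your existing tooling almost certainly already records.

```python
# A minimal sketch, assuming you can export PR records with these
# (hypothetical) fields from your review and incident tooling.
from dataclasses import dataclass
from datetime import datetime
from statistics import median

@dataclass
class PullRequest:
    opened_at: datetime
    approved_at: datetime
    caused_escaped_defect: bool  # later linked to a bug or incident in production

def review_latency_hours(prs: list[PullRequest]) -> float:
    # Median time from opening a PR to approval. A rising value suggests
    # review capacity is falling behind generation.
    return median((pr.approved_at - pr.opened_at).total_seconds() / 3600 for pr in prs)

def defect_escape_rate(prs: list[PullRequest]) -> float:
    # Share of merged PRs that traced back to a defect found after release.
    return sum(pr.caused_escaped_defect for pr in prs) / len(prs)
```

The trap to avoid: review latency dropping while escape rate climbs. That combination usually means reviews got faster by getting shallower, which is verification debt dressed up as efficiency.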

Build review capacity explicitly. Don’t assume it scales with generation capacity. It doesn’t.

And create space for engineers to say “I don’t fully understand what this does.” That admission is worth a lot more than a fast approval.


The agents make output cheap. They don’t make responsibility cheap.

That’s the job now: keeping those two things from drifting too far apart.