AI coding assistants have become an increasingly important part of how engineering teams work. Tools like Claude Code and Cursor are enabling individual developers to generate code faster than ever. But as we’ve adopted these tools across our teams, we’ve noticed something unexpected: while individual productivity has surged, overall project velocity hasn’t kept pace. Sometimes the gap is surprisingly large.
This disconnect pointed us toward a deeper question: where does the real bottleneck in software development now sit? As AI takes on more of the implementation work, the limiting factor shifts upstream to specification and verification, two areas where human judgment matters most.
In this post, we’ll explore why this gap exists, how the bottleneck has shifted from coding to specification, and what that means for how engineers work. We’ll also share practical approaches for teams looking to adapt, including new ways to think about code review, team structure, and the engineer’s role.
The Velocity Paradox
When we dug into why the expected gains weren’t materializing, we were reminded of something the industry has collectively forgotten: coding is only a small part of the software development lifecycle. Speed up only that part, and you get 10× faster code generation, but only 10% faster shipping. The bottleneck simply moves elsewhere.
Fred Brooks made this exact observation forty years ago in “No Silver Bullet”[2], and it’s as true now as it was then. The irony is that the industry has spent the past decade obsessing over protecting coding time: no-meeting days, quiet hours, deep-work blocks.[3] We measured productivity by uninterrupted hours in the IDE, because we assumed that’s where most value is created. We optimized our entire development process (focus hours, reduced meetings, streamlined handoffs) around the assumption that coding was the bottleneck.
Now it isn’t. And our processes haven’t caught up.
In this new model, teams will spend more time on specification and verification, and less time on implementation. That’s a profound shift in how the workday feels. Coding, for all its challenges, is largely a solitary activity. One person, one problem, deep focus. Specification and review are collaborative. They require conversation, debate, and alignment. You can’t write a good spec in isolation; you need input from product owners, other engineers, and sometimes customers.
This shift changes the intuition behind team structure. “Two-pizza teams” and Brooks’ Law were partly about reducing communication overhead so people could code.[4][5] But if the highest-value work becomes collaborative creation, communication isn’t just overhead, it’s the work. Smaller teams can win because they create shared understanding faster. Five people can genuinely align; fifteen usually can’t.
When we accelerate AI implementation to 10× speed, two new bottlenecks emerge:
- The Specification Gap. Vague tickets and incomplete requirements lead to perfectly coded but incorrect features. We’re building the wrong thing faster.
- The Review Bottleneck. Consider: Linus Torvalds, the creator of Linux, recently mentioned that he’d stopped coding a feature himself and let Google’s AI handle it entirely.[1] Boris Cherny, creator of Claude Code, recently reported landing 259 pull requests in a single month (497 commits, 40,000 lines added, 38,000 lines removed) without opening an IDE.[6] Every line was written by AI. “Increasingly, code is no longer the bottleneck,” he noted. But if one engineer can generate that volume, who reviews it? Line-by-line code review doesn’t scale. Reviewers might keep up for a sprint, but they’ll burn out, or worse, start rubber-stamping reviews.
Three Ways to Work with AI
As teams grapple with these challenges, three distinct stances tend to show up:
- White box is the traditional model: humans implement and review line-by-line. It’s safe and thorough but doesn’t scale when AI can generate thousands of lines per hour. You can’t inspect your way to 10× productivity.
- Black box, often called “vibe coding”, sits at the other extreme: trust the AI, ship what it generates, minimal verification. It’s fast, but it’s also how you get 3 AM incidents. Fine for experiments; risky for software serving millions of users.
- Grey box names the middle ground: we accept that humans won’t read everything an agent writes, but we also refuse blind trust. The leverage moves to (1) clearer intent and constraints up front, and (2) stronger verification that demonstrates the change matches that intent.
To be clear: these assistants are still imperfect. They can hallucinate APIs, make subtle logic errors, or produce code that “works” while quietly violating architectural intent. Human supervision remains essential, especially from engineers with strong fundamentals who can detect when the agent has drifted. At the same time, they’re already faster than doing everything by hand, and the set of cases that require intervention is shrinking quickly. The trajectory matters as much as the current state.
The name “Grey Box” captures the philosophy: we’re not treating AI as a transparent white box we fully understand, nor as an opaque black box we blindly trust. We’re treating it as a capable but imperfect collaborator that needs clear instructions and produces work that needs validation.
One thing must be crystal clear: the engineer guiding the agent and the reviewer approving the merge request remain fully accountable for what ships. There is no “the AI did it” defense. If a bug ships, a human approved it. If the architecture is wrong, a human missed it.
The Real Shift: From Implementers to Solution Architects
Here’s the uncomfortable truth: if AI codes 10× faster than us, the highest-leverage activity is no longer typing code. It’s creating specifications precise enough that an AI agent executes them correctly the first time.
Our primary role is shifting from Implementers to Solution Architects.
This isn’t a downgrade. If anything, it’s an expansion of scope. Engineers now need to understand business context more deeply, think about system architecture more carefully, and communicate requirements more precisely. They’re not becoming less technical; they’re becoming more strategic.
One engineer put it well during a recent discussion: “We’re not writing assembly language anymore either, and no one thinks that makes us worse programmers.” Every generation of tooling raises the abstraction level. AI coding assistants are just the latest step.
But it does require learning new skills. Most of us have spent years developing expertise in evaluating code. We can spot a problematic function, an inefficient algorithm, a security vulnerability. How many of us have the same expertise in evaluating specifications? In writing prompts that produce correct code on the first try?
That’s the skill gap we need to close.
Two practical implications follow from this shift:
Implication One: Specification Becomes the Main Interface
A high-fidelity spec is the new source code.
What does that mean in practice? A good specification for AI-driven development needs three components:
- Testable acceptance criteria: Not vague descriptions, but clearly defined outcomes from a user’s perspective. If you can’t write a test for it, you haven’t specified it.
- Explicit corner cases: Edge cases normally discovered during implementation must be identified upfront. What happens when the input is null? When the list is empty? AI will make assumptions; make them explicit instead.
- Captured technical decisions: The “how” and “why” of implementation: architectural choices, integration points, existing codebase patterns that AI doesn’t know.
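One way to make those components concrete is to express acceptance criteria directly as executable tests. The sketch below is illustrative: `summarize_bookings` and its behavior are invented for this example, not an API from the post.

```python
def summarize_bookings(bookings):
    """Return (count, total) for a list of booking amounts.

    Spec decisions made explicit so an AI agent can't guess:
    - A None input means "no data" and is not an error.
    - An empty list yields (0, 0), not None.
    """
    if bookings is None:  # corner case: null input, decided upfront
        return (0, 0)
    return (len(bookings), sum(bookings))


# Testable acceptance criteria: if you can't write these,
# you haven't finished specifying the feature.
assert summarize_bookings(None) == (0, 0)       # explicit null handling
assert summarize_bookings([]) == (0, 0)         # explicit empty-list handling
assert summarize_bookings([100, 250]) == (2, 350)
```

Written this way, the corner cases stop being discoveries made mid-implementation and become part of the contract the agent is asked to satisfy.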
Writing comprehensive specifications sounds tedious, and historically, it has been. But AI can help with the specification process itself, not just coding. Some approaches teams are experimenting with:
- AI-Assisted Distillation. Every feature starts with conversations: meetings, Slack threads, whiteboard sessions. AI can process those transcripts and discussions into structured first drafts, capturing information that would otherwise be lost.
- AI as a Sparring Partner. Before any code is written, prompt AI with your initial spec and ask it to find gaps. What questions would a new developer ask? What assumptions are you making? Let the AI challenge your plan before you start implementation.
- Rapid Prototyping. Build quick, throwaway demos with AI to validate ideas. A working prototype in an hour of conversation with a Product Owner can surface misunderstandings that would otherwise take a sprint to discover.
- Pair Specification. We pair program; why not pair specify? Two engineers reviewing a plan together will catch corner cases and architectural issues that one would miss.
The specific techniques matter less than the underlying shift: treat specification as the high-value work, not as overhead before the “real” work begins.
Implication Two: Evidence Matters More Than Inspection
If we’re not reviewing every line of code, how do we trust what AI produces?
The answer requires a conceptual shift in what we’re reviewing. The object of review moves from code to evidence. Instead of asking “is this code correct?”, we ask “do the tests and checks prove this implementation meets the specification?”
This is a fundamental change. Code is the artifact; evidence is what matters. A thousand lines of elegant code mean nothing if the tests don’t verify the acceptance criteria. Conversely, if a comprehensive test suite passes and the architectural checks are green, the specific implementation details become less critical; the proof is in the verification.
How you structure that verification will depend on your stack and context, but the general shape involves layered automated checks with human expertise focused where it adds the most value.
At the inner layers, automated tools handle what they’re good at: linters catch stylistic issues, static analysis flags common errors, and AI reviewers provide feedback on best practices and complexity. These catch universal mistakes that don’t require business context.
At the outer layers, humans focus on what automation can’t: verifying that the test suite covers the specification (test-to-spec fidelity), and ensuring the implementation adheres to architectural principles. Does this solution scale? Is it maintainable? Does it fit our system’s patterns?
This verification is only as strong as the architecture allows it to be. If AI-generated code is forced to mock every dependency, the resulting tests may pass while proving very little about real behavior. Engineers must architect solutions with clear, testable integration points: the seams where services meet and data crosses layers. Techniques like contract testing and well-defined API boundaries become more important, not less, when AI is writing the implementation.
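A minimal consumer-side contract check makes the idea concrete. The endpoint shape and field names here are assumptions for illustration; real teams might reach for a tool like Pact, but the principle is the same: verify the seam, not the internals.

```python
# Fields the consumer actually depends on at this boundary
# (hypothetical schema for a /bookings/{id} response).
REQUIRED_FIELDS = {"id": int, "status": str, "total": float}


def satisfies_contract(payload: dict) -> bool:
    """True if the response carries every field the consumer relies on,
    with the expected type."""
    return all(
        name in payload and isinstance(payload[name], expected_type)
        for name, expected_type in REQUIRED_FIELDS.items()
    )


# Whatever the (possibly AI-written) implementation does internally,
# this boundary check stays meaningful — unlike a fully mocked unit test.
assert satisfies_contract({"id": 7, "status": "confirmed", "total": 99.5})
assert not satisfies_contract({"id": 7, "status": "confirmed"})  # missing total
```

Because the check only touches the seam, the implementation behind it can be regenerated freely without weakening the evidence the reviewer relies on.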
Human reviewers shift from inspectors to architects. We stop asking “is this line correct?” and start asking “does the evidence prove this solution is sound?” Merge approval shifts in meaning: approving means you’re confident the evidence supports the spec and the change fits the architecture, not that you personally read and understood every line.
Applying the Lens (without changing your whole process)
If you’re seeing these bottlenecks, here are a few low-risk ways to respond.
- Start with awareness, not process changes. The first step is simply recognizing that time spent clarifying specifications is now more valuable than time spent optimizing coding efficiency. That mindset shift alone changes how you approach each story.
- Try it on one feature first. Pick a single story/OKR and deliberately invest more in upfront clarity: tighten acceptance criteria, call out boundaries, and use AI to help refine the spec before writing production code.
- Develop evaluation skills. Just as we learned to code review effectively, we need to learn specification review effectively. What makes a good prompt? A complete acceptance criterion? An adequate corner case mapping? These skills can be taught and practiced.
- Invest in test-to-spec fidelity. When reviewing code, shift your attention from “is this correct?” to “do the tests prove this meets the specification?” That’s a different question, and answering it well is a skill.
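Even a crude tool can support that review question. The toy check below maps acceptance-criterion IDs to test names and flags any criterion no test claims to cover; the IDs, naming convention, and test names are invented for illustration, and a real setup would scan actual test files.

```python
import re

# Hypothetical convention: specs number criteria AC-1, AC-2, ...
# and test names embed the ID they verify (e.g. test_ac_1_...).
acceptance_criteria = {"AC-1", "AC-2", "AC-3"}
test_names = [
    "test_ac_1_null_input_returns_empty_summary",
    "test_ac_2_empty_list_returns_zero_total",
]

covered = {
    f"AC-{match.group(1)}"
    for name in test_names
    for match in re.finditer(r"ac_(\d+)", name)
}

uncovered = acceptance_criteria - covered
assert uncovered == {"AC-3"}  # the reviewer's question: why is AC-3 unverified?
```

The point isn’t the script; it’s that “which criteria have no evidence?” becomes a checkable question instead of a gut feeling during review.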
- Design for testable integration points. The boundaries between your services and layers become your critical verification surface. Ensure your architecture supports testing real interactions, not just mocked-out units.
- Expect iteration. The specific practices that work will vary by team, codebase, and domain. The AI tools themselves are evolving rapidly; what works today might be outdated in six months. Build in time to learn and adapt.
- Be patient. This is a paradigm shift in how engineers work. The timeline isn’t one quarter; it might be one to two years before these new muscles are fully developed. That’s okay.
The bottleneck hasn’t disappeared; it never does. It’s just moved. And it’s moved to a place where human judgment and expertise matter most: deciding what to build, designing how it fits together, and verifying that it works as intended.
That’s not a diminishment of the engineering role. That’s what engineering was always supposed to be. The question isn’t whether this shift will happen; it’s whether your team shapes it deliberately or scrambles to catch up.
The ideas in this post emerged from internal discussions at Agoda about how AI reshapes engineering work. We are sharing this framing because it’s been a useful way to talk about trade-offs, especially as implementation gets cheaper.
References
- [1] Torvalds, Linus. “AudioNoise README.” GitHub, January 2026. github.com/torvalds/AudioNoise
- [2] Brooks, Frederick P. “No Silver Bullet: Essence and Accident in Software Engineering.” Proceedings of the IFIP Tenth World Computing Conference, 1986.
- [3] DevDynamics. “Focus Time and Its Impact on Developer Productivity.” October 2024. devdynamics.ai
- [4] Amazon Web Services. “Amazon’s Two Pizza Teams.” AWS Executive Insights, January 2025. aws.amazon.com
- [5] Brooks, Frederick P. The Mythical Man-Month: Essays on Software Engineering. Addison-Wesley, 1975.
- [6] Cherny, Boris. Post on X (formerly Twitter), December 2025. x.com/bcherny