For years, the software industry has been telling itself a comforting story about agile. Do not spend too much time up front. Do not over-specify. Do not try to model everything in advance. Start moving, stay adaptive, and let the requirements emerge through the work.
That story contained an important correction. It pushed back against bureaucratic waste, against fantasy planning, against the old habit of treating early documentation as if it were reality itself.
But like most corrective movements, it was eventually taken too far. In many organizations, agile stopped meaning iterative delivery against an evolving but governed model of the system. It started meaning something much looser and much more dangerous: ad hoc requirements formation during sprints, architecture inferred from backlog items, support posture buried in tribal knowledge, and product definition reconstructed one meeting at a time.
That model was already weaker than people wanted to acknowledge. Now agents are entering the delivery system, and the weakness is no longer easy to hide.
The hidden subsidy behind modern agile
A great deal of what passes for agile execution today is not actually lightweight discipline. It is human compensation.
People are doing the missing work in their heads.
They remember which document is actually authoritative even when the repository says otherwise. They know which acceptance criteria were real and which were placeholders. They know which roadmap statements were aspirational, which architectural assumptions are stable, which design notes were superseded, and which ticket comments are not meant to be treated as durable truth.
In other words, the system functions because humans are continuously reconstructing coherence from incomplete artifacts. That reconstruction is expensive, but it is invisible, so organizations pretend it is free.
This is one of the reasons bad documentation cultures survive longer than they should. Human teams can absorb an astonishing amount of structural weakness without immediately collapsing. Meetings, memory, social alignment, informal authority, and institutional context cover for a broken source model.
Agents, on the other hand, do not do that. They do not inherit context socially. They inherit it structurally. That basic premise changes everything.
The problem is not that agents need more documentation
This is where a lot of teams misunderstand the moment they are in. They assume the answer is more prose. More design notes. More explanatory text. More backlog detail. More status commentary. More context everywhere. However, that is not the real problem.
The real problem is that most documentation systems were built for humans browsing linearly, not for bounded machine reasoning under cost constraints. A human can read a messy corpus and still make a reasonable judgment. A human can notice that one paragraph is aspirational, another is normative, and a third is just historical residue. A human can tolerate ambiguity because a human can repair it.
An agent is operating under a different set of conditions. Every retrieval step has cost. Every unresolved ambiguity expands the search space. Every mixed-role document increases interpretive burden. Every unstated authority boundary forces inference. That means what used to be dismissed as ordinary documentation messiness becomes a serious execution defect.
Requirements mixed with planning. Support claims buried in narrative prose. Current status implied rather than declared. Legacy root documents that still sound authoritative after the system has evolved beyond them. Validation that proves files exist but does not prove the corpus can drive a real implementation without guesswork. None of that is harmless in an agentic environment. It is not a documentation issue anymore. It is a runtime issue.
Agile was never supposed to mean the absence of a baseline
This is the part people need to get straight. What is being argued here is not a rejection of agile. It is not a return to waterfall. It is not a plea for giant upfront specification documents that attempt to freeze the future. It is something much simpler. Agentic development cannot function reliably without a governed baseline context. That baseline does not eliminate iteration. It makes iteration possible.
This is the point many organizations still miss. They behave as if incrementality itself solves the problem of context. It does not. Incrementality only works when the increments are being applied against some durable source model of what the system is, what governs it, what is currently true, what is intended, what is supported, and how conflicts are resolved. Without that, the team is not iterating against a coherent system definition. It is improvising against a moving fog. Humans can sometimes get away with that. Agents cannot do it well, and they definitely cannot do it cheaply.
So this model is not anti-agile in principle. It is anti-ambiguity. It is the minimum baseline effort required for agile to remain meaningful once agents are expected to participate in delivery.
The backlog is not the product definition
One of the most corrosive habits in modern software development is the tendency to let the sprint backlog become the de facto model of the system. This is a category error. Backlogs are sequencing tools. They exist to organize work, not to define truth. They are tactical by nature. They should move. They should be reprioritized. They should reflect changing information.
What they should not do is carry the full conceptual load of the system. But that is exactly what happens in weak agile environments. A story starts implying architecture. An acceptance criterion starts implying support posture. An implementation note starts implying a constitutional boundary. A planning comment starts implying a commitment. A ticket starts carrying more semantic authority than the governing specification because it is newer, louder, and closer to the current sprint.
Once that starts happening, the system loses any real separation between normative truth, current state, future intention, and implementation sequence. That is survivable in a human-only environment because people compensate for it. It is a bad source language for agentic development.
Agents are not chatbots. They are non-deterministic compilers
The easiest way to understand the shift is to stop thinking of agents as glorified assistants and start thinking of them as non-deterministic compilers.
A traditional compiler takes a constrained source language and produces an executable result. If the source is malformed, ambiguous, or structurally weak, the compiler fails. An agent works differently, but the analogy still holds. It consumes a source corpus and produces one of several possible outputs: implementation code, a migration plan, a design proposal, a review, a remediation strategy, a test harness, a support assessment.
The difference is that the agent can often produce something plausible even when the source material is poor. That is precisely the danger. A weak specification corpus does not always cause the process to stop. It causes the system to guess. And once the system is guessing, both quality and cost get worse at the same time.
A guessing system reads more. It compares more. It infers more. It corrects itself more often. It sounds confident in places where the source material only justified uncertainty. That is why the right question is not whether the documentation is comprehensive. The right question is whether the source corpus is structured tightly enough to constrain interpretation. Completeness matters, but controlled interpretability matters more.
Requirements are now part of execution
This is the mental shift teams have to make. Requirements are no longer just documents about the system. They are part of the system that produces work.
The requirement set, the authority order, the support posture, the capability boundaries, the routing surfaces, the validation logic, and the adoption gates all influence what an agent reads, what it trusts, what it infers, and what it produces.
That means the specification corpus is no longer passive explanation attached to the codebase. It is part of the operational substrate. Once that becomes clear, several other conclusions follow. Requirements need versioning, structure, decomposition, validation, review, and regression control. Support claims should not live only in narrative prose.
Authority boundaries should not be left for the reader to infer. Validation should not stop at syntax and existence. And the corpus itself has to be treated as an executable surface, not a documentation afterthought. This is why the phrase specification engineering matters. The work is no longer clerical. It is architectural.
The baseline is not a giant document. It is a governed context system
Some people hear “baseline” and immediately imagine a giant monolithic requirements artifact written before any real work begins. That is not the model. The baseline should be compact where possible and complete where necessary.
This is not about writing more words. It is about establishing enough authoritative structure that the system has a stable frame of reference. At minimum, that means a clear foundation layer, governing specifications, present-state references, capability mappings, routing rules, and validation logic that tests whether the corpus can answer the kinds of questions agents actually need to ask.
This is not bureaucracy. It is source integrity. A good baseline is not large by definition. A good baseline is legible, authoritative, and operationally useful. A bad baseline can be tiny and still be useless if it leaves key authority and support questions unresolved. The real issue is not document length. It is whether the source model is strong enough to support bounded reasoning without repeated reinvention.
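One way to make "source integrity" concrete is to audit the baseline the way a linker audits symbols: does every required role have a carrier, and is every role placed in the authority order? The sketch below is illustrative only; the role names, manifest shape, and file paths are assumptions, not a prescribed format.

```python
from dataclasses import dataclass, field

# Hypothetical roles a governed baseline must cover; names are illustrative.
REQUIRED_ROLES = {"foundation", "specification", "reference", "validation"}


@dataclass
class BaselineManifest:
    # Maps each document role to the canonical documents that carry it.
    documents: dict = field(default_factory=dict)  # role -> list of paths
    # Authority order: earlier roles win when documents conflict.
    authority_order: list = field(default_factory=list)


def audit_baseline(manifest: BaselineManifest) -> list:
    """Return human-readable gaps that would force an agent to guess."""
    gaps = []
    for role in sorted(REQUIRED_ROLES):
        if not manifest.documents.get(role):
            gaps.append(f"no {role} document: agents must infer {role}-level truth")
    undeclared = set(manifest.documents) - set(manifest.authority_order)
    if undeclared:
        gaps.append(f"roles outside authority order: {sorted(undeclared)}")
    return gaps


manifest = BaselineManifest(
    documents={"foundation": ["docs/foundation.md"], "specification": ["docs/spec.md"]},
    authority_order=["foundation", "specification"],
)
print(audit_baseline(manifest))
```

Note that the audit says nothing about document length. A two-page corpus can pass it and a two-hundred-page corpus can fail it, which is exactly the point made above.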
What strong agentic systems do differently
The strongest agentic systems usually have something most organizations lack: a clear separation of roles inside the documentation corpus.
Foundation documents define what the system fundamentally is and is not. This is where architectural invariants, security assumptions, ownership rules, and non-negotiable constraints belong.
Specification documents define intended behavior. This is where normative requirements, profile declarations, exclusions, support posture, and advancement criteria belong.
Reference documents define present truth. They answer the uncomfortable but necessary questions: what is supported now, what evidence exists, what remains deferred, what is implemented but not yet supportable.
Implementation documents define sequencing, rollout, migration guidance, readiness work, and operational procedure.
Analysis documents synthesize, compare, explain, and generalize.
Most organizations mix these roles constantly, then wonder why both humans and agents keep reading too much and trusting the wrong things. Role separation is not cleanup. It is retrieval discipline. When role is clear, authority becomes clear. When authority becomes clear, the system spends less time guessing. When the system spends less time guessing, both token expenditure and execution error decline.
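The claim that "when role is clear, authority becomes clear" can be sketched as a trivial conflict-resolution rule: when two documents answer the same question, the one from the more authoritative role wins, with no prose mining required. The precedence order and the example claims below are assumptions for illustration.

```python
# Role precedence: normative layers outrank descriptive ones (illustrative order).
ROLE_PRECEDENCE = ["foundation", "specification", "reference", "implementation", "analysis"]


def resolve(claims):
    """Pick the claim whose source role carries the most authority.

    `claims` is a list of (role, text) pairs answering the same question.
    """
    return min(claims, key=lambda c: ROLE_PRECEDENCE.index(c[0]))


claims = [
    ("analysis", "retries are probably unbounded"),
    ("specification", "retries MUST be capped at 3"),
]
print(resolve(claims))  # the specification-level claim wins
```

The resolution is a constant-time lookup precisely because role is declared rather than inferred. In a mixed-role corpus the same question requires reading and weighing every candidate passage.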
Token optimization is not primarily about shorter documents
This is another place where people routinely aim at the wrong target. They think cost control in agentic systems is mainly about compression. Shorter prompts. Shorter documents. Tighter summaries. That is a shallow view.
The biggest gains do not usually come from cutting words. They come from reducing unnecessary retrieval and interpretation.
The highest-leverage patterns are structural. A compact bootstrap surface that defines read order and conflict resolution. A dependency map that resolves stable identifiers to canonical paths. A capability registry that maps capability families to governing specifications, current-state references, and adoption gates. Structured claim metadata for support-bearing surfaces. Explicit authority layering. Clear directory hygiene by role.
These patterns reduce cost because the system can route before it reasons deeply. It can look up instead of infer. It can resolve boundaries through structure rather than through prose mining. That is where token efficiency really comes from.
Token optimization is mostly an information architecture problem masquerading as a prompt problem.
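The "route before you reason" idea can be made concrete with a capability registry: a stable identifier resolves directly to the bounded set of canonical documents that govern it. The registry shape, identifier, and paths below are hypothetical; the point is the lookup-instead-of-infer pattern.

```python
# Illustrative capability registry: stable identifiers resolve to canonical
# paths, so an agent can route to the right documents before reading anything.
CAPABILITY_REGISTRY = {
    "auth.sso": {
        "spec": "specs/auth-sso.md",                  # governing specification
        "reference": "reference/auth-sso-status.md",  # current-state truth
        "gate": "gates/auth-sso-adoption.md",         # adoption criteria
    },
}


def route(capability: str) -> list:
    """Look up instead of infer: return the bounded read set for a capability."""
    entry = CAPABILITY_REGISTRY.get(capability)
    if entry is None:
        raise KeyError(f"unregistered capability: {capability}")
    return [entry["spec"], entry["reference"], entry["gate"]]


print(route("auth.sso"))
```

The token savings come from the bound itself: the agent reads three known documents instead of retrieving and ranking everything that mentions SSO.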
The trade-off people still fail to see
There is a real trade-off here, but it is not the one most teams think it is. The trade-off is not between agile and structure. The trade-off is between spending effort up front to establish a governed source model, or spending that effort repeatedly and expensively during execution through ambiguity, rework, correction, over-retrieval, and hidden human coordination.
That work does not disappear. It moves. In weak systems, it moves into meetings, memory, ticket archaeology, and corrective prompt cycles. In stronger systems, it moves into the baseline context architecture. This is the same trade that good engineering makes everywhere else. Invest in structure once so the system stops paying for disorder repeatedly.
The difference now is that agentic systems make the cost of disorder much more measurable.
The anti-pattern agile accidentally normalized
A lot of what the industry now calls agile maturity is actually just structured improvisation. That sounds harsh, but it is often true. The team appears fast because people are compensating for the absence of a durable source model. They carry the architecture in their heads. They remember the support boundaries. They know which documents lie. They know which language is normative and which language is decorative.
The system looks lighter than it is because the missing structure is being manually emulated by experienced humans. Once agents enter the loop, that hidden subsidy disappears. Now the weaknesses become visible. Narrative support claims become expensive and inconsistent. Mixed-role documents become operational hazards. Tool-specific truth forks become sources of silent drift. Backlog-centric thinking starts collapsing into semantic confusion. Validation limited to links and syntax starts looking almost absurdly inadequate. These were always anti-patterns.
They were simply being masked by human elasticity.
The right operating model
The right model is straightforward. First define the governed source surface. Then let incremental development proceed against it. Then require every increment to update that governed context as part of the work. Not later. Not when someone has time. Not after the implementation is already diverging from the specification corpus. As part of the work itself.
That means stories are no longer just implementation asks. They are context maintenance events. If support posture changes, the support-bearing artifacts must change. If a capability boundary changes, the capability registry must change. If an architectural invariant is introduced or revised, the foundation layer must change. If adoption criteria evolve, the governing spec must evolve with them.
That is what real incrementality looks like in an agentic delivery model. Not the absence of structure. The disciplined evolution of structure.
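The rule that context maintenance happens "as part of the work itself" is enforceable mechanically, for example as a CI gate that rejects changes touching a governed code area without touching its context artifacts. The mapping and paths below are assumptions; real pairings would come from the capability registry.

```python
# Hypothetical CI gate: if an increment touches capability code, it must also
# touch at least one governed artifact describing that capability's posture.
GOVERNED_PAIRS = {
    "src/auth/": ["specs/auth-sso.md", "reference/auth-sso-status.md"],
}


def context_maintained(changed_files) -> bool:
    """True if every governed code area changed alongside its context artifacts."""
    for code_prefix, artifacts in GOVERNED_PAIRS.items():
        touched_code = any(f.startswith(code_prefix) for f in changed_files)
        touched_ctx = any(a in changed_files for a in artifacts)
        if touched_code and not touched_ctx:
            return False
    return True


print(context_maintained(["src/auth/login.py"]))                       # False
print(context_maintained(["src/auth/login.py", "specs/auth-sso.md"]))  # True
```

A gate like this turns "stories are context maintenance events" from a cultural norm into a structural invariant: divergence between code and corpus fails the build instead of accumulating silently.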
The strategic implication
The old model assumed documentation was downstream of delivery. That assumption no longer holds.
If agents are part of the delivery system, then the specification corpus is part of the runtime. And if it is part of the runtime, it must be engineered with the same seriousness as code, architecture, and operations. That is the shift. Not anti-agile. Not anti-incremental. Not a nostalgic return to heavyweight upfront design.
A recognition that agentic development still requires a source language, and that building the baseline context for that source language is not a divergence from agile. It is the precondition for agile to work at all.
Further reading on: Agentic Software Development Patterns.
Implementation Ideas for: Governed Agentic Workflow
Real-world examples of Agentic Specification Engineering: IDP