Executable Specifications: Working Effectively with Coding Agents


Coding agents can generate code quickly, but the results often miss the mark. Natural language is convenient, yet often too ambiguous for precise requirements. Traditional unit tests are exact, but large test files with many assertions can become hard to read and maintain.

A more reliable workflow borrows ideas from BDD, specification by example, and snapshot testing. The key question is not how a task is implemented, but how the system behaves. One practical way to capture that behavior, while keeping it easy to review and verify, is through executable specifications.

What Is the Executable Specification Pattern?

This approach is related to tools like Cucumber and Gherkin, as well as snapshot-style testing. Instead of writing custom test code for every scenario, you build a general-purpose test runner and move most of the validation into specification files.

An executable specification acts as a contract. It describes:

  • Inputs such as arguments, source files, and system state
  • Expected results such as stdout, stderr, output files, exit codes, and optionally call sequences

The runner reads the specification, sets up the described environment, executes the program, and compares the actual results with the expected behavior defined in the file.

Example: A CLI Utility That Prints Codebase Structure

While building the outln utility from my previous article, I created a YAML-based specification format.

JSON or TOML would also work. I considered TOML, but YAML felt more readable for this use case. Choosing a familiar format has another benefit: agents are more likely to produce valid files, and you can lint or validate the specifications to catch syntax problems early.
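As a sketch of that early validation, a runner could reject malformed specs before executing anything. The key names below mirror the examples in this article; treat the exact set as an assumption, not outln's real schema (Python used purely for illustration):

```python
# Reserved top-level keys; everything else is treated as an input file path.
# This key set is inferred from the article's examples, not outln's actual format.
FIXED_KEYS = {"args", "stdout", "stderr", "exit_code"}

def validate_spec(spec: dict) -> list[str]:
    """Return a list of problems; an empty list means the spec looks sane."""
    problems = []
    if not isinstance(spec.get("args", []), list):
        problems.append("args must be a list of CLI arguments")
    for key, value in spec.items():
        # Any key that is not a known field is an input file and must map to text.
        if key not in FIXED_KEYS and not isinstance(value, str):
            problems.append(f"file entry {key!r} must map to string content")
    if "exit_code" in spec and not isinstance(spec["exit_code"], int):
        problems.append("exit_code must be an integer")
    return problems
```

Running this check over every spec file at startup surfaces syntax and schema mistakes, whether the file was written by a human or generated by an agent.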

The utility walks through all source files in a given folder and prints each file path together with its header comment. To keep the output deterministic, it should process files in a stable order, for example by sorting paths.
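The stable traversal can be sketched as follows (Python here purely for illustration, independent of the utility's own implementation):

```python
import os

def walk_sorted(root: str) -> list[str]:
    """Return file paths under root in a deterministic, sorted order."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sort in place so os.walk recurses into subdirs in order
        for name in sorted(filenames):
            paths.append(os.path.join(dirpath, name))
    return paths
```

With the traversal fixed like this, the same input folder always produces the same output, which is what makes exact stdout comparisons in the specs practical.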

Example specification:

args: # input parameters for the CLI
  - src
src/one.ts: | # file 1
  /** Summary for one. */
  export const one = 1;
src/two.ts: | # file 2
  /** Summary for two. */
  export const two = 2;
stdout: | # expected result
  src/one.ts: Summary for one.
  src/two.ts: Summary for two.

Error scenario example:

args:
  - foobar
stderr: |
  Error: Directory foobar does not exist
exit_code: 1

These YAML files are the executable specifications. The runner reads args, materializes the folder structure described in the file, runs the CLI, and compares the actual stdout, stderr, and exit code against the expected result.
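A minimal runner along these lines might look like the following sketch. It is Python operating on an already-parsed spec dictionary; YAML loading is left out, and the field names are assumptions based on the examples above, not outln's actual code:

```python
import os
import subprocess
import tempfile

# Reserved keys, inferred from the article's examples; all other keys are file paths.
FIXED_KEYS = {"args", "stdout", "stderr", "exit_code"}

def run_spec(spec: dict, command: list[str]) -> list[str]:
    """Execute one spec against `command` and return a list of mismatches."""
    failures = []
    with tempfile.TemporaryDirectory() as workdir:
        # Materialize every non-reserved key as an input file.
        for path, content in spec.items():
            if path in FIXED_KEYS:
                continue
            full = os.path.join(workdir, path)
            os.makedirs(os.path.dirname(full), exist_ok=True)
            with open(full, "w") as f:
                f.write(content)
        result = subprocess.run(
            command + spec.get("args", []),
            cwd=workdir, capture_output=True, text=True,
        )
        if "stdout" in spec and result.stdout != spec["stdout"]:
            failures.append(f"stdout: expected {spec['stdout']!r}, got {result.stdout!r}")
        if "stderr" in spec and result.stderr != spec["stderr"]:
            failures.append(f"stderr: expected {spec['stderr']!r}, got {result.stderr!r}")
        if result.returncode != spec.get("exit_code", 0):
            failures.append(f"exit code: expected {spec.get('exit_code', 0)}, "
                            f"got {result.returncode}")
    return failures
```

An empty return value means the spec passed; any mismatch is reported as a human-readable diff line, which is also exactly the feedback an agent needs to iterate.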

This has a few practical advantages:

  • Everything is in one place: arguments, input files, and expected output
  • Each spec is quick to read and easy to understand
  • The format is easy to extend, for example with env variables or additional expectations
  • An AI agent can usually copy the format and generate new specifications with little guidance
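As an illustration of that extensibility, a spec could grow an `env` section. Note that this key is hypothetical and not part of outln's actual format:

```yaml
args:
  - src
env: # hypothetical extension: environment variables for the run
  NO_COLOR: "1"
src/one.ts: |
  /** Summary for one. */
  export const one = 1;
stdout: |
  src/one.ts: Summary for one.
```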

At the time of writing, the outln project has 127 such specifications, most of which were generated by an agent.

Why This Works Well with Coding Agents

The main value of an executable specification is that it creates a shared contract.

Natural language is flexible, but it leaves room for reinterpretation and hallucination. Fully custom test code is precise, but often slower for humans to write and review. Executable specifications sit in the middle. They are structured enough to reduce ambiguity, while still being readable enough for fast human review.

They also offer a few practical benefits:

  • Stable contract: Given specific input, the expected behavior is explicit
  • Documentation by example: Agents can generate new specs by following an existing pattern
  • Scalable review: Reviewing a set of short specifications is often faster than reasoning through an equivalent pile of test code

For an agent, this is especially useful because the gap between current behavior and expected behavior is much narrower than the gap between a prose requirement and a correct implementation.

That said, executable specifications are not a full replacement for lower-level tests. They are most useful when you want to verify externally visible behavior across many scenarios.

How to Build a Workflow with an AI Agent

  1. Initialize the system. Define a specification format and implement a basic runner. For many CLI tools, the runner is straightforward enough to delegate to an agent.

  2. Assign tasks through specs. You can write the specs yourself and reduce review effort later, or you can ask the agent to add specifications for a feature. In that case, your main job becomes reviewing the generated specs.

  3. Iterate against the contract. The agent runs the new specs, sees the difference between current behavior and expected behavior, and gets a clear target for implementation.

Conclusion

Executable specifications turn “I hope this works” roulette into a more controlled workflow. Instead of giving an agent abstract requirements, you provide concrete examples of behavior that humans and machines can both read, execute, and verify.

That makes collaboration less fuzzy, review faster, and implementation targets much clearer.
