The Order of the Agents turns a rough scenario into a reviewed PRD, ADR, RFC, memo, or plan. Multiple agents first think independently, then challenge each other without seeing model identities, then mix the strongest ideas into a final decision packet.
```bash
agent-order prd ./scenario.md
```
You get:
- `final/report.md`: the final PRD, ADR, RFC, memo, plan, or recommendation
- `final/decision-log.md`: what happened, rubric outcomes, blockers, cost/time
- `index.html`: a shareable visual run index
- `index.md`: a lightweight run summary
- `turns/*.md`: every agent position, critique, revision, synthesis, and review
- `trace.jsonl`: structured run events for replay, debugging, and evals
Short version: stop asking one model for decisions that matter.
## Why
Single-model answers can be confident and incomplete. The Order of the Agents is built for decisions where the disagreement matters: product requirements, architecture choices, RFCs, migration plans, build-vs-buy decisions, and incident follow-ups.
The product is not another chat wrapper. It is a fair fight for good ideas: agents start with their own plans, critique anonymized peer responses, accept what improves their answer, and push back when feedback is weak. The final report is the mix of the best surviving ideas, with dissent preserved when it still matters.
## Quick Start

```bash
agent-order prd ./scenario.md
open agent-order-runs/<latest>/index.html
```

Pick the artifact you want:

```bash
agent-order adr ./decision.md
agent-order rfc ./proposal.md
agent-order build-vs-buy ./analytics.md
```

Pick more deliberation only when you need it:

```bash
agent-order prd ./scenario.md --depth quick
agent-order prd ./scenario.md --depth standard
agent-order prd ./scenario.md --depth deep
```
## Install

Install from npm:

```bash
npm install -g agent-order
```

Then run:

```bash
agent-order "Should we build or buy an internal analytics dashboard?"
```

For zero-install use:

```bash
npx agent-order@latest ./scenario.md
```

For local development from this repo:

```bash
npm install
npm run build
npm link
```
The Order assumes the agent CLIs you use (`codex`, `claude`, `gemini`, or any other configured command) are already installed and logged in.
## First Demos

Run the mock demo without calling Codex or Claude:

Run a mock PRD demo using the built-in PRD template:

Run a real two-agent deliberation:

```bash
agent-order ./examples/build-vs-buy-analytics/scenario.md --agents codex,claude --out ./agent-order-runs
```

Good first scenarios:

```bash
agent-order ./examples/build-vs-buy-analytics/scenario.md
agent-order ./examples/rest-to-trpc/scenario.md
agent-order grill ./examples/review-agent-order-readme/scenario.md
agent-order prd ./docs/evals/scenarios/prd/saved-search/scenario.md --depth quick
agent-order adr ./docs/evals/scenarios/adr/rest-vs-trpc/scenario.md --depth quick
agent-order build-vs-buy ./docs/evals/scenarios/build-vs-buy/analytics/scenario.md --depth quick
```
## What It Writes

```text
agent-order-runs/<timestamp>/
  scenario.md
  index.html
  index.md
  trace.jsonl
  schemas/
    agent-turn.schema.json
  prompts/
  raw/
  turns/
    0001-codex.initial-position.md
    0002-claude.initial-position.md
    0003-codex.critique.md
    0004-claude.critique.md
    ...
  final/
    report.md
    decision-log.md
```

`turns/` is the audit trail. `index.html` is the easiest artifact to share. `final/report.md` is the final decision document.
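Because `trace.jsonl` is line-delimited JSON, it is easy to post-process for debugging or evals. As a sketch of what consuming it might look like (the event field names here are illustrative assumptions, not the actual schema), a few lines of Python can summarize a run:

```python
import json

# Hypothetical trace lines; the real event names and fields may differ.
sample_trace = """\
{"event": "turn.completed", "turn": 1, "agent": "codex", "phase": "initial-position"}
{"event": "turn.completed", "turn": 2, "agent": "claude", "phase": "initial-position"}
{"event": "run.finished", "turns": 2}
"""

def summarize(trace_text: str) -> dict:
    """Count events by type from a JSONL trace."""
    counts: dict = {}
    for line in trace_text.splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        counts[event["event"]] = counts.get(event["event"], 0) + 1
    return counts

print(summarize(sample_trace))  # {'turn.completed': 2, 'run.finished': 1}
```

The same pattern extends to replaying phases in order or diffing two runs event by event.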
## Commands

```bash
agent-order <scenario text | scenario.md> [options]
agent-order grill <scenario text | scenario.md> [options]
agent-order <template> <scenario text | scenario.md> [options]
agent-order replay <run-dir> [options]
agent-order init [--config agent-order.config.yaml]
agent-order check [--config agent-order.config.yaml]
agent-order doctor [--config agent-order.config.yaml]
```

Templates:

```text
prd | adr | rfc | build-vs-buy | migration-plan | incident-review
```

Common options:

```bash
agent-order ./scenario.md --agents codex,claude
agent-order ./scenario.md --depth quick
agent-order prd ./scenario.md --depth standard
agent-order ./scenario.md --max-turns 10
agent-order ./scenario.md --human-input never
agent-order ./scenario.md --out ./runs
```
## Templates

Templates give the agents a target artifact shape and a binary final-review rubric.

Built-in templates:

- `prd`: product requirements document
- `adr`: architecture decision record
- `rfc`: request for comments
- `build-vs-buy`: build vs buy memo
- `migration-plan`: staged migration plan
- `incident-review`: blameless incident review

Example:

```bash
agent-order prd ./docs/evals/scenarios/prd/launch-readiness/scenario.md --depth quick
```

Override or add templates with YAML/JSON files:

```bash
agent-order --template my-template --templates-dir ./templates ./scenario.md
```
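A custom template file might look like the following sketch. The keys shown (`id`, `artifact`, `sections`, `rubric`) are assumptions about the template schema, not the documented format; check a built-in template for the real shape:

```yaml
# templates/my-template.yaml -- hypothetical shape, not the actual schema
id: my-template
artifact: design-note
sections:
  - Context
  - Options
  - Recommendation
rubric:
  - id: has-recommendation
    description: The note commits to a single recommendation.
  - id: lists-risks
    description: At least three concrete risks are listed.
```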
## Depth

Depth controls how much deliberation happens. You can omit it; the default behaves like `quick`.

- `quick`: fast two-agent review, good default for normal work
- `standard`: adds another model family and an aggregator pass
- `deep`: more agents and stronger synthesis for expensive decisions
- `cheap`: open-source-heavy roster for cost-conscious runs

Under the hood, depth presets choose a roster and synthesis strategy. Every external CLI in the preset must be installed locally.

Example:

```bash
agent-order rfc ./scenario.md --depth deep
```

Use `agent-order doctor` to see configured agents plus available presets and templates.
## Deliberation Flow

The default flow is intentionally simple:

```text
independent plans -> anonymized critique -> revisions -> synthesis -> rubric review
```

In more detail:

- `initial-position`: agents produce their own plans before seeing peers
- `critique`: agents critique Response A/B/C, not "Claude" or "Codex"
- `revision`: agents can accept good feedback or defend their position
- `synthesis`: strongest ideas are combined into one artifact
- `final-review`: the artifact is scored against the template rubric
- `synthesis-revision`: runs when blockers or failed rubric criteria need repair
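The anonymized-critique step can be pictured with a small sketch. The mapping logic below is an illustration of the idea, not the orchestrator's actual code: each peer response is relabeled before any agent sees it, so critiques target "Response A" rather than a model identity.

```python
import string

def anonymize(agent_ids: list[str]) -> dict[str, str]:
    """Map agent ids to neutral labels so critics never see model identities."""
    return {
        agent_id: f"Response {string.ascii_uppercase[i]}"
        for i, agent_id in enumerate(agent_ids)
    }

labels = anonymize(["codex", "claude", "gemini"])
print(labels["codex"])   # Response A
print(labels["gemini"])  # Response C
```

Keeping this mapping in the trace is what lets a replay attribute each anonymized critique back to the agent that wrote it.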
Agent outputs include structured data:
- `claims`: recommendations, assumptions, risks, facts, decisions
- `objections`: critique items with target turn/claim and severity
- `rubric_scores`: binary pass/fail review criteria with evidence
- `incorporated_objection_ids`: which objections were accepted or addressed

Unincorporated major/blocking objections can be preserved in a `## Minority Report` section.
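A structured turn might carry data like the following sketch. Only the four top-level keys above come from the docs; the nested field names and values here are illustrative assumptions, not the actual `agent-turn` schema:

```json
{
  "claims": [
    {"id": "c1", "type": "recommendation", "text": "Buy the analytics dashboard."}
  ],
  "objections": [
    {"id": "o1", "target_turn": 1, "target_claim": "c1", "severity": "major",
     "text": "No cost ceiling is stated for the vendor contract."}
  ],
  "rubric_scores": [
    {"criterion": "states-success-metrics", "pass": false,
     "evidence": "No measurable success metric appears in the draft."}
  ],
  "incorporated_objection_ids": ["o1"]
}
```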
## Human Input

Human input is part of the protocol, not a side channel.

Use grill mode when the scenario needs clarification before deliberation:

```bash
agent-order grill "Should we move our frontend to a monorepo?"
```

During a run, agents can also emit structured questions for the user. The orchestrator deduplicates them and pauses only when configured:

```yaml
human_input:
  mode: on_blocking_questions
  max_questions_per_pause: 3
```

Disable human pauses:

```bash
agent-order ./scenario.md --human-input never
```
## Replay

Replay reruns a previous frozen scenario with the current config, or with inherited metadata from the source run.

```bash
agent-order replay ./agent-order-runs/2026-04-26-144108
agent-order replay ./agent-order-runs/2026-04-26-144108 --depth deep
```

The new run links back to the source run with `replay-source.md`.
## Eval Harness

The repo includes a small artifact eval harness under `docs/evals/`.

Evaluate the current branch:

```bash
npm run eval -- --version current --scenario prd/saved-search
```

Named baselines require an explicit executable so version labels cannot accidentally evaluate the same binary:

```bash
npm run eval -- --version v0.1 --agent-order /path/to/agent-order-v0.1.js
npm run eval -- --version v0.2 --agent-order ./dist/bin/agent-order.js
npm run eval -- --compare v0.1 v0.2
```

The judge command defaults to `claude` and must be available on `PATH`:

```bash
npm run eval -- --version current --judge-command claude
```

See `docs/evals/README.md` for the harness layout and ship-gate notes.
## Configuration

Create a starter config with `agent-order init`.

Default shape:

```yaml
protocol: agent-order/v1
agents:
  - id: codex
    adapter: codex-cli
    command: codex
    preset: codex
  - id: claude
    adapter: claude-cli
    command: claude
    preset: claude
limits:
  max_turns: 12
output:
  dir: ./agent-order-runs
synthesis:
  agent: codex
  aggregators: null
  meta_synthesizer: null
intake:
  enabled: false
  mode: off
  facilitator: codex
  max_questions: 6
human_input:
  mode: on_blocking_questions
  max_questions_per_pause: 3
  ask_before_final: false
final_review:
  enabled: true
  template: null
  templates_dir: null
cost_warning_usd: 0
```

If no config sets `limits.max_turns`, The Order computes a turn budget from the roster, intake settings, and synthesis mode. The budget is enforced even with batched turns.
## Adding Another Agent

Codex and Claude are built in. Any scriptable CLI can participate through `generic-cli`:

```yaml
agents:
  - id: gemini
    adapter: generic-cli
    command: gemini
    args:
      - -p
      - "{{prompt}}"
    input:
      mode: arg
    output:
      mode: stdout
    check_args:
      - --version
```

The generic adapter can pass prompts through stdin, a prompt file, or templated args like `{{prompt}}`, then reads either JSON matching the agent-turn schema or plain Markdown from stdout.

Curated adapter presets exist for:

```text
codex | claude | gemini | grok | qwen | deepseek
```
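Any executable that accepts a prompt and prints Markdown (or schema-matching JSON) to stdout can plug in. A minimal wrapper might look like this sketch, where the canned response stands in for a real model call:

```python
#!/usr/bin/env python3
"""Minimal stand-in for a generic-cli agent: prompt in, Markdown out."""
import sys

def respond(prompt: str) -> str:
    # A real wrapper would call a model here; this stub just echoes structure.
    return f"## Position\n\nOn the question: {prompt.strip()}\n\n- Risk: unknown\n"

if __name__ == "__main__":
    # The adapter can pass the prompt as a templated arg; a real wrapper
    # could also read it from stdin or a prompt file.
    prompt = sys.argv[1] if len(sys.argv) > 1 else "No prompt given."
    sys.stdout.write(respond(prompt))
```

Pointing a `generic-cli` entry's `command` at a script like this is enough to add it to the roster.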
## Resources

- `docs/evals/README.md`: eval harness usage
- `docs/evals/scenarios/`: fixed eval scenarios
- `examples/`: simple starter scenarios and mock config
- `docs/showcase-video/`: source for the showcase video
## Product Positioning
The Order of the Agents is for senior developers, tech leads, staff engineers, PMs, and AI-heavy builders who already use terminal AI tools and want better review for consequential decisions:
- architecture choices
- build-vs-buy calls
- migration plans
- security and reliability reviews
- PRD/RFC critique
- incident remediation reviews
- developer-tool product strategy
Use it when the decision is worth a few minutes of critique. Do not use it for every prompt.
