You're about to make a big technical decision.
Most teams wing it.
ChatGPT gives you a paragraph. wheat gives you typed claims with conflicts highlighted. You research, prototype, stress-test — then a compiler catches contradictions and blocks output until they are resolved. A build system for decisions.
npx @grainulation/wheat "Should we migrate from microservices to a modular monolith?"
Runs inside Claude Code. Node.js 20+. Zero npm dependencies.
What is wheat?
wheat is a structured way to answer technical questions. You start with a question — "Should we migrate from microservices back to a modular monolith?" — and use slash commands to research, prototype, and challenge your findings. Every finding becomes a typed claim (factual, risk, estimate, constraint, recommendation) with an evidence tier from "someone said it" to "measured in production."
Every finding in a wheat sprint is a claim — a single typed statement with an evidence grade. "Redis handles 100k ops/sec" is a factual claim at the "documented" tier. "We should use Redis" is a recommendation at the "stated" tier. The compiler treats them differently.
When you're ready, a compiler validates everything: catches contradictions, flags weak evidence, and blocks output until issues are resolved. The compiler is JavaScript code — not prompts, not an LLM call. Same claims in, same result out, every time. The output is a decision brief you can send to your team, with a git audit trail showing how every claim was collected and challenged.
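Concretely, a claim could be stored as a small record like the one below. This is an illustrative sketch under assumed field names, not wheat's actual schema; the validation function mirrors the spirit of the compiler's schema pass.

```javascript
// A hypothetical claim record — field names are illustrative,
// not wheat's actual on-disk schema.
const claim = {
  id: "r001",
  type: "factual",    // factual | risk | estimate | constraint | recommendation
  tier: "documented", // stated < web < documented < tested < production
  text: "Network calls (~2ms p50) become in-process calls once consolidated",
};

const TYPES = ["factual", "risk", "estimate", "constraint", "recommendation"];
const TIERS = ["stated", "web", "documented", "tested", "production"];

// A minimal, deterministic schema check: same claim in, same errors out.
function validateClaim(c) {
  const errors = [];
  if (!/^[a-z]\d{3}$/.test(c.id)) errors.push(`bad id: ${c.id}`);
  if (!TYPES.includes(c.type)) errors.push(`unknown type: ${c.type}`);
  if (!TIERS.includes(c.tier)) errors.push(`unknown tier: ${c.tier}`);
  if (!c.text) errors.push("empty text");
  return errors;
}

console.log(validateClaim(claim)); // []
```

Because the check is plain code with no LLM in the loop, it behaves like any other linter: the same input always produces the same verdict.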
Without wheat
"We should probably roll our microservices back into a monolith because Prime Video did it and Jake said it worked well at his last company."
With wheat
14 typed claims. 3 risks flagged. 1 contradiction caught between a team-size assumption and a documented constraint. Recommendation: consolidate the auth service back into the monolith; keep payments split where async boundaries are load-bearing. Every claim traceable to a source.
Quick start — 1 command
1
Ask your question
One command. Zero prompts. Sprint ready in under 3 seconds.
$ npx @grainulation/wheat "Should we migrate from microservices to a modular monolith?"
wheat — sprint created
Should we migrate from microservices to a modular monolith?
1 constraint(s) seeded
2
Gather evidence
Use slash commands to research, prototype, and challenge. Every finding is tracked with a type and an evidence grade.
wheat> /research "service consolidation cost"
r001 [factual|documented] Network calls (~2ms p50) become in-process calls (~1µs) once consolidated — p99 drops 40-60%
r002 [factual|documented] Cross-service sagas collapse into single-module ACID transactions — class of bugs eliminated
r003 [estimate|web] Consolidation: ~3 weeks per service
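The finding lines above follow a consistent `id [type|tier] text` shape. A few lines of code can pull one apart — this is an illustrative sketch of the displayed format, not wheat's internals:

```javascript
// Parse a finding line of the form "r001 [factual|documented] Some text".
// Illustrative sketch of the format shown above, not wheat's source.
function parseFinding(line) {
  const m = line.match(/^([a-z]\d{3}) \[(\w+)\|(\w+)\] (.+)$/);
  if (!m) return null; // not a finding line
  const [, id, type, tier, text] = m;
  return { id, type, tier, text };
}

const f = parseFinding("r003 [estimate|web] Consolidation: ~3 weeks per service");
console.log(f.type, f.tier); // estimate web
```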
3
Ship the brief
The compiler validates everything, resolves conflicts, and produces a decision document you can share.
Compiled 14 claims, 3 conflicts resolved
Written: output/brief.html (self-contained)
Recommendation: Consolidate the auth service into the monolith; keep payments split. Revisit after 6 months of production data.
Real example
"Should we consolidate the auth service back into our monolith?"
A team needs to decide whether to invest 3-5 weeks in consolidating their standalone auth service back into the monolith. Instead of debating in Slack, they use wheat to research it properly.
1
$ npx @grainulation/wheat init
Define the question
The team lead types the question: "Should we consolidate the auth service back into our monolith?" wheat asks for audience (platform team), constraints (must not break existing SSO clients during the rollback), and scaffolds the investigation.
2
wheat> /research "auth service consolidation trade-offs"
Gather evidence
wheat reads your codebase, searches the web, and records what it finds. Each finding gets a type (factual, risk, estimate) and an evidence grade — from "stated" (someone said it) to "tested" (prototype-validated).
r001 [factual|documented] Network calls to auth become in-process module calls — p99 login-check latency drops 40-60% (microservices-migration literature)
r002 [factual|documented] Cross-service saga/outbox work collapses into a single ACID transaction — eliminates a class of partial-failure bugs
r003 [risk|documented] Deployment-unit coupling returns — auth changes now ship on the monolith cadence; build-pipeline time grows
3
wheat> /prototype
Build and measure
wheat builds a working proof-of-concept and benchmarks it. Findings from prototypes get the "tested" evidence grade — real measurements, not blog posts.
p001 [estimate|tested] Reverse-strangler prototype: auth consolidated behind a module boundary in ~3 weeks; saga removal saved 2 more
p002 [estimate|tested] Login p95 drops ~42% vs the cross-service baseline; auth-module tests run in-process (no test containers needed)
4
wheat> /challenge r001
Stress-test the findings
The adversarial review catches that the latency-win estimate assumes auth's in-process path will stay fast at monolith scale — but the last 90 days of incidents show the monolith already has slow-path contention, and the reverse-strangler re-establishes the schema boundaries that made auth painful to host in the first place.
x001 [risk|documented] 40-60% latency win assumes monolith slow-path stays fast at scale; 90 days of incidents show contention, and reverse-strangler re-couples schema boundaries — not free
5
wheat> /brief
Ship the decision
The compiler resolves conflicts (higher evidence grades win), validates everything, and produces a self-contained HTML decision document. If there are contradictions, it tells you exactly what to fix first.
Recommendation: Consolidate the auth service back into the monolith; keep payments split where async boundaries are load-bearing. Use the reverse-strangler pattern behind an anti-corruption layer during cutover.
Latency gains are real but narrower than initially estimated. The module-boundary enforcement work (r003) needs to land before the cutover.
14 claims | 3 conflicts resolved | evidence: 2 tested, 8 documented, 4 web
-> output/brief.html (self-contained, send to stakeholders)
The outcome: Instead of a 45-minute meeting where the loudest voice wins, the team has a compiled brief with 14 validated findings, resolved conflicts, and a clear recommendation. Anyone can run git log claims.json to see how the decision was made.
Numbers in this example are illustrative — the pattern is what matters. Evidence tiers (web, tested, production) flag which claims would need what level of verification in your own sprint.
Key features
A decision-making framework built for engineers
Everything you need to go from question to architecture decision. Nothing you don't.
Typed, Evidence-Graded Claims
Every finding has a type (factual, risk, estimate, constraint, recommendation) and an evidence grade from "stated" (unverified) to "production" (measured in prod). Weak evidence gets flagged. silo stores proven claims as reusable packs.
Compiler Validation
A 7-pass compiler runs before any output is produced. It validates schemas, checks type distribution, sorts by evidence tier, detects conflicts, auto-resolves when evidence tiers differ, analyzes coverage, and checks readiness. Pure JavaScript — no LLM calls. You cannot ship a brief built on contradictions.
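The auto-resolve rule — when two claims conflict, the one with the higher evidence tier wins — is deterministic and small enough to sketch. This is illustrative code under an assumed tier ordering, not wheat's actual source:

```javascript
// Evidence tiers in ascending order of trust. Illustrative sketch of the
// "higher tier wins" auto-resolve rule, not wheat's actual implementation.
const TIER_RANK = { stated: 0, web: 1, documented: 2, tested: 3, production: 4 };

// Given two conflicting claims, keep the better-evidenced one. Equal tiers
// cannot be auto-resolved: the conflict stays open and blocks the brief.
function autoResolve(a, b) {
  const ra = TIER_RANK[a.tier];
  const rb = TIER_RANK[b.tier];
  if (ra === rb) return null; // unresolved — compiler refuses to ship
  return ra > rb ? a : b;
}

const fromPrototype = { id: "p002", tier: "tested", text: "p95 drops ~42%" };
const fromBlogPost = { id: "r009", tier: "web", text: "p95 drops ~70%" };
console.log(autoResolve(fromPrototype, fromBlogPost).id); // p002
```

Because resolution is a pure function of the two tiers, replaying the same claim set always yields the same brief.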
20 Slash Commands
/init, /research, /prototype, /challenge, /witness, /blind-spot, /brief, /present, /status, /feedback, /handoff, /merge, /replay, /calibrate, /resolve, /evaluate, /connect, /next, /sync, /pull.
Git Audit Trail
Every finding auto-commits. git log --oneline claims.json is the complete history of how you reached your decision. farmer streams tool calls to your phone for real-time approval.
Shareable Decision Briefs
Briefs, presentations, and dashboards are single HTML files with inline CSS/JS. Send them to anyone — no hosting needed. Use mill to export to PDF, CSV, or static sites.
Works in Any Tech Stack
Any repo, any stack, any language. If Claude Code can read it, wheat can research it. Evaluate your Scala migration, Python monorepo, or Flutter rewrite.
Frequently asked questions
Do I need Node.js?
Yes, Node 20 or later. But your project can use any language or framework — wheat works in any repo regardless of stack.
Does this require Claude Code?
Yes. wheat runs as a set of slash commands inside Claude Code. It uses Claude's ability to read your codebase, search the web, and reason about evidence.
How is this different from just asking Claude for a recommendation?
Asking Claude gives you a paragraph of plausible-sounding advice with no way to verify it. wheat gives you 10-30 typed claims, each with a named evidence tier, run through a compiler that catches contradictions and flags weak evidence. The output is a decision brief with a git audit trail — git log claims.json shows exactly how you got there. Every claim is traceable. Every conflict is surfaced. Not a prompt wrapper — a build system for decisions.
How is this different from planning tools like Obra or Superpowers?
Most planning tools generate a big plan upfront, then you execute it. wheat validates continuously: every finding is checked as it comes in, conflicts are caught immediately, and the compiler blocks output if your evidence doesn't hold up. Your understanding evolves as you learn, not before.
What's a "sprint"? Is this Agile?
No. A wheat sprint is a single investigation — one question, a set of findings, and a compiled output. It takes 15 minutes to an hour, not two weeks. Think make, not Jira.
What's a claim?
A claim is a single finding from your investigation. Each one has a type — factual ("the API returns paginated results"), risk ("connection pooling may bottleneck"), constraint ("must support Postgres 14+"), estimate ("migration: 2-4 weeks"), or recommendation. Each claim also has an evidence grade so you know how much to trust it. The compiler validates them all before you can produce output.
How long does a sprint take?
A simple question can be answered in 10-15 minutes. Bigger decisions with prototyping and multiple rounds of challenge might take an hour or spread across a few sessions. orchard tracks dependencies when you're running multiple sprints in parallel.
The ecosystem
wheat is the core research engine. Add tools as you need them.