What's Inside
A research toolkit for applying Brenner's epistemology to your own scientific questions.
Core Workflow
From Question to Conclusion: The Brenner Loop
Research sessions follow a rigorous, reproducible path. Every step is tracked, auditable, and reversible.
Undo / Redo
Every action is reversible. Explore without fear.
Session Replay
Reproduce any session exactly for audit and learning.
Error Recovery
Graceful checkpoints when things go wrong.
Multi-Agent Orchestration
Your Research Team: AI Agents That Debate, Challenge, and Synthesize
Each agent has a precise mandate. Together they sharpen hypotheses, design lethal tests, and merge evidence into auditable artifacts - without surrendering control.
"What if you could have Claude, GPT, and Gemini debate your hypothesis - challenging each other until only the strongest ideas survive?"
Debate Mode
- Proposition vs opposition with a judge. Best for: testing hypothesis strength.
- Probing questions to surface hidden assumptions. Best for: finding weak links fast.
- Steelman Contest: build the strongest case, then dismantle it. Best for: exploring the hypothesis space.
# Start a debate session
brenner session start --thread-id RS-20260105 \
--format oxford \
--question "Does the morphogen gradient model explain cell fate?"
# Watch agents debate in real-time
brenner session status --thread-id RS-20260105 --watch
# See the merged artifact
brenner session compile --thread-id RS-20260105
Coordination Visualization
Deterministic Merge
Thread ID: RS-20260106-001
Ack tracking enabled
1. Kickoff: Threaded prompt goes to each agent role
2. Deltas: Structured responses return with citations
3. Merge: Deterministic compiler reconciles evidence
4. Human: You decide what ships and what dies
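One way to picture the merge step is as a pure function over the collected deltas. The sketch below is a minimal illustration, not the toolkit's compiler: the `Delta` shape, its field names, and the "later delta in canonical order wins" rule are all assumptions made for the example. It shows the property that matters, which is that the order in which agent responses arrive does not change the result.

```typescript
// Hypothetical delta shape; field names are illustrative, not the toolkit's schema.
interface Delta {
  agent: string; // role that produced the delta
  seq: number;   // per-agent sequence number
  key: string;   // artifact section being updated
  value: string; // proposed content for that section
}

// Plain code-unit comparison keeps the ordering independent of locale settings.
function compareDeltas(a: Delta, b: Delta): number {
  if (a.agent < b.agent) return -1;
  if (a.agent > b.agent) return 1;
  return a.seq - b.seq;
}

// Sort into a canonical order first, then apply; arrival order is irrelevant.
function mergeDeltas(deltas: Delta[]): Record<string, string> {
  const ordered = [...deltas].sort(compareDeltas);
  const artifact: Record<string, string> = {};
  for (const d of ordered) {
    artifact[d.key] = d.value; // assumed rule: later delta in canonical order wins
  }
  return artifact;
}
```

Because the function has no hidden state, replaying the same set of deltas should always yield the same artifact, which is what makes a merged result auditable.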
Coordination Without Chaos
Agent Mail keeps every exchange auditable
Every message lands in a thread, every response is acknowledged, and every delta is preserved. You stay in the loop with human approval gates at every step.
Built with thread IDs, ack receipts, and merge-safe deltas.
Kickoff sent: 3 agents live
Deltas merged: 1 artifact ready
Human approval: Required
Research Hygiene
Built-In Guardrails for Rigorous Science
The system blocks common failure modes: hindsight bias, unfalsifiable hypotheses, ignored confounds, and overconfidence. Rigor is enforced before you waste a week.
Coach Mode
Guided checkpoints, inline explanations, and Brenner quotes as you work.
Beginner → Expert · Contextual feedback
Prediction Lock
Lock outcomes before results arrive to eliminate hindsight bias.
Immutable predictions · Audit trail
Calibration Tracking
Brier score, overconfidence bias, and domain-level accuracy trends.
Confidence scorecard · Bias alerts
Confound Detection
Domain-specific confounds flagged with targeted prompting questions.
8 research domains · Automatic prompts
Artifact Linting
50+ rules enforcing third alternatives, potency controls, and citation hygiene.
Structural checks · Citation validation
Prediction Lock Timeline
No hindsight
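A prediction lock can be approximated with a content hash taken at the moment the prediction is recorded. The following is a sketch under stated assumptions, not the toolkit's mechanism: `LockedPrediction`, `lockPrediction`, and `verifyPrediction` are hypothetical names, and SHA-256 over the timestamp and text is simply one reasonable choice. It relies only on Node's built-in `node:crypto` module.

```typescript
import { createHash } from "node:crypto";

// Hypothetical record shape for a locked prediction.
interface LockedPrediction {
  readonly text: string;     // the predicted outcome, written before results arrive
  readonly lockedAt: string; // ISO 8601 timestamp of the lock
  readonly digest: string;   // SHA-256 over timestamp + text
}

function digestOf(text: string, lockedAt: string): string {
  return createHash("sha256").update(`${lockedAt}\n${text}`).digest("hex");
}

// Freeze the record so accidental in-place edits are rejected by the runtime.
function lockPrediction(text: string, now: Date = new Date()): LockedPrediction {
  const lockedAt = now.toISOString();
  return Object.freeze({ text, lockedAt, digest: digestOf(text, lockedAt) });
}

// Recompute the digest; any change to text or timestamp makes this return false.
function verifyPrediction(p: LockedPrediction): boolean {
  return digestOf(p.text, p.lockedAt) === p.digest;
}
```

Storing the digest in an append-only log alongside the session would let an auditor confirm later that the prediction was not revised after the result came in.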
Confound Detection
8 domains
Psychology, Epidemiology, Economics, Biology, Sociology, Neuroscience, Computer Science, General
Selection bias detected - how will you ensure random sampling?
Reverse causation possible - can you establish temporal order?
Calibration + Linting
Scorecard
Calibration curve (last 10 tests)
Third alternative present: Pass
Potency control defined: Pass
Citation anchors: Review
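For readers unfamiliar with the calibration metrics named above, here is a small sketch of the standard definitions. The Brier score is the mean squared difference between stated confidence and the 0/1 outcome (lower is better); overconfidence is taken here as mean confidence minus the observed hit rate. The `Forecast` type and function names are illustrative assumptions, and the toolkit's own scorecard may compute these differently.

```typescript
// Hypothetical forecast record: stated confidence and what actually happened.
interface Forecast {
  confidence: number; // probability assigned to the prediction, in [0, 1]
  outcome: 0 | 1;     // 1 if the prediction came true, 0 otherwise
}

// Brier score: mean of (confidence - outcome)^2 across forecasts.
function brierScore(forecasts: Forecast[]): number {
  if (forecasts.length === 0) throw new Error("need at least one forecast");
  const total = forecasts.reduce(
    (acc, f) => acc + (f.confidence - f.outcome) ** 2,
    0,
  );
  return total / forecasts.length;
}

// Positive values mean stated confidence exceeded the observed hit rate.
function overconfidence(forecasts: Forecast[]): number {
  if (forecasts.length === 0) throw new Error("need at least one forecast");
  const meanConfidence =
    forecasts.reduce((acc, f) => acc + f.confidence, 0) / forecasts.length;
  const hitRate =
    forecasts.reduce((acc, f) => acc + f.outcome, 0) / forecasts.length;
  return meanConfidence - hitRate;
}
```

A forecaster who always says 50% scores 0.25 on the Brier score, which makes a convenient baseline when reading a scorecard.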
Discovery & Intelligence
Intelligence Built In: Search, Simulate, Score
Connect to prior work instantly, model evidence impact before you test, and track which hypotheses survive pressure. This is research intelligence, not a chat log.
Hypothesis Similarity Search
Find related work across sessions with offline embeddings and clusters.
Client-side only · Duplicate detection
What-If Scenarios
Simulate outcomes before running tests and prioritize high-impact experiments.
Info gain ranked · Scenario builder
Robustness Scoring
Evidence-weighted survival scores reveal fragile vs battle-tested ideas.
Support vs challenge · Robustness meter
Anomaly Detection
Track contradictions and spawn new hypotheses instead of burying them.
Anomaly register · Paradigm alerts
Query: "morphogen gradient cell fate"
Morphogen gradient (RS-20251230): 82%
Statement 0.8 / Mechanism 0.6 / Domain 0.9
Timing gate model (RS-20250112): 71%
Statement 0.7 / Mechanism 0.5 / Domain 0.8
Signal relay chain (RS-20241018): 64%
Statement 0.6 / Mechanism 0.4 / Domain 0.9
Runs entirely client-side - your hypotheses never leave your machine.
What-If Scenario
Info gain
Starting confidence: 60%
Expected information gain: 0.42
Best next test: Perturb gradient + checkpoint timing
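To make "expected information gain" concrete, here is a sketch of the textbook calculation for a single hypothesis and a test with two possible outcomes: prior entropy minus the outcome-weighted entropy of the posterior, in bits. The function names and the likelihood parameters are assumptions chosen for illustration; nothing here claims to reproduce the figure shown in the panel or the toolkit's actual scoring.

```typescript
// Binary entropy in bits; defined as 0 at the certain endpoints.
function entropyBits(p: number): number {
  if (p <= 0 || p >= 1) return 0;
  return -(p * Math.log2(p) + (1 - p) * Math.log2(1 - p));
}

// prior:       current confidence that the hypothesis is true
// pPosIfTrue:  assumed chance of a positive result if the hypothesis is true
// pPosIfFalse: assumed chance of a positive result if it is false
function expectedInfoGain(
  prior: number,
  pPosIfTrue: number,
  pPosIfFalse: number,
): number {
  const pPos = prior * pPosIfTrue + (1 - prior) * pPosIfFalse;
  const pNeg = 1 - pPos;
  // Bayes' rule for each outcome; fall back to the prior if an outcome is impossible.
  const postPos = pPos > 0 ? (prior * pPosIfTrue) / pPos : prior;
  const postNeg = pNeg > 0 ? (prior * (1 - pPosIfTrue)) / pNeg : prior;
  const expectedPosterior = pPos * entropyBits(postPos) + pNeg * entropyBits(postNeg);
  return entropyBits(prior) - expectedPosterior;
}
```

Ranking candidate experiments by this quantity favors tests whose outcomes the competing explanations disagree about, which is the intuition behind prioritizing high-impact experiments.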
H1: Morphogen gradient (72%)
3 supporting / 1 challenging (survived)
H2: Timing mechanism (35%)
1 supporting / 2 inconclusive
Anomaly Register
Quarantine
X-001 (Active)
Oscillating fate markers
Conflicts with H1 + H2
X-014 (Deferred)
Late-stage inversion
Waiting on potency control
Deep Dive
The Operator Algebra: Brenner's Methods as Executable Code
Sydney Brenner's breakthrough wasn't just his discoveries - it was his method. We've encoded his cognitive patterns as composable operators that you can apply systematically.
The Brenner Method in 4 Steps
1. Split the levels: Separate the 'what' from the 'how'
2. Design killing tests: Find experiments that eliminate possibilities
3. Choose your system: Pick the easiest organism/model to test with
4. Check the physics: Make sure it's physically possible
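Step 4 often comes down to an order-of-magnitude calculation. As one illustration, a hypothesis that relies on a molecule diffusing across a tissue can be checked against the one-dimensional diffusion estimate t ≈ L² / (2D). The sketch below uses hypothetical function names and made-up numbers (a 100 µm field and D = 10 µm²/s) chosen only to show the arithmetic; real values must come from the system under study.

```typescript
// Characteristic 1-D diffusion time, from mean squared displacement = 2 * D * t.
// lengthMicrons: distance to cover, in micrometres
// diffusionCoeff: diffusion coefficient, in square micrometres per second
function diffusionTimeSeconds(lengthMicrons: number, diffusionCoeff: number): number {
  if (lengthMicrons < 0 || diffusionCoeff <= 0) {
    throw new Error("length must be non-negative and D must be positive");
  }
  return (lengthMicrons * lengthMicrons) / (2 * diffusionCoeff);
}

// A mechanism passes the check only if it can finish within the time available.
function fitsTimeBudget(requiredSeconds: number, availableSeconds: number): boolean {
  return requiredSeconds <= availableSeconds;
}

// Illustrative numbers only: 100 µm at 10 µm²/s takes about 500 s.
const required = diffusionTimeSeconds(100, 10);
const plausibleWithinAnHour = fitsTimeBudget(required, 3600);
```

If the required time came out far longer than the developmental window, the hypothesis would need a different transport mechanism, or it would be a candidate for elimination before any experiment is run.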
Want the precise notation? See the operators below.
⊘
Level-Split
"Separate program from interpreter"
Message vs machine, genotype vs phenotype. Includes the 'chastity vs impotence' diagnostic.
Template
"What is the information? What is the mechanism?"
✂
Exclusion-Test
"Design tests that eliminate, not confirm"
Forbidden patterns: what cannot occur if H is true. Rated by discriminative power.
Template
"If H1 is true, we should NEVER see..."
⟂
Object-Transpose
"Change the system until the test is easy"
Choose organism or model strategically. The experimental object is a design variable.
Template
"What system would make this test cheap and unambiguous?"
⊞
Scale-Check
"Stay imprisoned in physics"
Validate against physical constraints. Calculate timescales, length scales, energy scales.
Template
"Is this physically possible at the relevant scale?"
The Core Composition
(⌂ ∘ ✂ ∘ ≡ ∘ ⊘) powered by (↑ ∘ ⟂ ∘ 🔧) constrained by (⊞) kept honest by (ΔE ∘ †)
- Start from a paradox (◊), split levels (⊘), extract invariants (≡)
- Design exclusion tests (✂), materialize as decision procedure (⌂)
- Power by amplification (↑) in well-chosen system (⟂) you build yourself (🔧)
- Constrain by physics (⊞), keep honest with exception handling (ΔE) and theory killing (†)
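The composition notation reads right to left, so in (⌂ ∘ ✂ ∘ ≡ ∘ ⊘) the level-split runs first and materialize runs last. A left-to-right pipe helper expresses the same order more naturally in code. The sketch below is an assumption about how such a helper could look, named `pipeSketch` to make clear it is not the framework's own `pipe`; the stand-in operators only record the order in which they run.

```typescript
// An operator transforms a value using shared, read-only context.
type Operator<T, C> = (value: T, context: C) => T;

// Apply operators left to right, threading the same context through each one.
function pipeSketch<T, C>(...ops: Operator<T, C>[]): Operator<T, C> {
  return (value: T, context: C): T =>
    ops.reduce((acc, op) => op(acc, context), value);
}

// Stand-in operators that append their own name, to make the ordering visible.
const tag = (name: string): Operator<string[], unknown> =>
  (steps) => [...steps, name];

const core = pipeSketch(
  tag("level-split"),       // ⊘ runs first
  tag("invariant-extract"), // ≡
  tag("exclusion-test"),    // ✂
  tag("materialize"),       // ⌂ runs last
);

const order = core([], undefined);
```

Keeping each operator a plain function of value and context means any subset can be recombined, which is what makes the operators composable rather than a fixed script.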
Extended Operators (6 more patterns)
↑
Amplify
Use selection, dominance, regime switches
◊
Paradox-hunt
Use contradictions as beacons
⊕
Cross-domain
Import tools from other fields
∿
Dephase
Work out of phase with fashion
†
Theory-kill
Drop hypotheses when the world says no
⌂
Materialize
What would I see if this were true?
import { pipe } from "@/lib/brenner-loop/operators/framework";
const brennerPipeline = pipe(
levelSplit, // Separate levels
invariantExtract, // Find what survives
exclusionTest, // Design killing experiments
materialize, // Compile to decision procedure
);
const result = brennerPipeline(hypothesis, context);
“I think many fields of science could do a great deal better if they went back to the classical approach of studying a problem, rather than following the latest fashion.”