Autolab is an autonomous research orchestration framework. Give it a research goal, and it designs experiment campaigns, runs them, analyzes the results, and documents novel discoveries.
Autolab builds on the autoresearch paradigm pioneered by Andrej Karpathy. Where autoresearch excels at optimizing a single metric in a tight loop, Autolab extends the idea to orchestrate research programs — multi-question investigations with campaign-based experiment design, literature-informed hypothesis generation, and rigorous discovery documentation.
## Quickstart
```shell
pip install autolab

# Initialize a research project
autolab init "Optimize transformer inference latency on consumer GPUs"

# Run a campaign
autolab run campaigns/000_example.yaml

# Check status
autolab status

# Query results
autolab results --metric throughput --top 5

# Start autonomous research loop (requires API key)
export ANTHROPIC_API_KEY=sk-...
autolab loop --backend anthropic --max-iterations 10
```
## How It Works
```
Research Directive (human)
        |
        v
Goal Decomposition      --> Research Questions
        |
        v
Campaign Design         --> YAML parameter grids
        |
        v
Experiment Execution    --> Local, SSH, Docker, SLURM
        |
        v
Results Analysis        --> SQLite DB, trend detection
        |
        v
Discovery Documentation --> DISCOVERIES.md with prior art verification
        |
        v
Repeat (with escape strategies when stuck)
```
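The Results Analysis stage stores collected metrics in a SQLite database, which commands like `autolab results --metric throughput --top 5` query. The sketch below shows what such a query looks like against an illustrative schema; the table layout and experiment IDs are assumptions for the example, not Autolab's actual schema.

```python
import sqlite3

# Illustrative schema; the real Autolab DB layout may differ.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE results (
        experiment_id TEXT,
        metric TEXT,
        value REAL
    )
""")
rows = [
    ("exp-001", "throughput", 1421.7),
    ("exp-002", "throughput", 1502.3),
    ("exp-003", "throughput", 1388.9),
]
conn.executemany("INSERT INTO results VALUES (?, ?, ?)", rows)

# Roughly what `autolab results --metric throughput --top 5` asks for:
top = conn.execute(
    "SELECT experiment_id, value FROM results "
    "WHERE metric = ? ORDER BY value DESC LIMIT 5",
    ("throughput",),
).fetchall()
# top[0] is the best-performing experiment
```

Keeping results in plain SQLite means any external tool can inspect the research record without going through Autolab.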
Each research iteration follows the cycle:
- Orient — read journal, query DB, check progress
- Hypothesize — form testable question, write it down first
- Design — create campaign YAML targeting the hypothesis
- Execute — run the campaign: `autolab run campaigns/name.yaml`
- Analyze — query results, compare to baselines
- Document — update journal, add discoveries
- Commit — git commit the iteration
## Campaign Format
```yaml
version: 1
name: batch_size_sweep
hypothesis: "Batch sizes >32 show diminishing throughput gains"
question: q1
moonshot: false

runner:
  backend: local            # local | ssh
  command: "python train.py --batch-size {batch_size} --lr {lr}"
  working_dir: ./experiments
  timeout_seconds: 3600

defaults:
  lr: 0.001

grid:
  batch_size: [8, 16, 32, 64, 128]   # 5 experiments (Cartesian product)

metrics:
  primary: throughput
  direction: maximize
  collect:
    - name: throughput
      pattern: "Throughput: ([\\d.]+)"
    - name: loss
      pattern: "Loss: ([\\d.]+)"

stopping:
  window: 3
  threshold: 0.05
  max_failures: 3
```
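Two mechanics of this format can be sketched in a few lines: the `grid` expands into the Cartesian product of its axes (merged over `defaults`), and each `collect` pattern is a regex applied to experiment output. The helper `expand_grid` below is illustrative, not Autolab's internal code.

```python
import itertools
import re

def expand_grid(defaults, grid):
    """Cartesian product of grid axes, each merged over campaign defaults."""
    keys = list(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        params = dict(defaults)
        params.update(zip(keys, values))
        yield params

defaults = {"lr": 0.001}
grid = {"batch_size": [8, 16, 32, 64, 128]}
experiments = list(expand_grid(defaults, grid))
# 5 experiments: one per batch size, each with lr=0.001

# Metric collection: apply a campaign pattern to experiment stdout.
pattern = re.compile(r"Throughput: ([\d.]+)")
stdout = "epoch 3 done. Throughput: 1421.7 samples/s"
match = pattern.search(stdout)
throughput = float(match.group(1)) if match else None
```

A second grid axis, e.g. `lr: [0.001, 0.01]`, would multiply the count to 10 experiments, which is why wide grids get expensive quickly.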
## LLM Agent Support
Autolab is LLM-agnostic. The research loop can be driven by:
| Backend | How | Best for |
|---|---|---|
| Claude Code | Plugin with Ralph Wiggum loop | Richest: skills, hooks, native tools |
| Anthropic API | `autolab loop --backend anthropic` | Direct API, any Claude model |
| OpenAI API | `autolab loop --backend openai` | GPT-4o, o1, o3 |
| OpenAI-compatible | `autolab loop --backend openai-compatible --base-url ...` | Ollama, vLLM, Together |
### Claude Code Plugin
Install the plugin for the richest experience:
- `/research-loop` — start an autonomous research marathon
- `/campaign create` — design a new campaign
- `/campaign run` — run a campaign
- `/status` — check progress
- `/literature` — search prior art
- `/discover` — document a finding
The plugin includes skills for research loop protocol, campaign design, and discovery writing, plus a Ralph Wiggum stop hook for long-running autonomous sessions.
## Moonshot Budget
Autolab enforces a moonshot budget: by default, 50% of campaigns (the ratio is configurable) should be moonshots — experiments that challenge fundamental assumptions rather than incrementally tweaking parameters. This keeps the research program from converging on local optima.
```yaml
# autolab.yaml
strategy:
  moonshot_ratio: 0.5   # 50% default
  enforce: soft         # soft | hard
```
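As a sketch of what a budget check could compute, the helper below finds how many additional moonshots would restore the target ratio; a `soft` setting might surface this as a warning, a `hard` setting might block conventional campaigns until the deficit clears. This is illustrative, not Autolab's actual logic.

```python
import math

def moonshot_deficit(flags, target_ratio=0.5):
    """Smallest number of extra moonshot campaigns needed so that
    moonshots / total reaches target_ratio (assumes target_ratio < 1).
    `flags` holds one boolean per past campaign (True = moonshot)."""
    moonshots = sum(flags)
    total = len(flags)
    if total == 0 or moonshots / total >= target_ratio:
        return 0
    # Solve (moonshots + k) / (total + k) >= target_ratio for integer k.
    k = (target_ratio * total - moonshots) / (1 - target_ratio)
    return math.ceil(k)

# With 1 moonshot out of 4 campaigns, 2 more restore the 0.5 ratio:
# (1 + 2) / (4 + 2) = 0.5
deficit = moonshot_deficit([True, False, False, False])
```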
## Escape from Local Minima
When the agent hasn't improved for 3+ consecutive iterations, Autolab triggers escape strategies:
- Literature search — find untried approaches
- Devil's advocate — argue against current approach, try the opposite
- Random perturbation — explore far outside tested parameter ranges
- New question — pivot to a completely different angle
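The trigger itself can be sketched as a stagnation check over the running best of the primary metric. The helper below is illustrative rather than Autolab's implementation, though the window and threshold mirror the campaign `stopping` defaults; it assumes a maximize direction and a nonzero baseline score.

```python
def is_stuck(best_scores, window=3, threshold=0.05):
    """True when the running best improved by less than `threshold`
    (relative) over the last `window` iterations."""
    if len(best_scores) <= window:
        return False
    before = best_scores[-window - 1]
    latest = best_scores[-1]
    return (latest - before) / abs(before) < threshold

# Escape strategies, tried when the loop stagnates (names illustrative):
ESCAPE_STRATEGIES = [
    "literature_search",    # find untried approaches
    "devils_advocate",      # argue against the current approach
    "random_perturbation",  # jump far outside tested parameter ranges
    "new_question",         # pivot to a different research question
]

history = [10.0, 10.2, 10.25, 10.26, 10.27]  # best throughput per iteration
if is_stuck(history):
    strategy = ESCAPE_STRATEGIES[0]  # e.g. start with a literature search
```

A relative threshold makes the check scale-free, so the same default works whether the metric is a loss near 0.1 or a throughput in the thousands.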
## Discovery Attribution
Discoveries made with Autolab include attribution in DISCOVERIES.md:
> Discovered with Autolab — autonomous research orchestration
See ATTRIBUTION.md for publication guidelines.
## Project Structure
```
autolab/
├── src/autolab/
│   ├── core/           Campaign engine, research loop, plan, scheduler
│   ├── agents/         LLM backends (Anthropic, OpenAI, compatible APIs)
│   ├── runners/        Experiment execution (local, SSH)
│   ├── metrics/        Collectors, SQLite DB, trend analysis
│   ├── intelligence/   Literature search, escape strategies, discovery management
│   ├── state/          Project state, journal, git integration
│   └── scaffold/       Project initialization templates
├── plugin/             Claude Code plugin (commands, skills, hooks)
├── examples/
│   ├── ml-optimization/        Hyperparameter tuning
│   ├── distributed-inference/  Pipeline parallelism optimization
│   └── algorithm-design/       Sorting algorithm comparison
└── tests/              118 tests
```
## Examples
### ML Optimization
```shell
cd examples/ml-optimization
PYTHONPATH=../../src python3 -m autolab.cli run campaigns/001_lr_sweep.yaml
```

### Distributed Inference

```shell
cd examples/distributed-inference
PYTHONPATH=../../src python3 -m autolab.cli run campaigns/001_stage_count.yaml
```

### Algorithm Design

```shell
cd examples/algorithm-design
PYTHONPATH=../../src python3 -m autolab.cli run campaigns/001_algorithm_comparison.yaml
```

## Development

```shell
git clone https://github.com/t8/autolab.git
cd autolab
pip install -e ".[dev]"
pytest tests/ -v
```
## Inspiration
Autolab stands on the shoulders of:
- Andrej Karpathy — whose autoresearch concept proved that LLMs can autonomously run meaningful experiments. Autolab extends this from single-metric optimization to multi-hypothesis research programs.
- Geoffrey Huntley — who created the Ralph Wiggum loop technique for keeping Claude Code running autonomously. Autolab's research marathon mode is built on this pattern.
## License
Apache 2.0. See LICENSE and ATTRIBUTION.md.
