stanford-iris-lab/meta-harness: Reference code for the Meta-Harness paper.

Meta-Harness is a framework for automated search over task-specific model harnesses: the code around a fixed base model that decides what to store, retrieve, and show while the model works. This repo contains the framework and two reference experiments from the paper Meta-Harness: End-to-End Optimization of Model Harnesses.
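To make the definition of a harness concrete, here is a toy sketch of one for text classification. It is not this repo's API: `ToyHarness`, `store`, `retrieve`, `build_prompt`, and the word-overlap retrieval rule are all hypothetical choices made for illustration. The base model is assumed to be any fixed function from prompt text to output text, and the harness is the code that decides what to store, what to retrieve, and what to show it.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-in for a fixed base model: any function from prompt text to output text.
Model = Callable[[str], str]


@dataclass
class ToyHarness:
    """Illustrative harness around a fixed model (hypothetical, not the repo's API)."""

    model: Model
    k: int = 2  # how many stored examples to show per query
    memory: list[tuple[str, str]] = field(default_factory=list)

    def store(self, text: str, label: str) -> None:
        # What to store: here, simply every labeled example seen so far.
        self.memory.append((text, label))

    def retrieve(self, query: str) -> list[tuple[str, str]]:
        # What to retrieve: the k stored examples with the most word overlap with the query.
        # sorted() is stable (also with reverse=True), so ties keep insertion order.
        words = set(query.lower().split())
        ranked = sorted(
            self.memory,
            key=lambda ex: len(words & set(ex[0].lower().split())),
            reverse=True,
        )
        return ranked[: self.k]

    def build_prompt(self, query: str) -> str:
        # What to show: retrieved examples as few-shot demonstrations, then the new input.
        shots = "\n".join(f"Text: {text}\nLabel: {label}" for text, label in self.retrieve(query))
        return f"{shots}\nText: {query}\nLabel:"

    def __call__(self, query: str) -> str:
        # The base model is never modified; only the harness logic around it would change.
        return self.model(self.build_prompt(query))
```

Passing an identity "model" such as `lambda p: p` makes the harness return the exact prompt it would show, which is a cheap way to inspect its decisions. In this toy framing, `k`, the ranking rule, and the prompt format are examples of harness code that a search could vary while the base model stays fixed.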

If you end up building something cool with Meta-Harness, please let us know! We would be happy to showcase it here in the main README and link to your repository, artifact, blog post, paper, or whatever else is most useful.

What's included:
- The reusable framework and onboarding flow for applying Meta-Harness to a new domain.
- Two paper reference experiments under reference_examples/:
  - reference_examples/text_classification/: memory-system search for text classification.
  - reference_examples/terminal_bench_2/: scaffold evolution for Terminal-Bench 2.0.
The optimized Terminal-Bench 2 harness from the paper lives in the separate artifact repo: stanford-iris-lab/meta-harness-tbench2-artifact.

Quick Start

Text classification:

cd reference_examples/text_classification
uv sync
uv run python meta_harness.py --iterations 1

Terminal-Bench 2 smoke task:

cd reference_examples/terminal_bench_2
uv sync
uv run bash scripts/run_eval.sh agents.baseline_kira:AgentHarness full 1 1 -i extract-elf

See the README in each example subdirectory for setup details, expected runtimes, and additional commands.

Applying Meta-Harness To A New Domain

Start by pointing your coding assistant to ONBOARDING.md and having a conversation with it. The conversation should produce a domain_spec.md file with concrete details on how to implement Meta-Harness for your domain.

The shipped examples currently assume Claude Code as the proposer agent. To use a different proposer agent, adapt one of the example wrapper scripts: reference_examples/text_classification/claude_wrapper.py or reference_examples/terminal_bench_2/claude_wrapper.py. The main requirement is a wrapper that cleanly logs the proposer's interactions.
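As a rough illustration of what "cleanly logs proposer interactions" could mean, the sketch below wraps any proposer exposed as a Python function from prompt to response and appends every exchange, including failures, to a JSONL file. It does not reproduce claude_wrapper.py: `LoggedProposer`, `read_log`, and the record fields are hypothetical, and how you actually invoke your agent (CLI, SDK, HTTP) is left to the `propose` function you supply.

```python
from __future__ import annotations

import json
import time
from pathlib import Path
from typing import Callable, Optional


class LoggedProposer:
    """Hypothetical wrapper: forwards each prompt to a proposer and logs the exchange as one JSON line."""

    def __init__(self, propose: Callable[[str], str], log_path: str | Path) -> None:
        self.propose = propose  # e.g. a function you write that invokes your coding agent
        self.log_path = Path(log_path)
        self.log_path.parent.mkdir(parents=True, exist_ok=True)
        self.calls = 0

    def _write(self, prompt: str, response: Optional[str], start: float, error: Optional[str]) -> None:
        record = {
            "call": self.calls,
            "started_at": start,
            "duration_s": round(time.time() - start, 3),
            "prompt": prompt,
            "response": response,
            "error": error,
        }
        # Append-only JSONL: one record per interaction; the file is closed (and flushed) after each write.
        with self.log_path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
        self.calls += 1

    def __call__(self, prompt: str) -> str:
        start = time.time()
        try:
            response = self.propose(prompt)
        except Exception as exc:
            # Log failed calls too, then re-raise so the caller still sees the original error.
            self._write(prompt, None, start, repr(exc))
            raise
        self._write(prompt, response, start, None)
        return response


def read_log(log_path: str | Path) -> list[dict]:
    """Load every logged interaction, in call order."""
    with Path(log_path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Keeping the log append-only and one record per call means a crashed or interrupted run still leaves a readable history of everything the proposer saw and returned.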

Release Note

This is a cleaned-up version of the code we used for the paper. It has not been tested beyond verifying that it runs. Please let us know if anything goes wrong.

Citation

If this repository is useful for your research, please cite the paper:

@misc{lee2026metaharnessendtoendoptimizationmodel,
      title={Meta-Harness: End-to-End Optimization of Model Harnesses},
      author={Yoonho Lee and Roshen Nair and Qizheng Zhang and Kangwook Lee and Omar Khattab and Chelsea Finn},
      year={2026},
      eprint={2603.28052},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2603.28052},
}
