GitHub - zkarimi22/autoresearch-anything

Set up an autonomous AI improvement loop for any project. Inspired by Karpathy's autoresearch.

Run the setup script, answer a few questions about your project, and it generates a setup.md that tells your AI coding agent exactly how to run an overnight improvement loop.

Quick start

npx autoresearch-anything

Or clone and run directly:

git clone https://github.com/zkarimi22/autoresearch-anything.git
cd my-cool-project                              # your project, not the cloned repo
node ../autoresearch-anything/init.js           # adjust the path to where you cloned it

What it does

The script asks you about your project:

What file(s) should the agent edit?
What metric are you optimizing?
How to run the eval and extract the score?
Any constraints or secondary metrics?

Then it generates:

setup.md — Complete instructions for your AI agent. Which files to edit, how to run eval, how to log results, and to never stop.
eval.js — A starter template for your evaluation script (optional).

How to use the output

Once you have setup.md (and your eval script is ready):

Open Claude Code (or any AI coding agent) in your project directory
Tell it: "Read setup.md and kick off a new experiment. Do the setup first."
Walk away

The agent will loop: edit code, commit, run eval, keep improvements, discard failures. Each experiment takes a few minutes — roughly 12/hour, 100 overnight.

The pattern

LOOP FOREVER:
  1. Edit the code
  2. git commit
  3. Run eval → get a score
  4. Score improved? Keep. Score worse? git reset.
  5. Log results. Repeat.

This works for anything measurable: system prompts, API performance, landing pages, test suites, config tuning, SQL queries.

Example session

$ npx autoresearch-anything

╔═══════════════════════════════════════════╗
║        autoresearch-anything              ║
║   Autonomous AI improvement loop setup    ║
╚═══════════════════════════════════════════╝

Briefly describe your project: AI agent that generates React components
What file(s) should the agent edit? (comma-separated): system-prompt.md
What's your metric called? (score): pass_rate
Should the metric go up or down? (up): up
What command runs your eval and prints the score? (node eval.js): node eval.js
What does the score line look like in stdout? (pass_rate: 85.3): pass_rate: 85.3
Track a secondary constraint? [y/N]: y
What's the secondary metric called? (cost): api_cost
How does the secondary metric appear in stdout? (api_cost: 1.23): api_cost: 0.12
Max time per experiment in minutes? (10): 10
Prerequisites to verify before starting? (none): npm install
Other files the agent should read for context? (README.md): README.md, pipeline.js
Files the agent must NOT modify? (eval.js): eval.js, test_cases/
Any additional rules or constraints? (or 'none'): none
Generate a starter eval.js template? [Y/n]: y

Created: setup.md
Created: eval.js (starter template — fill in your evaluation logic)

Done! Next steps:

  1. Fill in eval.js with your actual evaluation logic
  2. Open your AI coding agent (Claude Code, Codex, etc.) in this directory
  3. Tell it: "Read setup.md and kick off a new experiment. Do the setup first."
  4. Walk away

License

MIT