GitHub - exiw-ai/proofloop: AI orchestrator that runs until done. Define what "done" means, go to sleep, wake up to verified results. Supports OpenCode, Codex, Claude Code,


Agents that run until done
Define what "done" means. Go to sleep. Wake up to verified results.

Python 3.12+


Why Proofloop

Proofloop is an AI agent orchestrator that addresses the main problem with coding agents on real-world tasks: long tasks need a human constantly in the loop.

When a task takes more than an hour (often several hours), a regular agent becomes a project you need to babysit:

  • "continue", "try differently", "now run tests", "fix the regression"
  • Manual verification after every step
  • Lost context between sessions and iterations
  • Subjective "seems done" instead of proven results

Proofloop changes the paradigm: describe what you want (detailed or brief — your choice). During planning, the agent proposes assumptions, risks, a step-by-step plan, and verifiable conditions for "done". You review and approve — then the agent works autonomously until verified completion.


The Problem

Agents alone

You: "Migrate our monolith to microservices"

  Agent works... extracts user service...
- [usage limit]

- "Progress: user-service extracted.
-  TODO: orders, payments, gateway..."

  *Next day*

  You: "Continue migration..."
  Agent works...
- [usage limit]

- "orders-service done.
-  TODO: payments, gateway, tests..."

  *This goes on for a week*
  *Then integration bugs everywhere*

Proofloop

You: "Migrate our monolith to microservices"

Conditions:
  - All services pass health checks
  - Integration tests green
  - Zero downtime deployment works
  - "Data consistency verified across services"

+ *You go to sleep*

+ Agent works... extracts services
+ ✗ integration tests fail → retry

+ Agent works... fixes contracts
+ ✗ deployment failing → retry

+ Agent works... 47 iterations later
+ ✓ All conditions pass

+ "Done. 8 hours."

Quickstart

1. Install

curl -LsSf https://raw.githubusercontent.com/exiw-ai/proofloop/main/install.sh | sh

2. Setup provider (choose one)

Proofloop orchestrates existing AI agents — install whichever you prefer:

OpenCode

npm i -g opencode-ai@latest
opencode  # Interactive setup

Codex (ChatGPT Plus/Pro)

npm i -g @openai/codex
codex  # OAuth login

Claude Code

# Install: https://claude.com/download or npm i -g @anthropic-ai/claude-code
claude login

3. Run

proofloop run "Implement OAuth2 with Google, GitHub, and email/password auth" \
  --path ./my-project \
  --provider <provider>

Where <provider> is one of: opencode, codex, claude


CLI

╭──────────────────────────────────────────────────────────────────────────────╮
│                                                                              │
│  proofloop - agents that run until done                                      │
│                                                                              │
│  Global Options:                                                             │
│    -v, --verbose    Enable verbose output                                    │
│    -V, --version    Show version and exit                                    │
│    --help           Show this help message                                   │
│                                                                              │
│  proofloop run <description> -p <path> --provider <provider>                 │
│    Run a coding task autonomously.                                           │
│                                                                              │
│    Required:                                                                 │
│      -p, --path PATH           Workspace path                                │
│      --provider NAME           Agent: claude, codex, opencode                │
│    Options:                                                                  │
│      -y, --auto-approve        Skip interactive approvals                    │
│      -t, --timeout HOURS       Timeout (default: 4)                          │
│                                                                              │
│  proofloop task list                                                         │
│    List all tasks.                                                           │
│                                                                              │
│  proofloop task status <task_id>                                             │
│    Show task status. Accepts full UUID or 4+ char prefix.                    │
│                                                                              │
│  proofloop task resume <task_id>                                             │
│    Resume a stopped task.                                                    │
│                                                                              │
│  Examples:                                                                   │
│    proofloop run "Migrate to microservices" -p ./backend --provider claude   │
│    proofloop run "Add multi-tenancy" -p . --provider codex                   │
│    proofloop task resume a1b2 --provider opencode                            │
│                                                                              │
╰──────────────────────────────────────────────────────────────────────────────╯
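
The help text says `task status` accepts a full UUID or a prefix of 4 or more characters. How Proofloop resolves a prefix internally is not documented here, so the following is only a minimal sketch of one reasonable approach. The function name `resolve_task_id` and the error behavior for ambiguous prefixes are assumptions for illustration.

```python
MIN_PREFIX_LEN = 4  # from the help text: "4+ char prefix"


def resolve_task_id(query: str, known_ids: list[str]) -> str:
    """Return the single known task ID matching `query` (hypothetical helper)."""
    query = query.lower()
    # An exact match on a full ID always wins.
    if query in known_ids:
        return query
    if len(query) < MIN_PREFIX_LEN:
        raise ValueError(f"prefix must be at least {MIN_PREFIX_LEN} characters")
    matches = [task_id for task_id in known_ids if task_id.startswith(query)]
    if not matches:
        raise LookupError(f"no task matches {query!r}")
    if len(matches) > 1:
        # Assumption: refuse to guess when several tasks share the prefix.
        raise LookupError(f"prefix {query!r} is ambiguous ({len(matches)} tasks)")
    return matches[0]
```

Rejecting ambiguous prefixes instead of picking the first match keeps a short ID from silently resuming the wrong task.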

Features

  • Completion conditions — Automated (pytest, mypy, make build) or text-based ("API returns 200 for all endpoints")
  • Long-running — Runs for hours, handles 50+ iterations, and retries failures automatically within the --timeout budget (default: 4 hours)
  • Fire and forget — Start before bed, wake up to verified results
  • Independent verification — Conditions checked by running actual commands, not agent self-assessment
  • Smart supervisor — Detects loops, stagnation, regressions; decides retry vs rollback vs stop
  • Multi-provider — Uses OpenCode, Codex, or Claude Code under the hood
  • Multi-repo — Coordinates changes across multiple repositories

Usage Examples

Full-stack feature with tests

proofloop run "Implement real-time notifications system with WebSocket server, \
  React hooks, PostgreSQL pub/sub, and comprehensive test coverage" \
  --path ./myapp \
  --provider <provider>

Database migration

proofloop run "Migrate from MongoDB to PostgreSQL: schema design, \
  data migration scripts, update all repositories and services, \
  ensure zero data loss" \
  --path ./backend -t 8 \
  --provider <provider>

Multi-repo refactoring

# ~/company/
# ├── api/        (Go backend)
# ├── web/        (React frontend)
# └── mobile/     (React Native)

proofloop run "Add end-to-end encryption for messages: \
  implement in API, update web and mobile clients, \
  add key rotation, write integration tests" \
  --path ~/company -t 6 \
  --provider <provider>

Legacy modernization

proofloop run "Convert jQuery frontend to React: \
  component architecture, state management with Zustand, \
  preserve all existing functionality, add TypeScript" \
  --path ./legacy-app -t 10 \
  --provider <provider>

Available providers

proofloop run "..." -p . --provider opencode  # OpenCode
proofloop run "..." -p . --provider codex     # Codex (ChatGPT)
proofloop run "..." -p . --provider claude    # Claude Code

Task management

proofloop task list                            # List all tasks
proofloop task status 550e                     # Check status (short ID)
proofloop task resume 550e --provider claude   # Resume stopped task

How It Works

flowchart TB
    subgraph You
        A[Describe task]
        F[Review & approve]
        K[Get results]
    end

    subgraph Proofloop
        B[Intake: analyze project]
        C[Inventory: discover checks]
        D[Plan: create steps]
        E[Conditions: define success]

        G[Delivery: execute plan]
        H[Verify: run all checks]
        I{All pass?}
        J[Supervisor: analyze failure]
    end

    A --> B --> C --> D --> E --> F
    F --> G --> H --> I
    I -->|No| J --> G
    I -->|Yes| K
| Phase       | What happens                                           |
|-------------|--------------------------------------------------------|
| Intake      | Scans project structure, detects stack                 |
| Inventory   | Discovers tests, linters, type checkers                |
| Plan        | Breaks task into implementation steps                  |
| Conditions  | Defines success criteria (automated + text-based)      |
| You approve | Review plan, adjust conditions, then approve           |
| Delivery    | Agent executes all steps                               |
| Verify      | Runs every condition, collects evidence                |
| Supervisor  | On failure: analyzes, decides retry/rollback/stop      |
| Loop        | Repeats until all conditions pass or budget exhausted  |
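
The Delivery, Verify, and Loop phases can be pictured as a simple control loop. The following is a minimal sketch under stated assumptions, not Proofloop's implementation: `deliver` and `verify` are hypothetical callables standing in for the agent run and the condition checks, and the budget is modeled as a wall-clock timeout only.

```python
import time
from typing import Callable


def run_until_done(
    deliver: Callable[[int], None],
    verify: Callable[[], dict[str, bool]],
    timeout_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> tuple[bool, int]:
    """Repeat deliver -> verify until all conditions pass or the budget runs out.

    Returns (all_passed, iterations_used).
    """
    deadline = clock() + timeout_seconds
    iteration = 0
    while clock() < deadline:
        iteration += 1
        deliver(iteration)   # agent executes the plan (or a fix)
        results = verify()   # condition name -> passed?
        if results and all(results.values()):
            return True, iteration
    return False, iteration
```

The key property is that the loop exits on success only when `verify` reports every condition as passing, never on the agent's own claim of completion.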

Conditions

Automated — linked to commands:

  • pytest tests/ passes
  • make build succeeds
  • mypy --strict clean
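
An automated condition of this kind reduces to "run the command and check the exit code". Here is a minimal sketch assuming that convention. The `Evidence` record and `check_command` function are hypothetical names, not part of Proofloop's API.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Evidence:
    command: list[str]
    passed: bool
    exit_code: int
    output: str


def check_command(command: list[str], cwd: str = ".", timeout: float = 600) -> Evidence:
    """Run a command and treat exit code 0 as "condition passed"."""
    proc = subprocess.run(
        command, cwd=cwd, capture_output=True, text=True, timeout=timeout
    )
    # Keep stdout and stderr as evidence for later inspection.
    return Evidence(command, proc.returncode == 0, proc.returncode, proc.stdout + proc.stderr)


# Usage: a trivially passing command, run with the current interpreter.
evidence = check_command([sys.executable, "-c", "print('ok')"])
print(evidence.passed)
```

In a real project the command would be something like `["pytest", "tests/"]` or `["make", "build"]`, run from the workspace path.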

Text-based — verified by agent each iteration:

  • "API handles 1000 req/s under load test"
  • "All UI components render without console errors"
  • "Database queries use indexes, no full table scans"


Development

git clone https://github.com/exiw-ai/proofloop.git
cd proofloop
make dev      # Install dev dependencies
make check    # Run all checks

See CONTRIBUTING.md for guidelines.


License

Apache 2.0 — see LICENSE.