Two AI coding agents. One orchestrator. Zero API costs.

A dead-simple system that makes Claude Code and OpenAI Codex CLI work together as a team — Claude as the PM, Codex as a second engineer. They debate architecture, delegate implementation, and cross-review code. All running on your existing subscriptions. No API keys. No third-party tools. No MCP servers. No tmux. Just bash, markdown, and slash commands.

You → Claude Code (PM + orchestrator)
       ├── Claude subagents (deep codebase work)
       └── codex exec via bash (independent implementation + fresh perspective)

Why this exists

Every developer using AI coding agents hits the same ceiling: one model, one perspective, one set of blind spots. Claude is great at architecture and nuanced reasoning. Codex is great at fast, focused implementation. But they don't talk to each other.

Until now, your options were:

  • Copy-paste between terminals (tedious, breaks flow)
  • Third-party orchestration tools (complex setup, another dependency)
  • MCP bridges and messaging bots (overengineered for the problem)
  • Just pick one and ignore the other

This system takes a different approach: Claude Code calls Codex directly via bash. That's the whole trick. codex exec runs headlessly and returns output to stdout. Claude reads it as a regular tool result. No infrastructure. No coordination layer. The filesystem and bash are the only "middleware."
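The pattern is small enough to show in a few lines. A minimal sketch, with a stub function standing in for the real `codex` binary so the capture-stdout-as-tool-result shape is runnable anywhere:

```shell
# The whole "middleware": run a headless CLI, capture stdout, treat it as
# a regular tool result. `codex_exec` is a stub for illustration; the real
# call would be `codex exec -s read-only "$1"`.
codex_exec() {
  echo "CODEX: analysis of: $1"
}

RESULT=$(codex_exec "review src/limiter.ts for race conditions")
echo "$RESULT"   # Claude reads this exactly like any other bash output
```

Because the output is plain stdout, Claude needs no adapter, parser, or protocol to consume it.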

What it actually does

Think — Two models debate your problem

You ask a question. Claude forms a position, then calls Codex to challenge it. They go back and forth for up to 2 rounds. Claude synthesizes both perspectives and presents a recommendation.

/collab Should we use event-driven architecture or cron-based scheduling for our background jobs?

Claude doesn't just relay Codex's response — it reasons about it, identifies where they agree, where they diverge, and who has the stronger argument. You get a synthesis neither model would produce alone.

Build — Claude architects, Codex implements, Claude reviews

Claude writes a structured spec. Codex builds it in the background (async — you keep chatting with Claude). When Codex finishes, Claude reviews the diff, runs tests independently, and reports results.

/collab Build a rate limiter utility with sliding window algorithm. Include tests.

The key insight: build tasks run asynchronously. Claude launches Codex in the background and checks on it periodically. You're never staring at a frozen terminal.

Debug — Competing hypotheses

Claude forms Hypothesis A about a bug. Codex forms Hypothesis B independently (Claude deliberately withholds its own hypothesis). If they converge — high confidence. If they diverge — Claude designs a discriminating test and the evidence decides.

/collab Users report intermittent 500 errors on the /api/generate endpoint. Debug it.

This is the scientific method applied to debugging: independent hypotheses, then experimentation.

Setup (5 minutes)

Prerequisites

  • Claude Code with an Anthropic subscription
  • Codex CLI with a ChatGPT subscription (Plus, Pro, or Enterprise)
# Install Codex CLI if you haven't
npm install -g @openai/codex

# Log in with your ChatGPT account (not an API key)
codex login

# Verify subscription auth
cat ~/.codex/auth.json | grep auth_mode
# Should show: "auth_mode": "chatgpt"

Install

  1. Clone this repo or copy the files into your project:
# Option A: Clone and copy
git clone https://github.com/AlessioZazzarini/claude-codex-collab.git
mkdir -p your-project/.claude/bin your-project/.claude/commands
cp claude-codex-collab/collab-install/codex-bridge.sh your-project/.claude/bin/
chmod +x your-project/.claude/bin/codex-bridge.sh
cp claude-codex-collab/collab-install/collab.md your-project/.claude/commands/
cp claude-codex-collab/collab-install/collab-review.md your-project/.claude/commands/
mkdir -p your-project/.collab/specs your-project/.collab/reports

# Option B: Use the meta-prompt (recommended)
# Copy the collab-install/ folder into your project root
# Then paste the contents of META-PROMPT.md into Claude Code
# Claude Code will install everything and run verification tests
  2. Add the collab section to your project's CLAUDE.md (see collab-install/CLAUDE-md-addition.md)

  3. Add .collab/ to your .gitignore

  4. If your project has an OPENAI_API_KEY in the environment (for embeddings, moderation, etc.), the bridge script already handles this — it unsets the key before calling Codex so your API account is never billed. But verify:

.claude/bin/codex-bridge.sh think "Run: echo OPENAI_API_KEY=\$OPENAI_API_KEY — report the output"

The key should be empty.

Verify

Run in Claude Code:

/collab What are the trade-offs between monorepo and polyrepo for a TypeScript project?

If Claude debates with Codex and presents a synthesis — you're done.

How it works under the hood

The bridge script (20 lines of bash)

.claude/bin/codex-bridge.sh does three things:

  1. Unsets OPENAI_API_KEY (forces Codex to use subscription auth)
  2. Disables OpenTelemetry (works around a known Codex CLI crash)
  3. Calls codex exec with the right flags for each mode

Think mode: codex exec -s read-only "prompt" — Codex can read files but not modify anything.

Build mode: codex exec --full-auto "prompt" — Codex can create, modify, and run commands.

Add -m gpt-5.4 or any other model flag to pin a specific model (see Customization).
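Put together, the bridge's dispatch looks roughly like this. A hedged sketch: the flags are the ones documented above, but `OTEL_SDK_DISABLED` is an assumption based on the standard OpenTelemetry environment convention, not necessarily the exact variable the real script uses, and the sketch echoes the command it would run (the real script `exec`s it):

```shell
#!/usr/bin/env bash
# Sketch of codex-bridge.sh's three jobs. Flag names come from this README;
# OTEL_SDK_DISABLED is an assumed OpenTelemetry toggle.
codex_bridge() {
  local mode="$1"; shift
  unset OPENAI_API_KEY              # 1. force subscription auth
  export OTEL_SDK_DISABLED=true     # 2. sidestep the telemetry crash
  case "$mode" in                   # 3. right flags per mode
    think) cmd=(codex exec -s read-only "$*") ;;
    build) cmd=(codex exec --full-auto "$*") ;;
    *) echo "usage: codex_bridge {think|build} prompt..." >&2; return 1 ;;
  esac
  echo "${cmd[@]}"   # the real script runs: exec "${cmd[@]}"
}

codex_bridge think "Is event-driven the right call here?"
```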

The slash commands (markdown files)

.claude/commands/collab.md teaches Claude Code the orchestration protocol:

  • How to detect think/build/debug mode from your request
  • How to format prompts to Codex (concise, structured, stateless)
  • How to run builds asynchronously (background process + PID polling)
  • How to synthesize cross-model output (summarize, don't relay)
  • Safety rules (never overlap files, always verify tests, compact context)

.claude/commands/collab-review.md is a lighter version for quick code reviews.

Async build pattern

# Launch in background
.claude/bin/codex-bridge.sh build "prompt" > .collab/codex-output.txt 2>&1 &
CODEX_PID=$!

# Claude stays interactive — you keep chatting

# Check periodically
kill -0 $CODEX_PID 2>/dev/null && echo "RUNNING" || echo "DONE"

# When done, read results
cat .collab/codex-output.txt

Claude estimates the wait time based on task complexity and checks automatically. You never manage this yourself.
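The launch-and-poll shape above can be exercised end to end with `sleep` standing in for a multi-minute Codex build, so the demo finishes in seconds:

```shell
# Same pattern as above, with `sleep` playing the role of Codex.
sleep 2 > /tmp/collab-demo.txt 2>&1 &
DEMO_PID=$!

while kill -0 "$DEMO_PID" 2>/dev/null; do
  echo "RUNNING"       # Claude would do other work between checks
  sleep 1
done
echo "DONE"
```

`kill -0` sends no signal at all; it only asks the kernel whether the process still exists, which is why it is the standard liveness probe here.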

Customization

Change the Codex model

Edit .claude/bin/codex-bridge.sh to pin a specific model with the -m flag:

# Pin a specific model
think) exec codex exec -m gpt-5.4 -s read-only "$PROMPT" ;;
build) exec codex exec -m gpt-5.4 --full-auto "$PROMPT" ;;

# Or use a reasoning model for harder problems
think) exec codex exec -m o3 -s read-only "$PROMPT" ;;
build) exec codex exec -m o3 --full-auto "$PROMPT" ;;

By default (no -m flag), Codex uses whatever model your subscription provides.

Adapt for your project

The slash commands reference generic conventions. Edit .claude/commands/collab.md to include your project's:

  • Test command (replace npm test with yours)
  • Framework rules (React, Vue, Rails, etc.)
  • Directory conventions
  • Code style requirements

Add more collaboration patterns

The system is just slash commands + a bash script. Add your own patterns by creating new .claude/commands/collab-*.md files. Ideas:

  • /collab-refactor — Claude and Codex propose competing refactoring approaches
  • /collab-test — Codex writes tests for code Claude just built
  • /collab-doc — Codex documents code Claude just built (fresh eyes = better docs)

Design decisions

Why Claude orchestrates, not a neutral third party: Claude holds your full session context — your codebase, your conversation history, your project conventions. Codex is the fresh pair of eyes. That asymmetry is a feature.

Why codex exec and not the Codex SDK: The SDK requires Node.js thread management. codex exec is one bash command. Claude Code already runs bash. Zero new infrastructure.

Why async for build but sync for think: Think mode needs immediate response for debate flow (15-30 seconds). Build mode can take minutes — blocking the terminal kills the UX. Background process + polling gives you both speed and interactivity.

Why max 2 debate rounds: Diminishing returns. If two rounds don't converge, the disagreement is usually fundamental and requires a human decision, not more AI debate.

Why specs for build tasks: Codex has no memory of your session. Without a structured spec (files to create, files NOT to touch, interfaces, constraints), Codex will make assumptions. Specs eliminate ambiguity.
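What such a spec looks like is easy to sketch. The section names below are illustrative assumptions, not the repo's exact template, and a temp dir stands in for `.collab/specs/` so the demo is side-effect free:

```shell
# Illustrative build spec: everything Codex needs, nothing it must guess.
SPEC_DIR=$(mktemp -d)
cat > "$SPEC_DIR/rate-limiter.md" <<'EOF'
# Spec: sliding-window rate limiter
Files to create: src/rateLimiter.ts, src/rateLimiter.test.ts
Files NOT to touch: src/index.ts, src/middleware/*
Interface: limit(key: string, max: number, windowMs: number): boolean
Constraints: no new dependencies; all tests pass via `npm test`
EOF
cat "$SPEC_DIR/rate-limiter.md"
```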

Why unset OPENAI_API_KEY: Many projects have an OpenAI API key in the environment for embeddings or other services. Without unsetting it, Codex CLI silently uses the API key instead of subscription auth — and you get billed. The bridge script prevents this.
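This is safe for the rest of your project because environment changes in a child process never propagate back to the parent. A runnable demonstration (the key value is fake):

```shell
# Unsetting in a subshell hides the key from the child only; the parent
# shell keeps it for embeddings and other services.
export OPENAI_API_KEY="sk-fake-parent-key"
CHILD_VIEW=$( unset OPENAI_API_KEY; echo "${OPENAI_API_KEY:-<empty>}" )
echo "child saw:   $CHILD_VIEW"
echo "parent kept: $OPENAI_API_KEY"
```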

What this is NOT

  • Not an agent swarm framework. It's two agents and a bash script.
  • Not an MCP server. No protocol, no transport layer, no discovery.
  • Not a SaaS product. It's 4 files you drop into your project.
  • Not model-specific. Swap Claude for another CLI agent that can run bash. Swap Codex for any headless CLI agent.

Real-world results

Built and battle-tested on a production TypeScript/Next.js 14 platform with 685+ tests. The think mode produced genuine architectural insights that neither model surfaced independently. The build mode went from spec to working code with passing tests in under 2 minutes.

Contributing

This is intentionally minimal. PRs that add complexity will be politely declined. PRs that make the existing system more robust, more portable, or better documented are welcome.

Ideas worth exploring:

  • Adapting the bridge for other CLI agents (Aider, Gemini CLI, etc.)
  • Better context management for long collaboration sessions
  • Structured logging of cross-model debates for later review

License

MIT — use it, fork it, adapt it, ship it.


Built by Alessio Zazzarini. Inspired by the belief that the best AI workflows use multiple models, not just the biggest one.