GitHub - liliu-z/magpie: Multi-AI adversarial PR review tool

Multi-AI adversarial code review tool. Multiple AI models independently review your PR, debate their findings, then a code-aware verifier audits each issue against the actual codebase.

Core Concepts

Code-Aware Review: CLI-based reviewers (Claude Code, Codex, Gemini CLI) read the actual source files via tools — not just the diff text. They can grep for callers, read surrounding context, and verify their findings before reporting.
Multi-Dimensional Review: Beyond correctness/security, reviewers check compatibility (rolling upgrade risks, breaking changes), feature interaction (shared state, cross-feature conflicts), and extensibility.
Naturally Adversarial: Different AI models tend to disagree with each other, so debate between them cross-validates findings.
Integrated Verify+Audit: After issues are extracted, a tool-equipped verifier reads the actual code to confirm each issue, filter false positives, and re-calibrate severity — all within magpie's pipeline.
Fair Debate Model: All reviewers in the same round see identical information — no unfair advantage from execution order.
Parallel Execution: Same-round reviewers run concurrently for faster reviews.

Supported AI Providers

Provider	Type	Description
`claude-code`	CLI	Claude Code CLI (uses your subscription, no API key)
`codex-cli`	CLI	OpenAI Codex CLI (uses your subscription, no API key)
`gemini-cli`	CLI	Gemini CLI (uses Google account login, no API key)
`qwen-code`	CLI	Alibaba Qwen Code CLI (uses OAuth login, no API key)
`claude-*`	API	Anthropic API (requires ANTHROPIC_API_KEY)
`gpt-*`	API	OpenAI API (requires OPENAI_API_KEY)
`gemini-*`	API	Google Gemini API (requires GOOGLE_API_KEY)
`minimax`	API	MiniMax API (requires MINIMAX_API_KEY)
`mock`	Debug	Mock provider for testing (no API key, see Debug Mode)

Recommended: Use CLI providers (claude-code, codex-cli, gemini-cli, qwen-code). They run on your existing subscription or account login and don't require API keys.

Custom API Endpoints

All API providers support a custom base_url for connecting to compatible third-party services (Azure OpenAI, Ollama, vLLM, one-api, etc.):

providers:
  openai:
    api_key: ${OPENAI_API_KEY}
    base_url: https://my-ollama-server:11434/v1
  anthropic:
    api_key: ${ANTHROPIC_API_KEY}
    base_url: https://my-proxy.example.com
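The `${OPENAI_API_KEY}` placeholders above suggest that values are filled in from environment variables when the config is loaded. How Magpie resolves them internally is not described here, so the following is only a sketch of one way such expansion could work. `expandEnv` is a hypothetical helper, not a Magpie API.

```typescript
// Hypothetical helper (assumption, not Magpie's loader):
// replaces ${NAME} placeholders with values from `env`.
// Unknown variables are left untouched so a missing key is easy to spot.
function expandEnv(
  value: string,
  env: Record<string, string | undefined>,
): string {
  return value.replace(
    /\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g,
    (match: string, name: string) => {
      const resolved = env[name];
      return resolved !== undefined ? resolved : match;
    },
  );
}
```

Leaving unresolved placeholders in place, rather than substituting an empty string, is a design choice of this sketch: it makes a missing variable visible in error messages.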

Installation

# Clone the repo
git clone https://github.com/liliu-z/magpie.git
cd magpie

# Install dependencies
npm install

# Build
npm run build

# Global install (optional)
npm link

Quick Start

# Initialize config file (interactive)
magpie init

# Or with defaults
magpie init -y

# Navigate to the repo you want to review
cd your-repo

# Start review (PR number)
magpie review 12345

# Or with full URL
magpie review https://github.com/owner/repo/pull/12345

# Start a discussion on any topic
magpie discuss "Should we use microservices or monolith?"

Configuration

The config file is located at ~/.magpie/config.yaml:

# AI Providers
providers:
  minimax:
    api_key: your-minimax-api-key   # or set MINIMAX_API_KEY env var
    base_url: https://custom-endpoint.example.com/v1  # optional: custom API endpoint

# Default settings
defaults:
  max_rounds: 5           # Maximum debate rounds
  output_format: markdown
  check_convergence: true  # Stop early when consensus reached
  language: en             # Output language (e.g., 'zh', 'en', 'ja')

# Reviewers - same perspective, different models
reviewers:
  claude:
    model: claude-code
    prompt: |
      You are a senior engineer reviewing this PR. Be precise and evidence-based.
      Review dimensions: Correctness, Security, Compatibility (rolling upgrade,
      breaking changes), Feature Interaction (shared state, cross-feature conflicts),
      Extensibility, Architecture, Performance & Resources.
      Use Read/Grep tools to verify findings against actual code.

  codex:
    model: codex-cli
    prompt: |
      # Same dimensions as above

# Analyzer - PR analysis (before debate)
analyzer:
  model: claude-code
  prompt: |
    Analyze this PR and provide:
    1. What this PR does
    2. Architecture/design decisions
    3. Affected interfaces/APIs (flag breaking changes)
    4. Compatibility risks (rolling upgrade, serialization changes)
    5. Feature interaction risks (callers, shared state)
    6. Suggested review focus (specific files + line ranges)

# Summarizer - final conclusion + verify+audit
summarizer:
  model: claude-code
  prompt: |
    You are a neutral technical reviewer. Based on the full reviewer discussion, provide:
    1. Points of consensus
    2. Points of disagreement
    3. Recommended action items
    4. Overall assessment

# Context Gatherer - system context before review (optional)
contextGatherer:
  enabled: true              # Enable/disable context gathering
  model: claude-code         # Optional: defaults to analyzer model
  callChain:
    maxDepth: 2              # How deep to trace call chains
    maxFilesToAnalyze: 20    # Max files to analyze for call chains
  history:
    maxDays: 30              # Look back period for related PRs
    maxPRs: 10               # Max related PRs to include
  docs:
    patterns:                # Doc files to include for context
      - docs
      - README.md
      - ARCHITECTURE.md
      - DESIGN.md
    maxSize: 50000           # Max total size of doc content

CLI Options

magpie review [pr-number|url] [options]

Options:
  -c, --config <path>       Path to config file
  -r, --rounds <number>     Maximum debate rounds (default: 5)
  -i, --interactive         Interactive mode (pause between turns, Q&A)
  -o, --output <file>       Output to file
  -f, --format <format>     Output format (markdown|json)
  --no-converge             Disable convergence detection (enabled by default)
  -l, --local               Review local uncommitted changes
  -b, --branch [base]       Review current branch vs base (default: main)
  --files <files...>        Review specific files
  --reviewers <ids>         Comma-separated reviewer IDs (e.g., claude-code,gemini-cli)
  -a, --all                 Use all configured reviewers (skip selection)
  --git-remote <remote>     Git remote for PR URL detection (default: origin)
  --skip-context            Skip context gathering phase
  --no-post                 Skip post-processing (GitHub comment flow)
  --no-conclusion           Skip final conclusion generation (for bot/CI use)
  --plan-only               Generate review plan without executing
  --reanalyze               Force re-analyze features (ignore cache)

  # Repository Review Options
  --repo                    Review entire repository
  --path <path>             Subdirectory to review (with --repo)
  --ignore <patterns...>    Patterns to ignore (with --repo)
  --quick                   Quick mode: only architecture overview
  --deep                    Deep mode: full analysis without prompts
  --list-sessions           List all review sessions
  --session <id>            Resume specific session by ID
  --export <file>           Export completed review to markdown

Discuss Command

magpie discuss [topic] [options]

Options:
  -c, --config <path>       Path to config file
  -r, --rounds <number>     Maximum debate rounds (default: 5)
  -i, --interactive         Interactive mode (follow-up Q&A after conclusion)
  -o, --output <file>       Output to file
  -f, --format <format>     Output format (markdown|json)
  --no-converge             Disable convergence detection
  --reviewers <ids>         Comma-separated reviewer IDs
  -a, --all                 Use all configured reviewers
  -d, --devil-advocate      Add a Devil's Advocate to challenge consensus
  --list                    List all discuss sessions
  --resume <id>             Resume a discuss session with follow-up question

Reviewer Selection

By default, Magpie prompts you to select reviewers interactively:

# Interactive selection (default)
magpie review 12345

# Select reviewers from config:
#   1. claude-code
#   2. codex-cli
#   3. gemini-cli
# Enter numbers separated by commas (e.g., 1,2): 1,3

You can also specify reviewers directly:

# Use all configured reviewers
magpie review 12345 --all
magpie review 12345 -a

# Specify reviewers by ID (the keys under reviewers: in your config)
magpie review 12345 --reviewers claude-code,gemini-cli
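The interactive prompt above takes input such as `1,3`. A minimal sketch of turning that input into reviewer IDs might look like the following. `parseSelection` is a hypothetical name, and the handling of invalid entries (silently ignored) is an assumption, not documented Magpie behavior.

```typescript
// Hypothetical helper: maps "1,3" onto the configured reviewer IDs (1-based).
// Non-numeric or out-of-range entries are ignored; duplicates are collapsed.
function parseSelection(input: string, reviewerIds: string[]): string[] {
  const picked: string[] = [];
  for (const part of input.split(",")) {
    const trimmed = part.trim();
    if (!/^\d+$/.test(trimmed)) continue;
    const id: string | undefined = reviewerIds[Number(trimmed) - 1];
    if (id !== undefined && picked.indexOf(id) === -1) picked.push(id);
  }
  return picked;
}
```

With the three reviewers listed in the prompt, `parseSelection("1,3", ["claude-code", "codex-cli", "gemini-cli"])` selects `claude-code` and `gemini-cli`.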

Review Modes

# Review a GitHub PR (number or URL)
magpie review 12345
magpie review https://github.com/owner/repo/pull/12345

# Review local uncommitted changes (staged + unstaged)
magpie review --local

# Review current branch vs main
magpie review --branch

# Review current branch vs specific base
magpie review --branch develop

# Review specific files
magpie review --files src/foo.ts src/bar.ts
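The `review` command accepts either a bare PR number or a full GitHub URL. As an illustration of how the two forms could be normalized, here is a small sketch. `parsePrRef` and the `PrRef` type are hypothetical; Magpie's actual argument handling may differ, for example in how it combines a bare number with `--git-remote`.

```typescript
interface PrRef {
  owner?: string; // only known when a full URL is given
  repo?: string;
  number: number;
}

// Hypothetical helper: accepts "12345" or
// "https://github.com/owner/repo/pull/12345"; returns null otherwise.
function parsePrRef(arg: string): PrRef | null {
  if (/^\d+$/.test(arg)) return { number: Number(arg) };
  const m = /^https:\/\/github\.com\/([^/]+)\/([^/]+)\/pull\/(\d+)\/?$/.exec(arg);
  if (m === null) return null;
  return { owner: m[1], repo: m[2], number: Number(m[3]) };
}
```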

Repository Review

Review an entire repository with feature-based analysis:

# Full repository review (interactive)
magpie review --repo

# Quick mode (architecture overview only)
magpie review --repo --quick

# Deep analysis (no prompts)
magpie review --repo --deep

# Review specific subdirectory
magpie review --repo --path src/api

# List/resume sessions
magpie review --list-sessions
magpie review --session abc123

# Export completed review
magpie review --export review-report.md

Repository review includes:

AI-powered feature detection (identifies logical modules)
Session persistence (pause/resume reviews)
Focus area selection (security, performance, architecture, etc.)
Progress saving between runs

Topic Discussion

Discuss any technical topic with multiple AI reviewers through adversarial debate:

# Basic discussion
magpie discuss "Should we use microservices or monolith for our new project?"

# From a file (supports markdown)
magpie discuss /path/to/architecture-proposal.md

# With Devil's Advocate to challenge consensus
magpie discuss "Is Kubernetes overkill for our scale?" -d

# Interactive mode for follow-up Q&A
magpie discuss "How should we handle database migrations?" -i

# List all discuss sessions
magpie discuss --list

# Resume a previous discussion with follow-up
magpie discuss --resume abc123 "What about rollback strategies?"

Discussion features:

Multi-perspective analysis: Different AI models debate the topic from their unique viewpoints
Devil's Advocate mode (-d): Adds a dedicated contrarian to stress-test ideas
Session persistence: Save/resume discussions for multi-session deep dives
Language matching: Automatically responds in the same language as your topic (Chinese/English)
Interactive follow-up: Continue the discussion with additional questions
Project context: Optionally loads project-specific context for relevant discussions

Workflow

1. Context Gathering (if enabled)
   │  Collects: affected modules, related PRs, call chains
   │  Supports: Go, C++, Python, Java, Scala, TS/JS, Rust, Proto
   ↓
2. Analyzer analyzes PR
   │  Outputs: summary, interface changes, compatibility risks,
   │           interaction risks, specific review focus areas
   ↓
3. [Interactive] Post-analysis Q&A (ask specific reviewers)
   ↓
4. Multi-round debate
   ├─ Round 1: All reviewers give INDEPENDENT opinions (parallel)
   │           CLI reviewers fetch diff + read code via tools
   │           ↓
   ├─ Convergence check: Did reviewers reach consensus?
   │           ↓
   ├─ Round 2+: Reviewers see ALL previous rounds (parallel)
   │            Cross-validate findings, challenge weak arguments
   │            ↓
   └─ ... (repeat until max rounds or convergence)
   ↓
5. Structurizer extracts issues into structured JSON
   ↓
6. Verify+Audit (tool-equipped)
   │  For each issue: Read/Grep actual code to verify
   │  Filters: false positives, by-design patterns, pre-existing issues
   │  Re-calibrates severity based on evidence
   ↓
7. [Optional] Summarizer produces final conclusion (--no-conclusion to skip)
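Steps 5 and 6 can be pictured as a list of structured issues followed by a per-issue verdict that either drops the issue or adjusts its severity. The types and the `applyVerdicts` function below are illustrative assumptions; the actual JSON schema Magpie uses is not shown in this README.

```typescript
type Severity = "critical" | "major" | "minor";

// Hypothetical issue shape produced by the structurizer.
interface Issue {
  id: string;
  file: string;
  line: number;
  severity: Severity;
  summary: string;
}

// Hypothetical verdict produced by the verifier for one issue.
interface Verdict {
  issueId: string;
  confirmed: boolean;  // false = false positive, by-design, or pre-existing
  severity?: Severity; // set when the evidence justifies re-calibration
}

// Drops unconfirmed issues and applies re-calibrated severities.
// Issues without a verdict are kept unchanged (an assumption of this sketch).
function applyVerdicts(issues: Issue[], verdicts: Verdict[]): Issue[] {
  const byId = new Map<string, Verdict>();
  for (const v of verdicts) byId.set(v.issueId, v);

  const kept: Issue[] = [];
  for (const issue of issues) {
    const verdict = byId.get(issue.id);
    if (verdict !== undefined && !verdict.confirmed) continue;
    kept.push({ ...issue, severity: verdict?.severity ?? issue.severity });
  }
  return kept;
}
```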

Fair Debate Model

Magpie uses a fair debate model where:

Round 1: Each reviewer gives their independent opinion without seeing others
Round 2+: Each reviewer sees ALL previous rounds' messages
Same-round fairness: All reviewers in the same round see identical information
Parallel execution: Same-round reviewers run concurrently (faster reviews)

This ensures no reviewer has an unfair advantage from execution order.
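The fairness property can be sketched in a few lines: take one snapshot of the history before the round starts, hand that same snapshot to every reviewer, and run them with `Promise.all`. The `Reviewer` and `Message` types and the `runRound` function are assumptions made for this illustration, not Magpie's internal API.

```typescript
interface Message {
  round: number;
  reviewer: string;
  text: string;
}

// Hypothetical reviewer abstraction: receives the history, returns an opinion.
interface Reviewer {
  id: string;
  review: (history: readonly Message[]) => Promise<string>;
}

// Runs one debate round. Every reviewer receives the same snapshot, so
// nobody sees a same-round message, regardless of who finishes first.
async function runRound(
  round: number,
  reviewers: Reviewer[],
  history: readonly Message[],
): Promise<Message[]> {
  const snapshot: readonly Message[] = history.slice();
  return Promise.all(
    reviewers.map(
      async (r): Promise<Message> => ({
        round,
        reviewer: r.id,
        text: await r.review(snapshot),
      }),
    ),
  );
}
```

In round 1 the history is empty, which gives the independent opinions described above. The caller appends the returned messages to the history only after the whole round has completed.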

Features

Context Gathering

Before the review begins, Magpie automatically gathers system-level context to help reviewers understand the broader impact of changes:

Affected Modules: Identifies which parts of the system are impacted (core, moderate, low)
Related PRs: Finds relevant past PRs from project history
Call Chain Analysis: Traces how changed code connects to the rest of the system (supports Go, C++, Python, Java, Scala, TypeScript/JavaScript, Rust, Proto)

┌─ System Context ─────────────────────────────────────────┐
│ Affected Modules:                                        │
│   • [core] src/orchestrator - Main review orchestration  │
│   • [moderate] src/config - Configuration handling       │
│                                                          │
│ Related PRs:                                             │
│   • #42 - Added streaming support                        │
│   • #38 - Refactored provider interface                  │
└──────────────────────────────────────────────────────────┘

Use --skip-context to skip this phase for a single run, or configure it in the contextGatherer section of the config.
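The `callChain.maxDepth` and `callChain.maxFilesToAnalyze` settings bound how far the tracing goes. One way to read them is as limits on a breadth-first walk over a caller index, sketched below. The `callers` map, the `traceCallChain` function, and the choice to count the changed files toward the file limit are all assumptions; the README does not describe how Magpie builds or walks the call graph.

```typescript
// callers: file -> files that call into it (hypothetical pre-built index).
// Returns the changed files plus the callers reached within the limits.
function traceCallChain(
  changed: string[],
  callers: Map<string, string[]>,
  maxDepth: number,
  maxFiles: number,
): string[] {
  const visited = new Set<string>(changed);
  let frontier: string[] = changed.slice();

  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const file of frontier) {
      for (const caller of callers.get(file) ?? []) {
        if (visited.has(caller)) continue;
        if (visited.size >= maxFiles) return Array.from(visited);
        visited.add(caller);
        next.push(caller);
      }
    }
    frontier = next;
  }
  return Array.from(visited);
}
```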

Session Persistence

Reviewers that support sessions maintain context across debate rounds, reducing token usage.

Provider	Session Support	Notes
`claude-code`	Yes	Full session with explicit ID
`codex-cli`	Yes	Full session with explicit ID
`qwen-code`	Yes	Full session with explicit ID
`minimax`	Yes	Conversation history maintained
`gemini-cli`	No	Uses full context each round
Other API providers	No	Uses full context each round
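The token saving comes from what has to be sent each round. A reviewer with a session already holds earlier rounds, so only the other reviewers' latest messages are new to it; a stateless reviewer needs the whole history again. The sketch below illustrates that difference under these assumptions. `messagesForRound` is a hypothetical function and the exact payload Magpie sends is not documented here.

```typescript
interface Message {
  round: number;
  reviewer: string;
  text: string;
}

// Hypothetical helper: picks the messages to send to one reviewer
// at the start of `currentRound`.
function messagesForRound(
  history: Message[],
  currentRound: number,
  reviewerId: string,
  hasSession: boolean,
): Message[] {
  // Stateless providers get the full context every round.
  if (!hasSession) return history.slice();
  // Session providers only need what they have not seen yet:
  // the previous round's messages from the other reviewers.
  return history.filter(
    (m) => m.round === currentRound - 1 && m.reviewer !== reviewerId,
  );
}
```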

Parallel Execution

All reviewers in the same round execute concurrently. Results are collected and displayed after all reviewers complete:

⠋ Round 1: All reviewers thinking (parallel)...
   ↓ (all reviewers running simultaneously)
[claude-code]: First review...
[gemini-cli]: First review...
   ↓
⠋ Checking convergence...
   ↓
⠋ Round 2: All reviewers thinking (parallel)...

Post-Analysis Q&A (Interactive Mode)

In interactive mode (-i), after analysis you can ask specific reviewers questions before the debate begins:

magpie review 12345 -i

# After analysis...
💡 You can ask specific reviewers questions before the debate begins.
   Format: @reviewer_id question (e.g., @claude What about security?)
   Available: @claude
   Available: @gemini
❓ Ask a question or press Enter to start debate: @claude What about the error handling?

Convergence Detection

Enabled by default. Automatically ends debate when reviewers reach consensus on key points, saving tokens.

# Convergence detection enabled by default
magpie review 12345

# Disable convergence detection
magpie review 12345 --no-converge

Set defaults.check_convergence: false in the config to disable it by default.
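The interaction between `max_rounds` and convergence detection amounts to a loop with an early exit. The sketch below shows only the control flow. How Magpie actually judges consensus is not described in this README, so the check is passed in as a hypothetical `hasConverged` callback, and the round runner is kept synchronous for brevity.

```typescript
interface DebateOptions {
  maxRounds: number;
  checkConvergence: boolean;
}

// Returns the number of rounds actually run.
// runRound and hasConverged are injected stand-ins (assumptions).
function runDebate(
  options: DebateOptions,
  runRound: (round: number) => string[],
  hasConverged: (latestOpinions: string[]) => boolean,
): number {
  let round = 0;
  while (round < options.maxRounds) {
    round++;
    const opinions = runRound(round);
    // The check runs after each round, including round 1.
    if (options.checkConvergence && hasConverged(opinions)) break;
  }
  return round;
}
```

With `checkConvergence` set to `false` (the effect of `--no-converge`), the loop always runs for the full `maxRounds`.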

Markdown Rendering

All outputs (analysis, reviewer comments, final conclusion) are rendered with proper markdown formatting in the terminal: headers, bold, tables, and code blocks all display correctly.

Token Usage Tracking

Displays token usage and estimated cost after each review:

── Token Usage (Estimated) ──
  analyzer       88 in     438 out
  claude      4,776 in   1,423 out
  gemini      6,069 in     664 out
  summarizer    505 in     322 out
──────────────────────────────────
  Total      11,438 in   2,847 out  ~$0.1429
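The Total row is the column-wise sum of the per-role rows. The sketch below reproduces that sum for the sample above. The per-token prices are hypothetical parameters: the README does not state which rates Magpie uses, and real pricing typically differs per model, so no attempt is made to reproduce the `~$0.1429` figure.

```typescript
interface Usage {
  role: string;
  input: number;
  output: number;
}

// Sums token counts and applies caller-supplied (hypothetical) prices.
function totalUsage(
  rows: Usage[],
  usdPerInputToken: number,
  usdPerOutputToken: number,
): { input: number; output: number; estimatedUsd: number } {
  let input = 0;
  let output = 0;
  for (const row of rows) {
    input += row.input;
    output += row.output;
  }
  return {
    input,
    output,
    estimatedUsd: input * usdPerInputToken + output * usdPerOutputToken,
  };
}

// Rows copied from the sample output above.
const sampleUsage: Usage[] = [
  { role: "analyzer", input: 88, output: 438 },
  { role: "claude", input: 4776, output: 1423 },
  { role: "gemini", input: 6069, output: 664 },
  { role: "summarizer", input: 505, output: 322 },
];
// totalUsage(sampleUsage, 0, 0) yields input 11438 and output 2847,
// matching the Total row.
```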

Cold Jokes

While waiting for AI reviewers, enjoy programmer jokes:

⠋ claude is thinking... | Why do programmers confuse Halloween and Christmas? Because Oct 31 = Dec 25

Post-Review Discussion Phase (Interactive Mode)

In interactive mode (-i), after the debate concludes, you can enter a discussion phase to chat with any role (reviewers, analyzer, or summarizer) before the comment posting step:

Pick any role by number to start a conversation
Each role maintains a persistent session with full PR context and its original review analysis
Use /skip to exit the entire discussion phase
Useful for clarifying issues, asking follow-up questions, or getting deeper insights before deciding which comments to post

  Available roles:
    [1] claude-code
    [2] gemini-cli
    [3] analyzer
    [4] summarizer

  Pick a role by number (or Enter to exit discussion):

Post-Processing (PR Review)

After the debate concludes, Magpie extracts structured issues and lets you review them one by one:

Comment style prompt: Before the issue loop, you can provide style instructions (e.g., "be concise", "use Chinese") that apply to all generated comments
Progress tracking: Shows running tally of posted/edited/discussed/skipped issues
Per-issue actions:
- Post (p) — Posts as an inline comment on the exact PR line
- Edit (e) — Edit the comment before posting
- Discuss (d) — Start a multi-turn discussion with any role (reviewer/analyzer/summarizer)
- Skip (s) — Skip this issue
- Quit (q) — Stop processing remaining issues
/skip and /drop: During discussion, type /skip or /drop to abandon the current issue
Inline comments: Each issue is posted as an individual inline comment on the specific line in the PR diff. Falls back to a regular PR comment if the line is not in the diff.
Auto-explain: When you choose to discuss, the reviewer automatically explains the issue in detail first (where the problem is, why it's a problem, how to fix it) before you start asking questions.
Comment regeneration: After discussion, the reviewer generates a revised comment. You can post it, post the original, edit, regenerate with new instructions, or skip.
--no-post: Use this flag to skip the entire post-processing flow and just see the review output.
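The inline-versus-regular fallback depends on whether the target line appears in the PR diff. A minimal sketch of that check, assuming a unified diff for a single file, is shown below. `linesInDiff` and `commentTarget` are hypothetical names, and this is not necessarily how Magpie decides.

```typescript
// Collects the new-file line numbers (added or context lines) that
// appear in a unified diff for ONE file.
function linesInDiff(patch: string): Set<number> {
  const lines = new Set<number>();
  let current = 0;
  let inHunk = false;

  for (const raw of patch.split("\n")) {
    const header = /^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@/.exec(raw);
    if (header !== null) {
      current = Number(header[1]); // first new-side line of this hunk
      inHunk = true;
      continue;
    }
    if (!inHunk) continue; // skip "---" / "+++" file headers
    const marker = raw.charAt(0);
    if (marker === "+" || marker === " ") {
      lines.add(current);
      current++;
    }
    // "-" lines and "\ No newline..." markers have no new-side number.
  }
  return lines;
}

// Inline when the line is part of the diff, otherwise a regular PR comment.
function commentTarget(line: number, patch: string): "inline" | "regular" {
  return linesInDiff(patch).has(line) ? "inline" : "regular";
}
```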

Debug Mode

Use the mock provider to test Magpie workflows without real AI calls:

# Enable mock mode globally (all models become mock)
# In config: mock: true

# Or use mock as a model name
# reviewers:
#   test-reviewer:
#     model: mock
#     prompt: "test prompt"

# Environment variables
MAGPIE_MOCK_RESPONSE="fixed response text"   # Return fixed text
MAGPIE_MOCK_FILE=/path/to/response.txt       # Return content from file
MAGPIE_MOCK_DELAY=100                         # Delay between words in ms (default: 50)

# Example: test the discussion flow quickly
MAGPIE_MOCK_DELAY=50 magpie review 123 --reviewers test-reviewer
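The three environment variables can be read as a small settings lookup. The sketch below shows one plausible resolution order. That `MAGPIE_MOCK_RESPONSE` takes precedence over `MAGPIE_MOCK_FILE`, and the placeholder text used when neither is set, are assumptions of this sketch; only the 50 ms default delay comes from the comment above. `resolveMock` is a hypothetical function, and file reading is injected so the example stays self-contained.

```typescript
interface MockSettings {
  text: string;
  delayMs: number; // delay between words, in milliseconds
}

// Hypothetical resolver for the mock provider's settings.
function resolveMock(
  env: Record<string, string | undefined>,
  readFile: (path: string) => string,
): MockSettings {
  const rawDelay = env["MAGPIE_MOCK_DELAY"];
  const parsed = Number(rawDelay);
  const delayMs =
    rawDelay !== undefined && isFinite(parsed) && parsed >= 0 ? parsed : 50;

  const fixed = env["MAGPIE_MOCK_RESPONSE"];
  const file = env["MAGPIE_MOCK_FILE"];
  let text = "mock response"; // placeholder; the real default is not documented
  if (fixed !== undefined) {
    text = fixed; // assumed to win over the file
  } else if (file !== undefined) {
    text = readFile(file);
  }
  return { text, delayMs };
}
```

In a Node.js program this could be called as `resolveMock(process.env, (p) => fs.readFileSync(p, "utf8"))`.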

Development

# Run in dev mode
npm run dev -- review 12345

# Run tests
npm test

# Build
npm run build

License

ISC