raullenchai/triplecheck
 ████████╗██████╗ ██╗██████╗ ██╗     ███████╗ ██████╗██╗  ██╗███████╗ ██████╗██╗  ██╗
 ╚══██╔══╝██╔══██╗██║██╔══██╗██║     ██╔════╝██╔════╝██║  ██║██╔════╝██╔════╝██║ ██╔╝
    ██║   ██████╔╝██║██████╔╝██║     █████╗  ██║     ███████║█████╗  ██║     █████╔╝
    ██║   ██╔══██╗██║██╔═══╝ ██║     ██╔══╝  ██║     ██╔══██║██╔══╝  ██║     ██╔═██╗
    ██║   ██║  ██║██║██║     ███████╗███████╗╚██████╗██║  ██║███████╗╚██████╗██║  ██╗
    ╚═╝   ╚═╝  ╚═╝╚═╝╚═╝     ╚══════╝╚══════╝ ╚═════╝╚═╝  ╚═╝╚══════╝ ╚═════╝╚═╝  ╚═╝

Three AI agents review your code — for free, on your hardware.
Deep review, not shallow lint. Local LLMs = unlimited passes, zero cost.

Python 3.11+ License: MIT PRs Welcome

Comparison · Quick Start · How It Works · Configuration · Roadmap


  ┌──────────┐        ┌──────────┐        ┌──────────┐        ┌──────────┐
  │          │        │          │        │          │        │          │
  │ REVIEWER │──────► │  CODER   │──────► │  TESTS   │──────► │  JUDGE   │
  │          │        │          │        │          │        │          │
  │ finds    │        │ fixes    │        │ verifies │        │ scores   │
  │ bugs     │        │ code     │        │ nothing  │        │ quality  │
  │          │        │          │        │ broke    │        │ 0 — 10   │
  └──────────┘        └──────────┘        └──────────┘        └──────────┘
       │                                       │
       │              ◄── loop until clean ──  │
       └───────────────────────────────────────┘

| Feature | Description |
|---|---|
| 💰 $0 API cost | Run Qwen, DeepSeek, or Llama locally via vLLM / Ollama / LM Studio |
| 🔀 Mix any models | Local Qwen for reviewer + cloud Claude for judge — any combination |
| 🔧 Real fixes, not lint | Produces actual code patches, applied and verified each round |
| 📦 Scan entire repos | Auto-splits large codebases into review units, prioritizes by complexity |
| 🌐 Language agnostic | Python, Go, Rust, TypeScript, Java, and more |
| 🧠 Multi-pass voting | N review passes × different angles → vote to filter noise |
| 🔌 Any LLM backend | vLLM, Ollama, LM Studio, OpenRouter, DeepSeek, OpenAI, Claude |

🏆 Comparison

How triplecheck stacks up against popular AI code review tools:

| | triplecheck | CodeRabbit | PR-Agent (Qodo) | Sourcery | Ellipsis |
|---|---|---|---|---|---|
| Open source | ✅ MIT | ❌ SaaS | ✅ Apache-2.0 | ❌ Freemium | ❌ SaaS |
| Run locally / self-host | ✅ | | | | |
| Use your own models | ✅ Any LLM | ❌ Fixed backend | ⚠️ OpenAI/Anthropic/custom | ❌ Fixed | ❌ Fixed |
| $0 with local LLMs | ✅ | ❌ $24/mo | ⚠️ Need API key | ❌ $36/mo | |
| Auto-fix code | ✅ Coder agent writes patches | ⚠️ One-click suggestions | ❌ Suggestions only | ❌ Suggestions only | ✅ Implements fixes |
| Review → Fix → Test loop | ✅ Multi-round | ❌ Single pass | ❌ Single pass | ❌ Single pass | ❌ Single pass |
| Judge / scoring | ✅ 0–10 verdict | | | | |
| Multi-pass voting | ✅ N passes, deduplicate | | | | |
| Layered review | ✅ arch/interface/logic/security | | | | |
| CI test gate | ✅ Auto-runs tests | | | | |
| Repo-wide scan | ✅ Auto-split + resume | ❌ PR-scoped | ❌ PR-scoped | ❌ PR-scoped | ❌ PR-scoped |
| Tree-sitter dep graph | ✅ Smart batching | | | | |
| GitHub PR integration | 🔜 Roadmap | | | | |
| Incremental (diff-only) | 🔜 Roadmap | | | | |
| PR summary | 🔜 Roadmap | | | | |
| IDE extension | 🔜 Roadmap | ✅ VS Code | ✅ VS Code | | |
| In-PR chat | | ✅ @coderabbit | ✅ /ask | | |
| SAST integrations | ⚠️ ruff/golint/eslint | ✅ 40+ tools | ⚠️ Limited | | |
| Learning from feedback | | | | | |

TL;DR — triplecheck has the deepest review engine (multi-round fix loop, voting, layered review, test gate, judge scoring) and is the only tool that runs 100% free on your own hardware. The gap is GitHub integration — coming soon.

🚀 Quick Start

Install

pip install triplecheck

# Optional: smart file grouping via tree-sitter dependency graph
# (quotes keep the extras spec from being glob-expanded by zsh)
pip install "triplecheck[graph]"

3 ways to run

🏠 Local (free)

# Start any OpenAI-compatible server
vllm serve Qwen/Qwen3-Coder

# Review!
triplecheck \
  --target ./my-project \
  --skip-ci

No API keys. No cost. Unlimited.

☁️ Cloud API

export DEEPSEEK_API_KEY=sk-...

triplecheck \
  --target ./my-project \
  --skip-ci

Fast setup, pay per token.

🔀 Hybrid (recommended)

# Local finds + fixes, cloud judges
triplecheck \
  --target ./my-project \
  --reviewer qwen-local \
  --coder qwen-local \
  --judge claude-opus \
  --skip-ci

Best quality at minimal cost.

See examples/ for complete config files: config.local.yml · config.hybrid.yml · config.cloud.yml

⚙️ How It Works

                          ┌─────────────────────────────────────────────────────────────┐
   Your Code              │                    Review Pipeline                           │
      │                   │                                                             │
      ▼                   │   ┌──────────┐    ┌──────────┐    ┌──────────┐              │
 ┌─────────┐              │   │ Reviewer  │───▶│  Coder   │───▶│  Tests   │──▶ Round N  │
 │ Discover│──▶ Batch ───▶│   │  (LLM)   │    │  (LLM)   │    │ (local)  │     │       │
 │  Files  │              │   └──────────┘    └──────────┘    └──────────┘     │       │
 └─────────┘              │        ▲                                │          │       │
                          │        └──────── more findings? ◀───────┘          │       │
                          │                                              converged?    │
                          │                                                    │       │
                          │                                              ┌──────────┐  │
                          │                                              │  Judge   │  │
                          │                                              │  (LLM)   │  │
                          │                                              └────┬─────┘  │
                          └───────────────────────────────────────────────────┼────────┘
                                                                             │
                                                                             ▼
                                                                     📄 Report (JSON + MD)
                                                                     Score: 8.5/10 ✅

The Loop

  1. Reviewer reads your code in batches, outputs structured findings (file, line, severity, fix suggestion)
  2. Coder receives each finding, writes the actual fix (full file output), or rejects false positives with reasoning
  3. Tests run automatically (pytest, go test, npm test, cargo test) — if they fail, the round stops
  4. Repeat until no new findings or max rounds reached
  5. Judge evaluates the entire session history and scores 0–10
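The five steps above can be sketched as a small driver loop. This is an illustrative sketch, not triplecheck's actual code — the names `reviewer`, `coder`, and `run_tests` are hypothetical callables standing in for the real agents:

```python
# Hypothetical sketch of the Reviewer -> Coder -> Tests loop; the real
# pipeline also batches files, rejects false positives, and records more state.
def review_loop(files, reviewer, coder, run_tests, max_rounds=4):
    """Repeat review/fix/test rounds until no findings or max_rounds."""
    history = []
    for round_no in range(1, max_rounds + 1):
        findings = reviewer(files)            # step 1: structured findings
        if not findings:
            break                             # step 4: converged, nothing new
        fixes = [coder(f) for f in findings]  # step 2: patches (or rejections)
        if not run_tests():                   # step 3: the test gate
            history.append((round_no, findings, fixes, "tests failed"))
            break
        history.append((round_no, findings, fixes, "ok"))
    return history                            # step 5: the Judge scores this history
```

Swapping `max_rounds` corresponds to the `--max-rounds` flag.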

Concepts

| Concept | What it is |
|---|---|
| Finding | A single issue: file, line, severity, suggested fix |
| Batch | A group of related files sent to the Reviewer in one call |
| Round | One full Reviewer → Coder → Tests cycle |
| Session | A complete review (multiple rounds until convergence) |
| Unit | A logical module in scan mode (package/directory) |
| Scan | Full repo review — splits into units, runs sessions, aggregates |
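These concepts nest naturally: findings live in rounds, rounds in sessions. A minimal sketch of the data shapes, with field names that are assumptions rather than triplecheck's actual schema:

```python
from dataclasses import dataclass, field

# Illustrative data shapes for the concepts above; field names are
# assumptions, not triplecheck's real internal types.
@dataclass
class Finding:
    file: str
    line: int
    severity: str          # "error" | "warning" | "suggestion"
    suggested_fix: str

@dataclass
class Round:
    findings: list[Finding] = field(default_factory=list)
    tests_passed: bool = False

@dataclass
class Session:
    rounds: list[Round] = field(default_factory=list)

    @property
    def converged(self) -> bool:
        # A session converges when the last round produced no new findings.
        return bool(self.rounds) and not self.rounds[-1].findings
```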

🔧 Configuration

All config lives in config.yml — three sections:

# ── 1. Define available models ──────────────────────────────────────
models:
  qwen-local:
    provider: openai-compat
    base_url: http://localhost:8000       # vLLM / Ollama / LM Studio
    model: Qwen/Qwen3-Coder
    max_tokens: 16384                     # coder needs room for full files
    temperature: 0.1
  claude-opus:
    provider: claude-cli
    model: opus
  deepseek:
    provider: openai-compat
    base_url: https://api.deepseek.com
    model: deepseek-coder
    api_key_env: DEEPSEEK_API_KEY         # reads from environment variable

# ── 2. Assign roles → models (swap these freely) ───────────────────
assignments:
  reviewer: qwen-local                    # fast local model finds issues
  coder: qwen-local                       # fast local model writes fixes
  judge: claude-opus                      # strong model scores quality

# ── 3. Pipeline behavior ───────────────────────────────────────────
pipeline:
  max_rounds: 4                           # max review-fix iterations
  batch_max_lines: 800                    # lines per review batch
  severity_threshold: warning             # error | warning | suggestion
  auto_style_fix: true                    # run ruff/black after fixes

Providers

| Provider | Config key | Compatible with |
|---|---|---|
| OpenAI-compatible | openai-compat | vLLM, Ollama, LM Studio, DeepSeek, OpenRouter, OpenAI, any /v1/chat/completions |
| Claude CLI | claude-cli | Claude Opus, Sonnet, Haiku via claude CLI |
| Codex CLI | codex-cli | OpenAI Codex via codex CLI |
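The reason one `openai-compat` provider covers so many backends is that they all speak the same `/v1/chat/completions` request shape. A stdlib-only sketch (not triplecheck's client) of building and sending such a request:

```python
import json
import urllib.request

# Sketch of talking to any OpenAI-compatible backend (vLLM, Ollama,
# LM Studio, DeepSeek, OpenRouter, OpenAI). Defaults mirror the config above.
def build_chat_request(base_url: str, model: str, prompt: str,
                       max_tokens: int = 16384, temperature: float = 0.1):
    """Return (url, payload) for an OpenAI-style chat completion."""
    url = base_url.rstrip("/") + "/v1/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return url, payload

def chat(base_url, model, prompt, api_key=None):
    url, payload = build_chat_request(base_url, model, prompt)
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    req = urllib.request.Request(url, data=json.dumps(payload).encode(), headers=headers)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Pointing `base_url` at `http://localhost:8000` hits a local vLLM server; pointing it at `https://api.deepseek.com` (with an API key) hits the cloud.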

📦 Scan Mode

Review entire repositories — triplecheck auto-splits into logical units:

# Preview the plan (instant, no LLM calls)
triplecheck --target ~/work/big-repo --scan --plan-only --include "**/*.go"

# Review top 5 most complex modules
triplecheck --target ~/work/big-repo --scan --max-units 5 --skip-ci \
  --include "**/*.go" --exclude "vendor/*"

# Resume a crashed scan
triplecheck --target ~/work/big-repo --scan --resume <scan_id> --skip-ci

🎯 Review Modes

Three mutually exclusive strategies — pick one:

Single Pass (default)

One pass. Fast.

pipeline:
  review_passes: 1

Best for small projects.

Multi-Pass + Vote

N passes × different angles. Findings voted on. Noise filtered.

pipeline:
  review_passes: 3
  review_min_votes: 2
  only_high_confidence: true

Best with free local models — run 5 passes, let votes filter noise.

Layered Review

Each layer sees only relevant context. Non-overlapping coverage.

pipeline:
  review_layers:
    - architecture
    - interface
    - logic
    - security

Best for large codebases.
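The voting idea in Multi-Pass mode can be sketched in a few lines. This is an illustration of the technique, not triplecheck's implementation — keying duplicates on `(file, line, severity)` is an assumption:

```python
from collections import Counter

# Sketch of multi-pass voting: a finding survives only if at least
# review_min_votes passes independently reported it.
def vote(passes: list[list[dict]], min_votes: int = 2) -> list[dict]:
    votes = Counter()
    first_seen = {}
    for findings in passes:
        seen = set()
        for f in findings:
            key = (f["file"], f["line"], f["severity"])
            if key in seen:
                continue               # count each pass at most once per finding
            seen.add(key)
            votes[key] += 1
            first_seen.setdefault(key, f)
    return [first_seen[k] for k, n in votes.items() if n >= min_votes]
```

With free local models, raising the pass count costs only time, which is why 5 passes with `review_min_votes: 2` is a reasonable noise filter.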

Stackable Enhancements

Enable on top of any review mode:

| Feature | Config | What it does |
|---|---|---|
| 🔍 Static Analysis | static_analysis: true | Pre-screens with ruff / golint / eslint |
| 🧠 Cross-Round Knowledge | knowledge_accumulation: true | Extracts recurring patterns → injects into next round |
| 🌳 Smart Grouping | smart_grouping: true | Tree-sitter dep graph batches related files together |
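Smart grouping amounts to putting files that depend on each other into the same review batch, i.e. taking connected components of the dependency graph. A toy sketch where the edges are given directly (triplecheck derives them with tree-sitter):

```python
from collections import defaultdict

# Toy sketch of smart grouping: files joined by import edges land in the
# same batch (connected components). Edges are supplied here; the real
# feature extracts them from source via tree-sitter.
def group_files(files: list[str], edges: list[tuple[str, str]]) -> list[set[str]]:
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    groups, seen = [], set()
    for f in files:
        if f in seen:
            continue
        comp, stack = set(), [f]
        while stack:                      # iterative DFS over the component
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        groups.append(comp)
    return groups
```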

🤖 Supported Models

| Model | Provider | Speed | Quality | Notes |
|---|---|---|---|---|
| Qwen3-Coder | openai-compat | ⚡⚡⚡ | ★★★★ | Best free option. Set max_tokens ≥ 16384. |
| DeepSeek Coder | openai-compat | ⚡⚡⚡ | ★★★★ | Cloud API, very cheap |
| Llama 3.3 70B | openai-compat | ⚡⚡ | ★★★ | Needs ~40GB VRAM |
| Claude Opus | claude-cli | | ★★★★★ | Best as judge in hybrid setups |
| Claude Sonnet | claude-cli | ⚡⚡ | ★★★★ | Good all-rounder |
| GPT-4o | openai-compat | ⚡⚡ | ★★★★ | Via OpenAI or OpenRouter |

Tip: Any model that serves /v1/chat/completions works. The table above is just what we've tested.

💡 Local LLM Tips

| Tip | Details |
|---|---|
| Recommended model | Qwen3-Coder or DeepSeek-Coder for reviewer/coder roles |
| Token budget | Set max_tokens ≥ 16384 for coder — it outputs full file contents |
| Thinking tags | If your model emits <think>...</think>, triplecheck auto-strips them |
| NL fallback | If JSON parsing fails, findings are extracted from natural language |
| vLLM flags | --max-model-len 32768 --enable-prefix-caching for best throughput |
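Stripping thinking tags before parsing is a one-liner worth knowing if you build on similar output. A sketch of the idea (triplecheck's exact handling may differ):

```python
import re

# Sketch: remove <think>...</think> reasoning blocks so the remainder
# can be parsed as structured findings.
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)

def strip_thinking(text: str) -> str:
    return THINK_RE.sub("", text).strip()
```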

📋 CLI Reference

| Flag | Description |
|---|---|
| --target PATH | Project directory to review (required) |
| --config PATH | Config file path (default: ./config.yml) |
| --reviewer MODEL | Override reviewer model |
| --coder MODEL | Override coder model |
| --judge MODEL | Override judge model |
| --max-rounds N | Max review rounds |
| --include PATTERN | File glob include (repeatable) |
| --exclude PATTERN | File glob exclude (repeatable) |
| --skip-tests | Exclude test files from review |
| --ci-cmd COMMAND | Custom test command |
| --skip-ci | Skip test gate entirely |
| --batch-max-lines N | Max lines per review batch |
| --output PATH | Report output directory (default: ./reports/) |
| --scan | Split repo into units and review each |
| --plan-only | Show scan plan only, no LLM calls |
| --max-units N | Review top N units by priority |
| --resume SCAN_ID | Resume a previous scan |

📄 Output

Reports are saved to ./reports/:

| File | Contents |
|---|---|
| <session_id>.json | Full session state — all rounds, findings, fixes, verdict |
| <session_id>.md | Human-readable report with findings table, fixes, test results, judge verdict |
| scan_<id>.json | All unit sessions combined (scan mode) |
| scan_<id>.md | Overview table, per-unit summaries, aggregate score (scan mode) |
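If you post-process the reports, a findings table like the one in the .md output is easy to regenerate from the JSON. A sketch with assumed field names (`file`, `line`, `severity`, `fix`), not the report's exact schema:

```python
# Sketch: render a list of finding dicts as a markdown table, roughly the
# shape the .md report uses. Field names here are assumptions.
def findings_table(findings: list[dict]) -> str:
    rows = [
        "| File | Line | Severity | Suggested fix |",
        "|------|------|----------|---------------|",
    ]
    for f in findings:
        rows.append(f"| {f['file']} | {f['line']} | {f['severity']} | {f['fix']} |")
    return "\n".join(rows)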

🔌 Adding a Provider

# triplecheck/providers/my_provider.py
from triplecheck.providers.base import BaseProvider

class MyProvider(BaseProvider):
    def review(self, files, prompt, **kwargs):
        ...  # → list[Finding]

    def fix(self, file, findings, prompt, **kwargs):
        ...  # → FixResult

    def judge(self, session, prompt, **kwargs):
        ...  # → Verdict

Then register it in PROVIDER_MAP in triplecheck/roles.py and add model entries in config.yml.

🗺️ Roadmap

P0 — Next Up

  • GitHub PR integration — GitHub Action + post review comments via gh api, line-by-line annotations
  • Incremental diff-only review — parse git diff, send only changed lines + context to LLM (saves tokens, more precise)
  • PR summary / walkthrough — auto-generate a changelog-style summary for each review session
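The diff-only idea above boils down to pulling changed line ranges out of unified-diff hunk headers. A sketch of that parsing step (over-approximate: new-side hunk ranges include context lines; a finer parser would walk the `+` lines):

```python
import re

# Sketch of planned diff-only review: extract new-file line ranges from
# unified diff hunk headers (@@ -a,b +c,d @@) so only changed regions
# plus context go to the LLM.
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@", re.MULTILINE)

def changed_lines(diff_text: str) -> set[int]:
    lines: set[int] = set()
    for m in HUNK_RE.finditer(diff_text):
        start = int(m.group(1))
        count = int(m.group(2) or 1)   # a bare "+42" means one line
        lines.update(range(start, start + count))
    return lines
```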

P1 — On Deck

  • GitHub Action template — drop-in .github/workflows/triplecheck.yml for any repo
  • SARIF output — --format sarif for GitHub Code Scanning / Security tab integration
  • Repo-level config — .triplecheck.yml auto-discovered in repo root
  • Ignore rules — .triplecheck-ignore to suppress known false positives by pattern

P2 — Future

  • VS Code extension — trigger review from IDE, show findings inline
  • Web report viewer — interactive HTML report with filtering and navigation
  • GitLab / Bitbucket support — platform-agnostic PR integration
  • Semgrep integration — custom SAST rules alongside LLM review
  • Learning from feedback — track dismissed findings, auto-suppress recurring false positives

Have an idea? Open an issue or send a PR.