
RagTune


Debug, benchmark, and monitor your RAG retrieval layer. EXPLAIN ANALYZE for production RAG.


Quickstart · Commands · Why RagTune · Concepts · FAQ


| I want to... | Command |
|---|---|
| Debug a single query | `ragtune explain "my query" --collection prod` |
| Run batch evaluation | `ragtune simulate --collection prod --queries queries.json` |
| Set up CI/CD quality gates | `ragtune simulate --ci --min-recall 0.85` |
| Detect regressions | `ragtune simulate --baseline runs/latest.json --fail-on-regression` |
| Compare embedders | `ragtune compare --embedders ollama,openai --docs ./docs` |
| Quick health check | `ragtune audit --collection prod --queries queries.json` |

Quickstart

```shell
# 1. Start vector store
docker run -d -p 6333:6333 -p 6334:6334 qdrant/qdrant

# 2. Ingest documents
ragtune ingest ./docs --collection my-docs --embedder ollama

# 3. Debug retrieval
ragtune explain "How do I reset my password?" --collection my-docs
```

No API keys needed with Ollama (runs locally).

Already using PostgreSQL with pgvector?

Skip Docker entirely. Use your existing database:

```shell
ragtune ingest ./docs --collection my-docs --embedder ollama \
    --store pgvector --pgvector-url postgres://user:pass@localhost/mydb

ragtune explain "How do I reset my password?" --collection my-docs \
    --store pgvector --pgvector-url postgres://user:pass@localhost/mydb
```

Build Your Test Suite

```shell
# Save queries as you debug
ragtune explain "How do I reset my password?" --collection my-docs --save
ragtune explain "What are the rate limits?" --collection my-docs --save

# Run evaluation once you have 20+ queries
ragtune simulate --collection my-docs --queries golden-queries.json
```

Each `--save` adds the query to `golden-queries.json`.
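The on-disk schema of `golden-queries.json` isn't documented here, so as a rough, hypothetical sketch (field names assumed, inferred from the `simulate` failure output that pairs each query with its expected sources), an entry might look like:

```json
[
  {
    "query": "How do I reset my password?",
    "expected_sources": ["docs/auth/password-reset.md"]
  }
]
```

See the CLI Reference for the authoritative format.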


What You'll See

explain — Debug a Query

```text
Query: "How do I reset my password?"

[1] Score: 0.8934 | Source: docs/auth/password-reset.md
    Text: To reset your password: 1. Click "Forgot Password"...

[2] Score: 0.8521 | Source: docs/auth/account-security.md
    Text: Account Security ## Password Management...

DIAGNOSTICS
  Score range: 0.7234 - 0.8934 (spread: 0.1700)
  ✓ Strong top match (>0.85): likely high-quality retrieval
```

simulate — Batch Metrics

```text
Running 50 queries...

  Recall@5:   0.82    MRR: 0.76    Coverage: 0.94
  Latency:    p50=45ms  p95=120ms

FAILURES: 3 queries with Recall@5 = 0
  ✗ "How do I configure SSO?"
    Expected: [sso-guide.md], Retrieved: [api-keys.md...]

💡 Run `ragtune explain "<query>"` to debug
```

Commands

| Command | Purpose |
|---|---|
| `ingest` | Load documents into the vector store |
| `explain` | Debug retrieval for a single query |
| `simulate` | Batch benchmark with metrics + CI mode |
| `compare` | Compare embedders or chunk sizes |
| `audit` | Quick health check (pass/fail) |
| `report` | Generate markdown reports |
| `import-queries` | Import queries from CSV/JSON |

See CLI Reference for all flags and options.


CI/CD Quality Gates

```yaml
# .github/workflows/rag-quality.yml
- name: RAG Quality Gate
  run: |
    ragtune ingest ./docs --collection ci-test --embedder ollama
    ragtune simulate --collection ci-test --queries tests/golden-queries.json \
      --ci --min-recall 0.85 --min-coverage 0.90 --max-latency-p95 500
```

The command exits with code 1 if any threshold fails, which fails the CI job. See `examples/github-actions.yml` for a complete setup.

Regression Testing

Compare against a baseline to catch regressions before they reach production:

```shell
# Compare current run against baseline
ragtune simulate --collection prod --queries golden.json \
  --baseline runs/baseline.json --fail-on-regression
```

Output shows deltas for each metric:

```text
BASELINE COMPARISON
Comparing against: 2026-01-15T12:00:00Z
─────────────────────────────────────────────────────────────
  Recall@5:    0.900 → 0.850  ↓ 5.6%  (REGRESSED)
  MRR:         0.800 → 0.820  ↑ 2.5%  (improved)
  Coverage:    0.950 → 0.950  = 0.0%  (unchanged)
  Latency p95: 100ms → 120ms  ↑ 20.0%  (REGRESSED)
─────────────────────────────────────────────────────────────

❌ REGRESSION DETECTED
   The following metrics decreased: [Recall@5, Latency p95]
```
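The pass/fail logic behind `--fail-on-regression` can be pictured with a small Python sketch (purely illustrative, not RagTune's actual code): quality metrics regress when they drop, while latency regresses when it rises.

```python
# Illustrative baseline comparison; not RagTune's implementation.
# For quality metrics (Recall@5, MRR, Coverage) a drop is a regression;
# for latency, an increase is a regression.

def find_regressions(baseline: dict, current: dict,
                     higher_is_better: dict) -> list:
    """Return names of metrics that moved in the 'bad' direction."""
    regressed = []
    for name, base in baseline.items():
        cur = current[name]
        if higher_is_better[name]:
            if cur < base:
                regressed.append(name)
        elif cur > base:
            regressed.append(name)
    return regressed

# The numbers from the sample output above.
baseline = {"recall@5": 0.900, "mrr": 0.800, "coverage": 0.950, "latency_p95_ms": 100}
current  = {"recall@5": 0.850, "mrr": 0.820, "coverage": 0.950, "latency_p95_ms": 120}
direction = {"recall@5": True, "mrr": True, "coverage": True, "latency_p95_ms": False}

print(find_regressions(baseline, current, direction))
# → ['recall@5', 'latency_p95_ms']
```

In practice a tolerance band (e.g. ignoring drops below some percentage) would be a natural refinement, so that measurement noise doesn't fail the build.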

Why RagTune?

RAG retrieval is a configuration problem: chunk size, embedding model, index type, top-k. Most teams tune by intuition. RagTune provides the measurement layer to make these decisions empirically, using standard IR metrics (Recall@k, MRR, NDCG) on your actual data.

| What Matters | Impact |
|---|---|
| Domain-appropriate chunking | 7%+ recall difference |
| Embedding model choice | 5% difference |
| Continuous monitoring | Catches data drift before users do |
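To make the headline metrics concrete, here is a minimal Python sketch of the standard definitions (an illustration, not RagTune's implementation): per-query Recall@k checks whether a relevant document lands in the top k results, and MRR averages the reciprocal rank of the first relevant hit across queries.

```python
# Minimal illustration of two standard IR metrics; not RagTune's code.

def recall_at_k(retrieved: list, relevant: set, k: int) -> float:
    """1.0 if any relevant doc appears in the top-k results, else 0.0."""
    return 1.0 if relevant & set(retrieved[:k]) else 0.0

def reciprocal_rank(retrieved: list, relevant: set) -> float:
    """1/rank of the first relevant result (0.0 if none retrieved)."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

# Toy run: two queries, each with one known-relevant document.
runs = [
    (["password-reset.md", "account-security.md"], {"password-reset.md"}),
    (["api-keys.md", "sso-guide.md", "billing.md"], {"sso-guide.md"}),
]
n = len(runs)
print(sum(recall_at_k(r, rel, 5) for r, rel in runs) / n)   # Recall@5 → 1.0
print(sum(reciprocal_rank(r, rel) for r, rel in runs) / n)  # MRR → 0.75
```

This is why a query can count as a hit for Recall@5 yet still drag MRR down: the second query finds its document, but only at rank 2.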

RagTune vs. Other Tools

RagTune focuses on retrieval debugging, monitoring, and benchmarking, not end-to-end answer evaluation.

| | RagTune | Ragas / DeepEval | misbahsy/RAGTune |
|---|---|---|---|
| Focus | Retrieval layer | Full pipeline | Full pipeline |
| LLM calls | None required | Required | Required |
| Interface | CLI (CI/CD-native) | Python library | Streamlit UI |
| Speed | Fast (embedding only) | Slow (LLM inference) | Slow |
| CI/CD | First-class | Manual setup | None |

Use RagTune when you are debugging retrieval, enforcing CI/CD quality gates, comparing embedders, or running deterministic benchmarks.

Use other tools when you are evaluating LLM answer quality or need metrics like answer_relevancy.


Signs You Need This

Retrieval failures are silent. No error, no exception. Just gradually worse answers.

  • Users complain about "wrong answers" but you can't reproduce them
  • No idea whether that embedding change made things better or worse
  • Retrieval was "good" in dev but is failing in production
  • You added documents and answers got worse
  • You can't tell whether the LLM is hallucinating or retrieval is broken

If any of these sound familiar:

```shell
ragtune explain "the query that's failing" --collection prod
```

Installation

```shell
# Homebrew (macOS/Linux)
brew install metawake/tap/ragtune

# Go install
go install github.com/metawake/ragtune/cmd/ragtune@latest

# Or download a binary from GitHub Releases
```

Prerequisites: Docker (for Qdrant), and either Ollama or an embedding-provider API key.


Embedders

| Embedder | Setup | Best For |
|---|---|---|
| `ollama` | Local, no API key | Development, privacy |
| `openai` | `OPENAI_API_KEY` | General purpose |
| `voyage` | `VOYAGE_API_KEY` | Legal, code (domain-tuned) |
| `cohere` | `COHERE_API_KEY` | Multilingual |
| `tei` | Docker container | High throughput |

Vector Stores

| Store | Setup |
|---|---|
| Qdrant (default) | `docker run -p 6333:6333 qdrant/qdrant` |
| pgvector | `--store pgvector --pgvector-url postgres://...` |
| Weaviate | `--store weaviate --weaviate-host localhost:8080` |
| Chroma | `--store chroma --chroma-url http://localhost:8000` |
| Pinecone | `--store pinecone --pinecone-host HOST` |

Included Benchmarks

| Dataset | Documents | Purpose |
|---|---|---|
| `data/` | 9 | Quick testing |
| `benchmarks/hotpotqa-1k/` | 398 | General knowledge |
| `benchmarks/casehold-500/` | 500 | Legal domain |
| `benchmarks/synthetic-50k/` | 50,000 | Scale testing |

```shell
# Try it
ragtune ingest ./benchmarks/hotpotqa-1k/corpus --collection demo --embedder ollama
ragtune simulate --collection demo --queries ./benchmarks/hotpotqa-1k/queries.json
```

Documentation

| Guide | Description |
|---|---|
| Concepts | RAG basics, metrics explained |
| CLI Reference | All commands and flags |
| Quickstart | Step-by-step setup guide |
| Benchmarking Guide | Scale testing, runtimes |
| Deployment Patterns | CI/CD, production |
| FAQ | Common questions |
| Troubleshooting | Common issues and fixes |

Contributing

Contributions welcome. Please open an issue first to discuss significant changes.

License

MIT