Learning Topics
This roadmap covers the following topics:
- Classical Metrics Failed
- BLEU and ROUGE Failure Cases
- Human Eval Doesn't Scale
- Cost-of-Being-Wrong Framework
- Defining Cost Tiers
- Eval as an Architecture Decision
- Where LLM Judges Shine
- Strengths With Evidence
- Where LLM Judges Struggle
- The Right Tool Decision
- Judge vs. Metric vs. Pipeline
- Rubric Design and Criteria Decomposition
- Atomic Criteria vs. Holistic Rubrics
- Chain-of-Thought Scoring
- Rubric Drift
- Pointwise, Pairwise, and Reference-Based Modes
- Pointwise Scoring and Its Biases
- Pairwise Comparison and Position Bias
- Reference-Based Judging
G-Eval and Structured Output
- G-Eval (2026): Architecture and Variants
- Token Probability Scoring
- FActScore and Fact Decomposition
- Structured Output for Judges
- Schema Design for Eval Payloads
- Constrained Decoding and Tool-Use Patterns
- Self-Preference and Verbosity Bias
- Detecting Self-Preference
- Verbosity Bias in Practice
- Bias-to-Mode Mapping
- Position Bias Measurement
- Rubric Drift Over Time
The Hybrid Pattern: Extraction Plus Deterministic Rules
- Why Split Extraction from Judgment
- Designing the Fact Schema
- Extraction Failure Modes
- Deterministic Scoring: Rules, Trees, and DAGs
- Encoding Rubrics as Rules
- Audit Trails and Reproducibility
- When the Hybrid Pattern Is Overkill
- Datalog for Deterministic Scoring
Meta-Evaluation and Production
- Meta-Evaluation: Testing the Judge
- Human Correlation and Benchmark Suites
- Adversarial Test Cases
- Production: Cost, Latency, and Drift
- Cost Architecture and Model Tiering
- Drift Monitoring and Judge Maintenance
- When LLM-as-Judge Is the Wrong Tool
Building Your Eval Stack
- CI, Nightly, and Audit Pipeline Design
- What Goes in Each Layer
- The Eval Stack Decision Framework
- Putting It Into Production
- Eval as Living Infrastructure
- From Demo to Defensible System