fsilavong/agent-eval: Your own eval engineer

Agent Eval is a skill for evaluating agentic AI pipelines, both component by component and end to end. It helps you define what to measure, build or sample eval cases, run repeatable tests, and track regressions over time. It then turns the results into grounded takeaways: what improved, what regressed, and what to change next.
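To make "track regressions over time" concrete, here is a minimal Python sketch of comparing two eval runs. It is an illustration only, not the skill's implementation: the `EvalResult` type, the `find_regressions` helper, the `tolerance` parameter, and the case IDs and scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """One scored eval case (hypothetical schema, not defined by agent-eval)."""
    case_id: str
    level: str    # "component" or "end_to_end"
    score: float  # 0.0 to 1.0, higher is better


def find_regressions(baseline, candidate, tolerance=0.05):
    """Return cases whose score dropped by more than `tolerance` between runs."""
    base = {(r.case_id, r.level): r.score for r in baseline}
    regressions = []
    for r in candidate:
        key = (r.case_id, r.level)
        # Only compare cases present in both runs.
        if key in base and base[key] - r.score > tolerance:
            regressions.append((r.case_id, r.level, base[key], r.score))
    return sorted(regressions)


# Made-up scores for two runs of the same eval set.
baseline = [
    EvalResult("retrieval-01", "component", 0.80),
    EvalResult("tool-call-01", "component", 0.75),
    EvalResult("plan-02", "end_to_end", 0.90),
]
candidate = [
    EvalResult("retrieval-01", "component", 0.85),  # improved
    EvalResult("tool-call-01", "component", 0.72),  # small drop, within tolerance
    EvalResult("plan-02", "end_to_end", 0.70),      # regressed
]

regressions = find_regressions(baseline, candidate)
for case_id, level, before, after in regressions:
    print(f"{case_id} ({level}): {before:.2f} -> {after:.2f}")
```

Keying results on both case ID and level keeps component and end-to-end scores separate, so a drop in the full pipeline can be compared against the component scores from the same run.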

What it offers

Install

Install manually by cloning the repository into ~/.claude/skills:

git clone https://github.com/fsilavong/agent-eval.git ~/.claude/skills/agent-eval

Install with the Vercel Skills CLI:

npx skills add fsilavong/agent-eval