Show HN: Agent-triage – diagnosis of agent failures from production traces
github.comI built agent-triage - a CLI that automates diagnosing AI agent failures in production.
I was spending way too much time staring at traces, logs and dashboards trying to figure out why my multi-agent setups kept failing.
You just point it at your traces (LangSmith, Langfuse, OpenTelemetry, or a JSON file). It pulls the system prompts directly from the logs, extracts the behavioral rules, and uses an LLM-as-a-judge to replay each conversation step-by-step.
It flags exactly which turn broke things, which agent caused it, and traces cascading failures across routing, handoffs, and retrieval.
It aggregates root causes across all of them: "24 out of 51 failures are missing escalations." You know exactly what to fix first.
Runs locally. Only LLM API calls leave your machine.
npx agent-triage demo — runs on sample data, uses your own API key (~$0.002/conversation with gpt-4o-mini).
https://github.com/converra/agent-triage
Demo report: https://demo-report-sigma.vercel.app/ Would love to try it out! Exactly the issues I’m facing. Try npx agent-triage demo — runs on sample data locally. Would love
to hear what you find when you point it at your own traces.