Evaluations are crucial, but what should you eval on?
LLM evaluations are tricky. You can measure accuracy, latency, cost, hallucinations, bias... but what really matters for your app? Instead of relying on generic benchmarks, build your own evals focused on your use case, and then bring those evals into real-time monitoring of your LLM app. We open-sourced LangWatch to help with this. How are you handling LLM evals in production?
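To make "build your own evals" concrete, here's a minimal sketch of a use-case-specific eval: a simple check that an answer cites one of the expected sources, run over a small golden set before wiring it into monitoring. All names here (`EvalResult`, `eval_answer_cites_source`, the golden set) are hypothetical illustrations, not LangWatch's actual SDK.

```python
# Illustrative sketch of a custom, use-case-specific eval (hypothetical names, not LangWatch's API).
from dataclasses import dataclass

@dataclass
class EvalResult:
    passed: bool
    score: float
    details: str

def eval_answer_cites_source(output: str, expected_sources: list[str]) -> EvalResult:
    """Check that the LLM answer mentions at least one of the expected sources."""
    cited = [s for s in expected_sources if s in output]
    score = len(cited) / len(expected_sources) if expected_sources else 0.0
    return EvalResult(passed=bool(cited), score=score, details=f"cited: {cited}")

# Run the eval over a tiny golden set; in production the same check would score live traffic.
golden_set = [
    {"output": "Refunds take 5-7 days, see the Refund Policy.", "sources": ["Refund Policy"]},
    {"output": "I'm not sure, sorry!", "sources": ["Refund Policy"]},
]
for case in golden_set:
    print(eval_answer_cites_source(case["output"], case["sources"]))
```

The point isn't this particular check, it's that the eval encodes what *your* app must get right, and the same function can score both offline test sets and live production traces.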