Evaluations are crucial, but what should you eval on?

github.com

2 points by draismaa 10 months ago · 1 comment

draismaa (OP) 10 months ago

LLM evaluations are tricky. You can measure accuracy, latency, cost, hallucinations, bias... but what really matters for your app? Instead of relying on generic benchmarks, build your own evals focused on your use case, and then bring those same evals into real-time monitoring of your LLM app. We open-sourced LangWatch to help with this. How are you handling LLM evals in production?
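To make "build your own evals, then reuse them in monitoring" concrete, here is a minimal plain-Python sketch. It does not use LangWatch or its API. `Trace`, `cites_order_id`, `within_latency_budget`, `run_evals`, and `monitor` are hypothetical names chosen for illustration, and the "support bot must cite an order number" rule is an assumed example of a use-case-specific check. The same check functions score an offline test set and a sampled fraction of live traffic.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Trace:
    """One LLM interaction (hypothetical shape)."""
    question: str
    answer: str
    latency_ms: float


def cites_order_id(trace: Trace) -> bool:
    """Assumed use-case rule: order questions must get an answer citing an order number."""
    if "order" not in trace.question.lower():
        return True  # rule does not apply to this trace
    return "#" in trace.answer


def within_latency_budget(trace: Trace, budget_ms: float = 2000.0) -> bool:
    """Pass if the response came back within an assumed latency budget."""
    return trace.latency_ms <= budget_ms


# One registry of evals shared by offline testing and live monitoring.
EVALS: Dict[str, Callable[[Trace], bool]] = {
    "cites_order_id": cites_order_id,
    "latency_budget": within_latency_budget,
}


def run_evals(traces: List[Trace]) -> Dict[str, float]:
    """Offline: pass rate of each eval over a batch of traces."""
    results = {}
    for name, check in EVALS.items():
        passed = sum(1 for t in traces if check(t))
        results[name] = passed / len(traces) if traces else 0.0
    return results


def monitor(trace: Trace, sample_rate: float = 0.1,
            rng: random.Random = random.Random()) -> Optional[Dict[str, bool]]:
    """Online: score a sampled fraction of live traces with the same evals."""
    if rng.random() >= sample_rate:
        return None  # not sampled
    return {name: check(trace) for name, check in EVALS.items()}
```

Keeping the checks in one registry is the point of the sketch: whatever you decide "really matters" for your app is measured identically before release and in production.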
