LLM Evals Are Just Tests. Why Are We Making This So Complicated?

3 points by camwest 8 months ago · 2 comments

So, did the tests allow you to build a system that never confused existing features with new features? That seems like the problem statement, but I think I'm only seeing probabilistic testing.

camwest (OP) 8 months ago

Never? No. Way less likely? Yes!
In dev we do 100 consistency checks and get green. In CI we do 10.
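
To make the dev/CI split concrete, here is a minimal sketch of what a repeated consistency check could look like. It is not the OP's actual harness: the `CI` environment variable, the `classify` callable, and the labels are all assumptions made for illustration. The last helper shows why "never" is off the table: if a single run fails with probability p, then N runs in a row all pass with probability (1 - p)^N.

```python
import os
from typing import Callable, Mapping, Optional


def run_count(env: Optional[Mapping[str, str]] = None) -> int:
    """Pick how many repetitions to run: 10 in CI, 100 in dev.

    Assumes CI is signalled by a non-empty `CI` environment variable,
    which is a common convention but not something the thread confirms.
    """
    env = os.environ if env is None else env
    return 10 if env.get("CI") else 100


def count_failures(classify: Callable[[str], str], prompt: str,
                   expected: str, runs: int) -> int:
    """Call the (hypothetical) model wrapper `runs` times on the same
    prompt and count how many answers differ from the expected label."""
    return sum(1 for _ in range(runs) if classify(prompt) != expected)


def prob_all_green(failure_rate: float, runs: int) -> float:
    """Chance that `runs` independent repetitions all pass when each
    one fails with probability `failure_rate`."""
    return (1.0 - failure_rate) ** runs


def stub_classify(prompt: str) -> str:
    """Deterministic stand-in for a real LLM call, for demonstration."""
    return "existing_feature"


if __name__ == "__main__":
    n = run_count()
    failures = count_failures(stub_classify, "Is dark mode new?",
                              "existing_feature", n)
    print(f"{failures} failures in {n} runs")
    print(f"P(all green | 5% failure rate, 10 runs)  = {prob_all_green(0.05, 10):.3f}")
    print(f"P(all green | 5% failure rate, 100 runs) = {prob_all_green(0.05, 100):.4f}")
```

Under the independence assumption, a prompt that goes wrong 5% of the time still passes 10 runs about 60% of the time, but passes 100 runs less than 1% of the time. So the larger dev run is the one that actually catches low-rate confusion, and the 10-run CI pass works more like a cheap smoke test.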
