LLM Evals Are Just Tests. Why Are We Making This So Complicated?

cameronwestland.com

3 points by camwest 5 months ago · 2 comments

8organicbits 5 months ago

So, did the tests allow you to build a system that never confused existing features with new features? That seems like the problem statement, but I think I'm only seeing probabilistic testing.

  • camwest (OP) 5 months ago

    Never? No. Way less likely? Yes!

    In dev we do 100 consistency checks and get green. In CI we do 10.
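    A minimal sketch of what "100 in dev, 10 in CI" could look like, assuming a check is any callable that returns True on a pass. The names `run_consistency_check`, `runs_for_environment`, and `chance_all_pass`, and the use of a `CI` environment variable to tell the two environments apart, are illustrative assumptions and not the setup described in the post.

```python
import os


def run_consistency_check(check, runs):
    """Call `check` `runs` times and return how many calls failed."""
    return sum(0 if check() else 1 for _ in range(runs))


def runs_for_environment(env=None):
    """Pick the repeat count: 10 on CI, 100 otherwise.

    Assumption: CI is detected via a truthy `CI` environment variable.
    """
    env = os.environ if env is None else env
    return 10 if env.get("CI") else 100


def chance_all_pass(failure_rate, runs):
    """Probability that `runs` independent checks all pass,
    given a fixed per-run failure rate."""
    return (1.0 - failure_rate) ** runs


if __name__ == "__main__":
    # Stand-in check that always passes; a real one would call the model.
    failures = run_consistency_check(lambda: True, runs_for_environment())
    print(f"failures: {failures}")
```

    If runs are independent, a check that fails 2% of the time still comes up all green about 82% of the time over 10 runs, but only about 13% of the time over 100. That is why a green result is evidence of "way less likely" rather than "never".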
