Ask HN: How are you evaluating your LLMs in production?

2 points by ReDeiPirati 7 months ago · 1 comment · 1 min read


Hello HN! Which tools do you use to evaluate your LLMs and agents in production?

znpy 7 months ago

Sysadmin here ("cloud engineer" is what's in my contract).

> Which tools do you use to evaluate your LLMs and agents in production?

None for my work. I still use LLMs from time to time to generate boring Terraform code or boring SQL queries, but I'm essentially not going to let some AI bs near the infrastructure I curate.

It's all fun and games until prod is down, or the cloud bill is 10x the previous month's bill (or both).

So unless I can blame it on the AI and take no responsibility, I'm not going to let anything AI-powered near production.
