Ask HN: LLM and Human Coding Benchmarks?

1 point by weli 3 days ago · 1 comment · 1 min read


I don't think LLM-only benchmarks give me the full story of how well a given model will perform in my daily coding tasks.

I rarely face a problem where all of the requirements are already laid out and only the implementation is missing.

Are there any LLM coding benchmarks that have a human in the loop? That would be more helpful for me. Maybe with a large enough pool of humans you could take the average, so that individual human performance isn't the main differentiator.
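
A rough sketch of the averaging I have in mind, in Python. Everything here is hypothetical: the model names, the participant names, the scores, and the `mean_score_per_model` helper are made up for illustration and do not come from any real benchmark.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical results: (model, human participant, task success rate in [0, 1]).
# All values are invented for illustration only.
results = [
    ("model_a", "alice", 0.80),
    ("model_a", "bob", 0.60),
    ("model_b", "alice", 0.70),
    ("model_b", "bob", 0.50),
]


def mean_score_per_model(rows):
    """Average each model's score over all participants who used it."""
    by_model = defaultdict(list)
    for model, _participant, score in rows:
        by_model[model].append(score)
    return {model: mean(scores) for model, scores in by_model.items()}


averages = mean_score_per_model(results)
for model, avg in sorted(averages.items()):
    print(f"{model}: {avg:.2f}")
```

If the same people work with every model, their individual skill shows up in every model's average, so the gap between averages should mostly reflect the models rather than the humans.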

verdverm 3 days ago

I've not heard of any such benchmark. Having a human in the loop (HIL) muddies the waters, imo.

That being said, I have been collecting all of my sessions to build such a dataset, which I plan to use for optimizing my agent instructions.
