Ask HN: What are some good benchmarks for different agent harnesses?

3 points by Bnjoroge a day ago · 1 comment · 1 min read


Other than Terminal-Bench, which doesn't quite map to my experience, what are some other benchmarks that show how different models perform in different harnesses?

drewbitt a day ago

These all track harnesses:

https://www.vals.ai/benchmarks/vibe-code

https://www.vals.ai/benchmarks/swebench

https://www.vals.ai/benchmarks/terminal-bench-2-1 (Vals' customized version of Terminal-Bench 2.0)

https://artificialanalysis.ai/agents/coding-agents
