Ask HN: What are the most up-to-date LLM benchmarks for agentic coding?

2 points by vladgur a month ago · 0 comments

There are a lot of SOTA and smaller models coming out every month, and many of them claim great coding output, tool execution, etc. at a lower cost than their competitors. However, I haven't been able to find an up-to-date benchmark that actually confirms these claims and compares the models on speed, quality, and price.

For instance, https://gso-bench.github.io/leaderboard.html seems to be a few months behind and is missing a few key models, such as Grok.

How do you decide which model to use day to day, and are there good metrics that help with that decision?
