Settings

Theme

The SWE-Bench Illusion

microsoft.com

11 points by louiereederson a month ago · 2 comments

Reader

N_Lens a month ago

It seems to be inevitable that metrics become targets and cease to be valuable.

Ethan312 a month ago

This is a good reminder that benchmark results don’t always translate to real engineering work. Solving a scoped task inside a controlled setup is very different from working in a live codebase with missing context and messy history. Benchmarks are still useful, but they should be treated as one signal, not the full picture.

Keyboard Shortcuts

j
Next item
k
Previous item
o / Enter
Open selected item
?
Show this help
Esc
Close modal / clear selection