Claude 4 Sonnet hacked SWE-bench by peeking at future commits

bayes.net

3 points by tadamcz 5 months ago · 1 comment

tadamcz (OP) 5 months ago

In July, I predicted future AI models would someday learn to cheat on SWE-bench by accessing future git history. Turns out, they were already doing it!
