Claude 4 Sonnet hacked SWE-bench by peeking at future commits
bayes.net
In July, I predicted future AI models would someday learn to cheat on SWE-bench by accessing future git history. Turns out, they were already doing it!