All the ways GPT-5.3-Codex cheated [ ], progressively more insane
twitter.comIllustrates the problem of RLing for the final outcome, instead of optimizing for each step… which leads to the coastline paradox…
Illustrates the problem of RLing for the final outcome, instead of optimizing for each step… which leads to the coastline paradox…