FrontierSWE
FrontierSWE is an effort to test coding agents on the hardest ultra-long-horizon technical challenges. Together with partners from academia and industry, we have collected real-world problems from domains including performance engineering, computational science, and ML research, and evaluated how well frontier models perform on them.
See the leaderboard and blog for results and analysis. FrontierSWE is also available as a Prime Intellect Environment.