Data on AI Capabilities and Benchmarking

Apr. 28, 2026

GPT-5.5 Pro achieves a new high score of 159 on the Epoch Capabilities Index.

See the thread

Apr. 10, 2026

We released early results from MirrorCode, a new long-horizon SWE benchmark co-developed with METR, showing that AI can already complete some weeks-long coding tasks.

Read the report

Mar. 23, 2026

AI has solved one of the problems in FrontierMath: Open Problems, our benchmark of real research problems that mathematicians have tried and failed to solve.

Explore the post

Trusted by leaders at OpenAI, DeepMind, and governments worldwide

Need deeper insights? Our team offers custom research and advisory services.

Book a consultation
