The First Interactive Reasoning Benchmark
ARC-AGI-3 is the first interactive reasoning benchmark designed to measure human-like intelligence in AI. Launching March 25, 2026, it will include 1,000+ levels across 150+ environments that require agents to explore, learn, plan, and adapt. ARC-AGI-3 will provide the most authoritative evidence of AI generalization to date.
ARC-AGI-3 uses video-game-like environments where agents must act across multiple steps to achieve long-horizon goals. The games provide no instructions, so players must explore and discover the rules to succeed. Each environment is hand-crafted and novel, so systems cannot memorize their way to success.
Every environment is human-solvable.
When testing AI, the question isn't whether it solves the environment but how efficiently it does so. We measure this through action efficiency: how many actions does it take to complete a goal? This measures how effectively a test-taker (human or AI) converts environment information into a working strategy.
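To make the metric concrete, here is a minimal sketch of one way action efficiency could be computed, as a ratio of a baseline action count to an agent's action count. This is illustrative only; the function name, the use of a human baseline, and the exact formula are assumptions, not the benchmark's official scoring rule.

```python
# Illustrative sketch (NOT the official ARC-AGI-3 scoring formula):
# compare an agent's action count against a baseline for the same level.

def action_efficiency(agent_actions: int, baseline_actions: int) -> float:
    """Ratio of baseline actions to agent actions.

    1.0 means the agent matched the baseline; lower values mean the
    agent needed more actions (was less efficient) to reach the goal.
    """
    if agent_actions <= 0:
        raise ValueError("agent must take at least one action")
    return baseline_actions / agent_actions

# Example: a baseline of 40 actions vs. an agent that needed 400.
print(action_efficiency(agent_actions=400, baseline_actions=40))  # 0.1
```

Under this framing, a score near 1.0 indicates the test-taker converted exploration into a working strategy about as quickly as the baseline did.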
Humans do this well. AI does not.
ARC-AGI-3 Developer Toolkit
The ARC-AGI-3 developer toolkit is a set of tools for playing and interacting with ARC-AGI-3 environments locally (at up to 2,000 FPS), online, or via a hosted API. It is the best way to get started with research on ARC-AGI-3. See the documentation.
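The agent-environment interaction the toolkit enables can be sketched as a simple loop: act, observe, repeat until the goal is reached, with the action count serving as the efficiency signal. The `Environment` class and its methods below are toy stand-ins for illustration, not the toolkit's actual API; consult the official documentation for real usage.

```python
# Hypothetical sketch of an agent loop over an ARC-AGI-3-style
# environment. All class and method names here are invented stand-ins.
import random

class Environment:
    """Toy stand-in environment: completes after 10 actions."""
    def __init__(self) -> None:
        self.steps = 0
        self.done = False

    def act(self, action: int) -> None:
        """Apply one action; flag completion once the goal is reached."""
        self.steps += 1
        if self.steps >= 10:
            self.done = True

def run_agent(env: Environment, max_actions: int = 100) -> int:
    """Act until the environment reports completion (or a cap is hit),
    returning the number of actions used -- the efficiency signal."""
    actions = 0
    while not env.done and actions < max_actions:
        env.act(random.randrange(4))  # explore: no instructions are given
        actions += 1
    return actions

print(run_agent(Environment()))  # 10
```

A real agent would replace the random policy with exploration, rule discovery, and planning, which is exactly what the benchmark is designed to measure.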
