ARC-AGI-3


The First Interactive Reasoning Benchmark

ARC-AGI-3 is the first interactive reasoning benchmark designed to measure human-like intelligence in AI. Launching March 25, 2026, it will include 1,000+ levels across 150+ environments that require agents to explore, learn, plan, and adapt. ARC-AGI-3 will provide the most authoritative evidence of AI generalization to date.

ARC-AGI-3 uses video-game-like environments where agents must act across multiple steps to achieve long-horizon goals. The games provide no instructions, so players must explore and discover the rules to succeed. Each environment is hand-crafted and novel, so systems cannot memorize their way to success.

Every environment (100%) is human-solvable.

When testing AI, the question isn't whether it solves the environment but how efficiently it does so. We measure this through action efficiency: how many actions does it take to complete a goal? This shows how effectively a test-taker (human or AI) converts environment information into a working strategy.

Humans do this well. AI does not.
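
One simple way to express this idea is as a ratio against a reference solve. The function below is an illustrative sketch, not the benchmark's official scoring formula; the reference count (e.g., a skilled human's action count) is an assumption for the example.

```python
def action_efficiency(reference_actions: int, agent_actions: int) -> float:
    """Illustrative efficiency ratio: 1.0 means the test-taker matched the
    reference action count; lower values mean more actions were needed to
    reach the same goal."""
    if agent_actions <= 0:
        raise ValueError("agent_actions must be a positive integer")
    return reference_actions / agent_actions

# Example: a human finishes a level in 40 actions, an agent needs 400.
print(action_efficiency(40, 400))  # 0.1 -- ten times less action-efficient
```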


ARC-AGI-3 Developer Toolkit

The ARC-AGI-3 developer toolkit is a set of tools that lets you play and interact with ARC-AGI-3 environments locally (up to 2000 FPS), online, or via a hosted API. The toolkit is the best way to get started with research on ARC-AGI-3. See the documentation.

Developer Toolkit code preview

Create an environment to play LS20. Get started in a few lines of code.
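
The snippet below is a minimal sketch of that loop shape: reset an environment, choose an action, observe the next frame, repeat. The `DummyEnv` class and its method names are hypothetical stand-ins so the example runs on its own; the real environment and client classes come from the toolkit documentation linked above.

```python
import random

class DummyEnv:
    """Stand-in for a toolkit environment such as LS20 (hypothetical interface)."""
    available_actions = ["up", "down", "left", "right", "select"]

    def __init__(self, game_id: str, max_steps: int = 25):
        self.game_id, self.max_steps = game_id, max_steps
        self.done, self._steps_left = False, max_steps

    def reset(self):
        """Start a new episode and return the initial observation (a grid)."""
        self.done, self._steps_left = False, self.max_steps
        return [[0] * 8 for _ in range(8)]   # placeholder 8x8 frame

    def step(self, action: str):
        """Apply one action and return the next frame."""
        self._steps_left -= 1
        self.done = self._steps_left <= 0
        return [[0] * 8 for _ in range(8)]

# The loop shape: reset, act, observe, repeat until the episode ends.
env = DummyEnv("ls20")
frame = env.reset()
actions_taken = 0
while not env.done:
    action = random.choice(env.available_actions)  # a real agent would plan here
    frame = env.step(action)
    actions_taken += 1
print(f"Episode finished after {actions_taken} actions")
```

A real agent would replace the random choice with exploration and planning; the action count it accumulates is exactly what the action-efficiency measure above evaluates.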