AI IQ | AI Model IQ Leaderboard and Benchmark Charts

How smart is your AI model, really?

AI IQ intelligently estimates the IQs of popular AI models

AI IQ Newsletter

Get the weekly AI model intelligence newsletter

New launches, benchmark shifts, cost-performance winners, and practical guidance on which models are worth using.

How AI IQ estimates model intelligence

We archive source captures from public benchmark leaderboards and extract only source-backed values
We map each benchmark score to an implied IQ using calibrated difficulty curves
We group 18 benchmarks into five reasoning dimensions: fluid abstraction, mathematical, programmatic, critical, and agentic
We conservatively fill missing benchmark and dimension estimates only inside the scoring pipeline
Every derived IQ averages all five dimensions, so missing coverage cannot make a model look better by omission
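The steps above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the site's pipeline. The difficulty-curve breakpoints, the dimension keys, the plain mean within a dimension, and the "lowest observed dimension minus a penalty" fill rule are all hypothetical, because the calibrated curves and the exact imputation rule are not published here. The sketch shows why averaging all five dimensions after a conservative fill means a gap can only pull the composite down.

```python
from statistics import mean

# HYPOTHETICAL difficulty curve for one benchmark: (raw score %, implied IQ).
# Illustrative breakpoints only; the calibrated curves are not published here.
CURVE = [(0.0, 70.0), (40.0, 100.0), (70.0, 120.0), (90.0, 140.0), (100.0, 150.0)]

DIMENSIONS = ("fluid", "mathematical", "programmatic", "critical", "agentic")


def implied_iq(score, curve=CURVE):
    """Map a benchmark score to an implied IQ by piecewise-linear interpolation."""
    if score <= curve[0][0]:
        return curve[0][1]  # clamp below the lowest breakpoint
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        if score <= x1:
            return y0 + (y1 - y0) * (score - x0) / (x1 - x0)
    return curve[-1][1]  # clamp above the highest breakpoint


def dimension_iq(benchmark_iqs):
    """A dimension's estimate as the plain mean of its benchmarks' implied IQs (assumed)."""
    return mean(benchmark_iqs)


def composite_iq(dim_iqs, penalty=5.0):
    """Average all five dimensions, filling gaps conservatively.

    HYPOTHETICAL fill rule: a missing dimension gets the lowest observed
    dimension minus `penalty`, so a gap can only lower the average.
    """
    observed = [v for v in dim_iqs.values() if v is not None]
    if not observed:
        raise ValueError("need at least one observed dimension")
    fill = min(observed) - penalty
    values = []
    for d in DIMENSIONS:
        v = dim_iqs.get(d)
        values.append(fill if v is None else v)
    return mean(values)
```

For example, a score of 55 on this made-up curve falls halfway between the (40, 100) and (70, 120) breakpoints, so it maps to an implied IQ of 110.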

Effective cost & iso-curves

Effective cost on the X-axis is the sticker price for 1M I/O tokens × the model's token usage multiplier. "1M I/O tokens" means 1M input tokens plus 1M output tokens, priced at the model's published rates.
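The effective-cost arithmetic is simple enough to write out directly. The prices and the 1.5× multiplier in the example are hypothetical, and the function name is illustrative.

```python
def effective_cost(input_price_per_m, output_price_per_m, usage_multiplier):
    """Effective cost = sticker price for 1M I/O tokens x token usage multiplier.

    Sticker price for 1M I/O tokens = price of 1M input tokens
    plus price of 1M output tokens, at the model's published rates.
    """
    sticker = input_price_per_m + output_price_per_m
    return sticker * usage_multiplier


# Hypothetical model: $3 per 1M input tokens, $15 per 1M output tokens,
# and a 1.5x token usage multiplier.
print(effective_cost(3.0, 15.0, 1.5))  # 27.0
```

A model with a cheap sticker price can still land far to the right if it uses many more tokens per task, which is what the multiplier captures.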

Iso-curves trace lines of equal preference between IQ and cost. The slider sets the weighting: center is 1:1, drag toward Cost to make cost matter more, or toward IQ to make quality matter more. At the selected weighting, a model above a curve is strictly preferred to any model on it.
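The page does not state the functional form behind its iso-curves, so this is a sketch of one plausible choice, not the site's formula. It assumes a preference score that is linear in IQ and logarithmic in effective cost, with `w` playing the slider's role (0.5 = 1:1) and a hypothetical exchange rate `POINTS_PER_DOUBLING`. Under these assumptions, at the 1:1 setting each doubling of effective cost must be repaid with 10 extra IQ points to stay on the same curve.

```python
import math

# HYPOTHETICAL exchange rate: IQ points that offset one doubling of
# effective cost at the 1:1 slider setting. Not the site's calibration.
POINTS_PER_DOUBLING = 10.0


def preference(iq, cost, w=0.5):
    """Assumed preference score: w weights IQ, (1 - w) weights log2(cost)."""
    return w * iq - (1 - w) * POINTS_PER_DOUBLING * math.log2(cost)


def iso_curve_iq(cost, level, w=0.5):
    """IQ required at a given effective cost to reach `level` (one iso-curve)."""
    return (level + (1 - w) * POINTS_PER_DOUBLING * math.log2(cost)) / w


# Two hypothetical models on the same curve at w = 0.5, and one above it.
print(preference(120, 4), preference(130, 8), preference(125, 4))  # 50.0 50.0 52.5
```

Moving the slider toward Cost corresponds to lowering `w`, which steepens the penalty for each cost doubling relative to IQ.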

Tracking frontier progress

Each dot is a model with a known release date and a derived IQ estimate. Models are positioned left-to-right by release date, so the chart shows how the frontier changes over time rather than just where models rank today.

Provider-colored lines connect each lab's flagship frontier checkpoints. Codex, mini, nano, flash, coder, and smaller open-weight variants are omitted so the chart tracks each lab's main offering rather than every SKU.
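Building the flagship lines amounts to filtering out variant SKUs, grouping by provider, and ordering by release date. This is a minimal sketch: the `Model` fields, the `small_open_weight` flag (a model's name alone cannot reveal it), and the sample names in any usage are hypothetical. Matching whole name tokens rather than substrings avoids dropping a model whose name merely contains "mini" inside a longer word. The `deltas` helper gives the "ahead of its direct predecessor" comparison.

```python
import re
from dataclasses import dataclass
from datetime import date
from itertools import groupby

# Variant tokens this page says are omitted from the flagship lines.
EXCLUDED_TOKENS = {"codex", "mini", "nano", "flash", "coder"}


@dataclass
class Model:
    name: str
    provider: str
    released: date
    iq: float
    small_open_weight: bool = False  # HYPOTHETICAL metadata flag


def is_flagship(m):
    # Compare whole lowercase name tokens, not substrings.
    tokens = set(re.split(r"[^a-z0-9]+", m.name.lower()))
    return not (tokens & EXCLUDED_TOKENS) and not m.small_open_weight


def frontier_lines(models):
    """Group flagship checkpoints by provider, each line ordered by release date."""
    flagships = sorted(
        (m for m in models if is_flagship(m)),
        key=lambda m: (m.provider, m.released),  # groupby needs provider-sorted input
    )
    return {p: list(g) for p, g in groupby(flagships, key=lambda m: m.provider)}


def deltas(line):
    """IQ change of each checkpoint versus its direct predecessor on the same line."""
    return [b.iq - a.iq for a, b in zip(line, line[1:])]
```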

This view is most useful for spotting whether a new release is actually ahead of its direct predecessor, or whether source coverage and conservative imputations are shaping the comparison.

How AI IQ estimates emotional intelligence

We pull in each model's Text Arena Elo score and EQ-Bench 3 Elo score
We map each source score to an estimated EQ using calibrated piecewise-linear scales
EQ-Bench 3 is retained as the dedicated emotional/social reasoning signal, but treated as style-sensitive because it is judged by Claude
Anthropic models receive a 300-point Elo adjustment on EQ-Bench 3 before mapping, to offset family bias from the Claude judge
The composite EQ is computed only when both components are source-backed, and averages the Text Arena and EQ-Bench 3 estimates
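The EQ steps above can be sketched as a small function. This assumes, as labeled in the code, that the 300-point Anthropic adjustment is a deduction (the page does not state its sign), and the piecewise-linear breakpoints for both scales are hypothetical placeholders for the calibrated ones. The composite returns `None` unless both source-backed components are present, then averages the two mapped estimates.

```python
# ASSUMED sign: a deduction, to offset the Claude judge's family bias.
ANTHROPIC_EQBENCH_ADJUSTMENT = -300.0

# HYPOTHETICAL calibration breakpoints: (source Elo, estimated EQ).
ARENA_SCALE = [(1000.0, 85.0), (1300.0, 115.0), (1500.0, 135.0)]
EQBENCH_SCALE = [(200.0, 80.0), (1200.0, 110.0), (1800.0, 130.0)]


def piecewise_linear(x, points):
    """Clamp outside the breakpoints and interpolate linearly between them."""
    if x <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return points[-1][1]


def composite_eq(arena_elo, eqbench_elo, provider):
    """Average of the two mapped estimates, or None if either source is missing."""
    if arena_elo is None or eqbench_elo is None:
        return None  # both source-backed components are required
    if provider == "Anthropic":
        eqbench_elo += ANTHROPIC_EQBENCH_ADJUSTMENT  # applied before mapping
    arena_eq = piecewise_linear(arena_elo, ARENA_SCALE)
    eqbench_eq = piecewise_linear(eqbench_elo, EQBENCH_SCALE)
    return (arena_eq + eqbench_eq) / 2
```

With these made-up scales, an Anthropic model scoring 1500 on EQ-Bench 3 is mapped as if it had scored 1200, the same as a non-Anthropic model with a raw 1200.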

IQ and EQ tradeoffs

IQ summarizes benchmark-based reasoning ability across fluid abstraction, mathematical reasoning, programmatic reasoning, critical reasoning, and agentic reasoning dimensions.

EQ estimates interaction quality from Text Arena and EQ-Bench 3 signals, then maps those scores onto the same kind of normalized scale as IQ so models can be compared directly.

Iso-curves trace lines of equal preference between IQ and EQ. The slider weights the two: center is 1:1, drag toward EQ to make EQ matter more, or toward IQ to make IQ matter more. At the selected weighting, a model above and to the right of a curve is strictly preferred to any model on it.
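Because IQ and EQ sit on the same kind of normalized scale, a plain weighted sum is one natural way to model this slider. This is an assumption for illustration, not the site's stated formula. Here `w_iq` is a hypothetical stand-in for the slider (0.5 = 1:1), the iso-curves become straight lines with slope -(1 - w_iq)/w_iq, and the two sample models are made up.

```python
def weighted_score(iq, eq, w_iq=0.5):
    """Assumed linear preference over IQ and EQ; w_iq plays the slider's role."""
    return w_iq * iq + (1 - w_iq) * eq


def iso_line_iq(eq, level, w_iq=0.5):
    """IQ on the iso-line through `level` at a given EQ."""
    return (level - (1 - w_iq) * eq) / w_iq


# Hypothetical models: name -> (IQ, EQ)
models = {"A": (130.0, 100.0), "B": (110.0, 115.0)}
for w in (0.25, 0.75):  # slider toward EQ, then toward IQ
    best = max(models, key=lambda k: weighted_score(*models[k], w_iq=w))
    print(w, best)
```

Under this assumption, the slider alone can flip which model wins: B leads when EQ is weighted more heavily, and A leads when IQ is.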

Three dimensions, one view

Most charts on this page reduce model comparison to two axes. This one keeps all three: EQ (X), IQ (Y), and effective cost (Z, the log-scaled depth axis). Effective cost is the sticker price for 1M I/O tokens multiplied by the blended usage multiplier.

Drag to rotate the cloud. The dashed line is the central tradeoff axis: it is perpendicular to the isoquant surface at the middle of the cube and points toward higher IQ, higher EQ, and lower effective cost. Models nearer the green end are stronger all-around deals; models nearer the red end give up capability, cost efficiency, or both.
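A model's position along the dashed axis can be computed as a projection. This is a sketch under loudly stated assumptions. The cube bounds are taken as each axis's min and max. Cost is log10-scaled before normalizing. The isoquant surface is assumed linear with equal weights, so its normal through the cube's center points along (1, 1, -1)/√3 in (EQ, IQ, cost) coordinates. Positive values sit toward the green end and negative values toward the red end.

```python
import math


def normalize(values):
    """Rescale to [0, 1] using the min and max of the data (assumed cube bounds)."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values are equal
    return [(v - lo) / span for v in values]


def tradeoff_positions(eqs, iqs, costs):
    """Signed distance of each model along the central tradeoff axis.

    ASSUMPTIONS: equal weights and a linear isoquant surface, so the axis
    direction is (1, 1, -1)/sqrt(3) through the cube center (0.5, 0.5, 0.5);
    higher EQ, higher IQ, and lower log-cost all move a model toward green.
    """
    x = normalize(eqs)
    y = normalize(iqs)
    z = normalize([math.log10(c) for c in costs])
    k = 1 / math.sqrt(3)
    return [
        (xi - 0.5) * k + (yi - 0.5) * k - (zi - 0.5) * k
        for xi, yi, zi in zip(x, y, z)
    ]
```

In this setup, a model with the best EQ, best IQ, and lowest cost lands at √3/2, one corner of the normalized cube, and the opposite corner lands at -√3/2.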

Color = provider, matching the legend below.


Charts: 18 benchmarks, 5 dimensions (how dimensions relate to composite IQ); 2 benchmarks, 1 composite (Anthropic family-bias adjustment and how benchmarks relate to composite EQ).