Regressions on benchmark scores suggest frontier LLMs ~3-5T params

aimlbling-about.ninerealmlabs.com

4 points by namnnumbr 11 days ago · 1 comment

namnnumbrOP 11 days ago

I tried backing out proprietary model sizes from benchmark scores, inspired by a Latent Space podcast where Artificial Analysis noted their Omniscience Accuracy numbers track parameter count better than anything else they measure.

I trained a set of simple linear regressions. While Omniscience Accuracy had the best fit (R²: 0.98), it predicted absurd multi-trillion parameter sizes (Gemini 3 Pro at ~1,254T total parameters). Artificial Analysis' Intelligence Index gave more plausible results:

- Gemini 3 Pro: 3.4T
- Claude 4.5 Sonnet: 1.4T
- Claude 4.5 Opus: 4.1T
- GPT-5.x series: 2.9-5.3T range total parameters

Interesting notes:

- task benchmarks (Tau²/GDPVal) aren't predictive of model size
- adding price made the fit worse
- sparsity or parameter-activation ratios did not influence predicted sizes at all
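The overall approach can be sketched in a few lines: fit a least-squares line on models with published sizes, then invert it for proprietary ones. This is a minimal sketch, not the author's actual pipeline; the data points below are made-up placeholders, and regressing on log10 of the parameter count is my assumption (the comment doesn't say whether the fit was in linear or log space).

```python
import numpy as np

# (benchmark score, known total params in billions) for open models.
# These numbers are hypothetical placeholders, not real measurements.
known = np.array([
    [45.0,   8.0],
    [52.0,  70.0],
    [60.0, 405.0],
    [63.0, 671.0],
])
scores = known[:, 0]
log_params = np.log10(known[:, 1])  # log-space fit is an assumption

# Ordinary least squares: log10(params) = a * score + b
a, b = np.polyfit(scores, log_params, 1)

# R^2 of the fit, as reported for each benchmark in the post
pred = a * scores + b
ss_res = np.sum((log_params - pred) ** 2)
ss_tot = np.sum((log_params - log_params.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

def estimate_params(score: float) -> float:
    """Predicted total parameter count (billions) for a benchmark score."""
    return 10.0 ** (a * score + b)

print(f"R^2 = {r2:.3f}")
print(f"score 70 -> ~{estimate_params(70.0):.0f}B params")
```

With real data you would fit one such regression per benchmark (Omniscience Accuracy, Intelligence Index, Tau², GDPVal, ...) and compare their R² values, which is how a high-R² benchmark can still extrapolate to implausible sizes outside the range of the training points.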
