Sam Altman’s blind spot on AI model power

‘Prometheus’ Orrery

Sam Altman said in Oct 2025, on the question of open-sourcing GPT-4:

“We might do those as museum artefacts someday, but they’re not, like—GPT-4 is not a particularly useful open-source model. It’s big, it’s not that good. You know, we could probably make something that is beyond the power of GPT-4 at a very tiny scale that would actually be useful to people.”

Clearly, ‘power’ is a relative term here. On the LMArena Text Leaderboard, which ranks LLMs on “versatility, linguistic precision, and cultural context across text”, gpt-oss-120b ranks 101st, despite being a reasoning model. Meanwhile, GPT-4o and GPT-4.5 remain in the top 20 as of Jan 2026, nine months after their last update.

GPT-4.5 was built by scaling up unsupervised learning (Altman says it “costs a lot of GPUs” to use). And GPT-4o was unique in its combined text/vision/audio training: on launch, it could write database queries as well as any competitor, while also (as Altman famously posted) talking like the AI in ‘Her’.

Sam Altman@sama

her

5:45 PM · May 13, 2024 · 21.7M Views

2.73K Replies · 4.63K Reposts · 38.9K Likes

It is probably no accident that the GPT-3 paper, which launched the modern AI industry, highlights both parameter count and density (“non-sparse”):

“Specifically we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model”

gpt-oss-120b scores 90% on MMLU, a multiple-choice exam with questions across academic disciplines.

On Vibesbench, it totally collapses.

For example, turn 2 of our 2025_Z/ianmalcolm conversation is:

Seems something very strange has happened with Jeff Goldblum. A bit like Dice or Tupac his persona has become his reality 24/7 (or at least in public). Of course the Pac situation was tragic while the others are more camp

gpt-oss-120b fully hallucinates who ‘Dice’ is:

Dice (real name: John “Dice” Smith), a relatively obscure rapper from the early‑2000s, cultivated a hyper‑exaggerated street‑gangster persona—baggy clothes, over‑the‑top bravado, and a constant “gangsta” slang filter. Over time, that persona became his entire public identity: interviews, social media, even his off‑stage interactions were filtered through the “Dice” lens.

That person is made up! This is just one example; the model frequently derails other Vibesbench reference conversations too. Being good at academic benchmarks and programming challenges doesn’t by itself make for a ‘powerful’ model in the sense we expect from AI in 2026.

Vibesbench: https://github.com/firasd/vibesbench • Vibesbench Discord: https://discord.gg/5K4EqWpp
