Ilya Sutskever, the Scaling Hypothesis, and the Art of Talking Your Book


If you’ve just raised $3 billion to build a new god, it helps to question the faith in the old one.

Ilya Sutskever’s recent appearance on the Dwarkesh Podcast has sparked predictable reactions. Skeptics seized on his statement that “we are back in the age of research” as vindication that the AI hype is overblown. Boosters dismissed the interview as sour grapes from an OpenAI exile. Both camps miss something important: Sutskever is doing what any rational actor in his position would do. He’s talking his book.

But to understand why that matters, we need to understand what he’s actually claiming and the history behind it.

The scaling hypothesis is the foundational bet that powered the modern AI boom. In simple terms: if you make neural networks bigger, train them on more data, and throw more compute at them, they get better.

In 2020, researchers at OpenAI (including Sutskever’s colleagues Jared Kaplan and Sam McCandlish) published a landmark paper demonstrating that language model performance follows power laws. Double your compute, and your model’s loss drops by a predictable amount. The relationship held across seven orders of magnitude. In short, capabilities improve predictably with model size, data, and compute.
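
To make that concrete, here is a minimal sketch of the power-law relationship. The constant and exponent below are illustrative placeholders, not the fitted values from the paper; the point is only that doubling compute shrinks the predicted loss by a fixed ratio.

```python
# A minimal sketch of a compute scaling law of the form L(C) = (C_c / C)**alpha.
# The constants below are illustrative placeholders, not the paper's fitted values.

def predicted_loss(compute: float, c_c: float = 1e8, alpha: float = 0.05) -> float:
    """Cross-entropy loss predicted by a pure power law in training compute."""
    return (c_c / compute) ** alpha

base = predicted_loss(1e6)
doubled = predicted_loss(2e6)

# Doubling compute shrinks loss by the same factor, 2**(-alpha), at any scale.
print(f"loss at C:  {base:.4f}")
print(f"loss at 2C: {doubled:.4f}")
print(f"ratio:      {doubled / base:.4f}  (= 2**-0.05 = {2**-0.05:.4f})")
```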

This insight transformed AI from a research discipline into an infrastructure race. It explained why companies began raising billions for GPU clusters and why data companies like Scale AI suddenly became worth billions of dollars.

This “scaling hypothesis” justified the massive capital expenditures that would have seemed insane a decade ago. And, despite the skeptics (more on that later), scaling remains a significant driver of cutting-edge model performance.

As of December 3, 2025, Google’s Gemini 3 model is the best publicly available model. And, as Sutskever notes in the interview, its capability gains come from improvements in pre-training. So the scaling hypothesis may not be dead yet.

Note: In this post, I use ‘scaling’ and ‘pre-training’ somewhat interchangeably. That is not strictly accurate, but it is close enough for the argument here.

Enter Reinforcement Learning

Pre-training, where models learn to predict the next word in a sequence, was the original scaling recipe. But it has a constraint: you need data. The internet is large but finite. At some point, you run out of high-quality text to train on.
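
For readers who want the objective spelled out, here is a toy sketch of what “predict the next word” means as a training signal. The vocabulary and probabilities are made up; a real model produces this distribution over tens of thousands of tokens.

```python
import math

# Toy illustration of the pre-training objective: the model assigns a probability
# to each candidate next token, and the loss is the negative log-probability of
# the token that actually appeared. The distribution below is invented.
next_token_probs = {"cat": 0.6, "dog": 0.3, "car": 0.1}
actual_next_token = "cat"

loss = -math.log(next_token_probs[actual_next_token])
print(f"cross-entropy loss at this position: {loss:.3f}")
```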

Reinforcement learning (RL) offered a second scaling axis. Instead of just predicting text, models could be trained to optimize for outcomes using techniques like RLHF (Reinforcement Learning from Human Feedback – one of many flavors of RL used in post-training). Human raters would compare model outputs and indicate preferences. A reward model would learn from those preferences. Then the language model would be fine-tuned to maximize the reward.
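
Here is a minimal sketch of that reward-model step, using the standard pairwise preference loss. The random feature vectors and the linear head below are stand-ins for a real language model, and the whole thing is a simplification of how labs actually implement RLHF.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-ins for a real model: each response is represented by a feature vector,
# and the "reward model" is just a linear head on top of those features.
feature_dim = 16
reward_head = torch.nn.Linear(feature_dim, 1)
optimizer = torch.optim.Adam(reward_head.parameters(), lr=1e-3)

# One batch of preference pairs: features of the response the rater preferred
# ("chosen") and of the one they rejected. Random tensors here for illustration.
chosen_features = torch.randn(8, feature_dim)
rejected_features = torch.randn(8, feature_dim)

for step in range(100):
    r_chosen = reward_head(chosen_features)      # scalar reward per chosen response
    r_rejected = reward_head(rejected_features)  # scalar reward per rejected response

    # Pairwise loss: push the chosen reward above the rejected reward.
    loss = -F.logsigmoid(r_chosen - r_rejected).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# The fine-tuning stage (not shown) would then optimize the language model
# against this learned reward, typically with PPO or a related RL algorithm.
print(f"final preference loss: {loss.item():.4f}")
```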

The reasoning model revolution extended this further. OpenAI’s o1 and DeepSeek’s R1 demonstrated that you could scale both inference-time and training-time compute. Let the model “think” longer, explore more reasoning chains, and performance improves. DeepSeek’s R1-Zero showed something remarkable: reasoning behavior could emerge purely from RL training, without any supervised fine-tuning.
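
One simple way to picture “letting the model think longer” is a self-consistency-style majority vote over sampled reasoning chains. The sketch below assumes a hypothetical sample_answer helper rather than a real model API, and it is not a claim about how o1 or R1 actually allocate their thinking compute; it only shows how more sampling buys more reliability.

```python
import random
from collections import Counter

def sample_answer(question: str) -> str:
    """Hypothetical stand-in for sampling one reasoning chain from a model
    and extracting its final answer. Here it just returns a noisy guess."""
    return random.choice(["42", "42", "42", "41", "43"])

def answer_with_more_thinking(question: str, num_chains: int) -> str:
    """Spend more inference-time compute by sampling several chains and
    taking a majority vote over their final answers."""
    answers = [sample_answer(question) for _ in range(num_chains)]
    return Counter(answers).most_common(1)[0][0]

# More chains -> more inference-time compute -> a more reliable final answer.
print(answer_with_more_thinking("What is 6 * 7?", num_chains=1))
print(answer_with_more_thinking("What is 6 * 7?", num_chains=25))
```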

So the industry developed a two-stage scaling playbook. Pre-train on massive datasets to build a foundation. Then apply RL to enhance reasoning, alignment, and task-specific performance.

What Sutskever Actually Said

In the Dwarkesh interview, Sutskever makes a nuanced argument that has been flattened by the discourse. He does not say scaling is dead. He says the original pre-training scaling recipe is reaching its limits because data is finite. And he questions whether simply scaling up 100x will be “transformative” for achieving superintelligence.

His actual quote: “Is the belief really, ‘Oh, it’s so big, but if you had 100x more, everything would be so different?’ It would be different, for sure. But is the belief that if you just 100x the scale, everything would be transformed? I don’t think that’s true.”

He explicitly distinguishes between useful AI and transformative AI. Current approaches, he says, will continue generating “stupendous revenue.” The capability overhang, the gap between what models can do and what has been commercially deployed, is real and valuable. But reaching superintelligence requires something different. Something we don’t yet know how to build.

His central concern is generalization. Today’s models, despite their impressive benchmark performance, generalize “dramatically worse” than humans. They oscillate between the same two bugs when fixing code. They score well on evals that may inadvertently mirror their training data. Sutskever believes the path to superintelligence runs through understanding and solving this generalization problem, not through brute-force scaling of current methods.

The November 2023 Backstory

To understand Sutskever’s current positioning, you need to understand November 2023.

On November 17, 2023, OpenAI’s board fired Sam Altman. The action was sudden. Altman learned of his removal minutes before it happened, via Google Meet, while watching a Formula 1 race in Las Vegas. The board’s terse statement said only that Altman had not been “consistently candid in his communications.”

Sutskever was at the center of this coup attempt. According to his deposition in the ongoing Musk v. OpenAI lawsuit (released in late 2025), he had been considering Altman’s removal for over a year. He authored a 52-page memo, at the request of independent board members, accusing Altman of “a consistent pattern of lying” and “pitting his executives against one another.” The memo was sent via disappearing messages because Sutskever feared retaliation.

The firing triggered chaos. Nearly 700 of OpenAI’s 770 employees threatened to quit. Microsoft, which had invested billions, applied intense pressure. Within five days, Altman was reinstated. Sutskever publicly expressed regret for his participation in the board’s actions.

But the damage was done. Sutskever’s influence at OpenAI evaporated. He departed in May 2024, announcing Safe Superintelligence Inc. the following month.

SSI: The Straight-Shot Lab

SSI was founded with a deliberately provocative premise. While OpenAI, Anthropic, and Google were building products, competing on benchmarks, and racing to deploy, SSI would focus on pure research aimed at directly creating safe superintelligence.

The pitch worked. In September 2024, SSI raised $1 billion at a $5 billion valuation from investors including Andreessen Horowitz, Sequoia Capital, and DST Global. By April 2025, a second round brought in another $2 billion at a $32 billion valuation. Alphabet and NVIDIA became investors. Google Cloud began providing TPU resources.

This is extraordinary for a company with no products and roughly 20 employees. The valuation rests almost entirely on Sutskever’s reputation. He is one of the most influential figures in the history of deep learning. He was the second author on AlexNet, the paper that sparked the modern deep learning revolution. He was a co-author on the original GPT papers. Investors are betting that if anyone can find a path to superintelligence that current approaches cannot reach, it’s him.

But the narrative is not without complications. In July 2025, co-founder Daniel Gross departed SSI to join Meta’s newly formed superintelligence lab. The move came after Meta’s failed attempt to acquire SSI outright. Sutskever took over as CEO, stating: “We have the compute, we have the team, and we know what to do.”

The Incentive Structure

Which brings us back to the Dwarkesh interview.

Sutskever has raised billions to pursue a research agenda that, by his own admission, does not yet exist in a proven form. SSI’s website describes its mission as building “the world’s first straight-shot SSI lab” with “one goal and one product: a safe superintelligence.”

When Dwarkesh asks what technical approach SSI will take, Sutskever demurs. He alludes to ideas about generalization. He references his aesthetic sense of how AI should work. But he offers no specifics. “We live in a world where not all machine learning ideas are discussed freely,” he says.

This creates a convenient rhetorical position. If scaling is sufficient for superintelligence, SSI has no reason to exist. OpenAI, Google, and Anthropic have more compute, more engineers, and more revenue to fund the race. SSI’s value proposition depends on the premise that scaling is necessary but not sufficient, that some additional research insight is required.

As Charlie Munger used to say: “Show me the incentive and I’ll show you the outcome.”

None of this means Sutskever is wrong. His track record commands respect. His concerns about generalization are legitimate and well-grounded. His observation that companies now spend more compute on RL than pre-training reflects real shifts in the field.

But his public statements should be read with the same scrutiny we apply to any founder positioning their company. When Satya Nadella talks about AI copilots, we understand he’s selling Microsoft products. When Sam Altman discusses AGI timelines, we note that shorter timelines favor his company’s valuation. Sutskever deserves the same treatment.

The Unanswered Question

Here’s what puzzles me about the SSI thesis.

If we’re truly back in the “age of research,” where breakthrough insights matter more than raw compute, then SSI’s massive war chest seems misallocated. Fundamental research historically happens in universities and small labs, not in organizations raising billions for infrastructure (Arvind Krishna, IBM CEO, makes this same point in a recent interview on The Verge’s Decoder podcast).

But if compute still matters, if whoever gets to the next paradigm first still needs massive clusters to prove it out, then SSI is in a strange position. It has raised enough to be a serious player but not enough to compete with the hyperscalers. And by Sutskever’s own framing, it’s not trying to compete on compute anyway.

So what exactly does SSI intend to do with $3 billion?

Sutskever’s answer in the interview is revealing. He argues that SSI’s compute position is better than it appears because competitors spend heavily on inference and product development. SSI’s research-only focus means more of its budget goes to actual experimentation. But this is a relative argument, not an absolute one. He’s essentially saying SSI can punch above its weight class, not that weight class doesn’t matter.

The alternative reading is less flattering. Perhaps SSI is a research hedge, a well-funded option on the possibility that Sutskever’s intuitions are correct. If he finds something, the valuation was cheap. If he doesn’t, investors got access to one of the field’s greatest minds for a few years. Either way, the money has been raised, and the narrative has been established.

What the Discourse Misses

The polarized reaction to Sutskever’s interview obscures what’s actually interesting about it.

He’s not saying LLMs are useless. He’s saying they generalize poorly compared to humans, and that gap matters if your goal is superintelligence. He’s not saying scaling doesn’t work. He’s saying scaling alone won’t be transformative at the next level. He’s not saying current approaches have no value. He’s saying they’ll generate massive revenue while falling short of the ultimate prize.

These are reasonable positions. They may even be correct. But they’re also exactly the positions you would expect from someone who has bet his reputation and $3 billion on a different path.

The AI discourse would benefit from holding both truths simultaneously: Sutskever might be right about the limits of scaling, and his public statements about those limits happen to serve his commercial interests. These are not mutually exclusive. They’re just how the world works.