Out of Ideas


Today, OpenAI announced their latest “advancement” in the form of Deep Research - a system that they claim is able to perform “multi-step research on the internet for complex tasks”. They say it takes just 5-30 minutes to complete research tasks that would take humans “many hours”.

The tech press, of course, immediately ate it up:

“Tweet from Joanna Stern: There goes the human research assistant I hired for my book. Or at least 75% of the work I needed them for.”

This sentiment is being echoed online by journalists, enthusiasts, and blue-checks alike. They seem to believe that this new product (which, I should add, has never been used by anyone outside OpenAI) will once again revolutionize their lives and be so useful that they can eliminate the need to pay skilled workers or use Google search themselves. Their job won’t be affected though - just those lower than them that they don’t want to pay anymore. They’d rather give Sam Altman $200 per month for access to the world’s most mediocre researcher than get good results from a human.

Anyways.

This isn’t what I wanted to talk about today. It’s important context to have, but something else about this announcement caught my eye that needs to be discussed. Did you notice that in the announcement they call out that this new research agent will take 5-30 minutes to complete research tasks? This, to me, proves that they are really out of ideas and are desperate to juice revenue by any means they can think of. Let me explain.

This new deep research model is based on their latest “reasoning” models, o1 and o3. These are both their most capable and their most expensive models by a huge margin. o1 costs $15 per million input tokens and an insane $60 per million output tokens. For comparison, GPT-4o costs only $10 per million output tokens, and Google’s Gemini Flash costs $0.30-$0.60 per million output tokens. o1 and o3 are already far more expensive than the other models even before you get into the experience of using them. If you try these models, one thing you’ll learn immediately is that they are very chatty. Their responses tend to be long, but even before you see an answer, their “chain of thought” reasoning is producing tokens that you can’t see but are still being billed for the whole time.
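To put that price gap in perspective, here’s a quick back-of-the-envelope sketch comparing what the same answer costs at each model’s published output rate. The 10,000-token answer length is my own assumption for illustration; the rates are the ones quoted above.

```python
# Cost of the same answer at each model's output rate (dollars per 1M tokens).
# The 10,000-token answer size is an assumed figure, not a measured one.
OUTPUT_RATES = {"o1": 60.0, "GPT-4o": 10.0, "Gemini Flash": 0.60}

def answer_cost(tokens: int, rate_per_million: float) -> float:
    """Dollar cost of producing `tokens` output tokens at a given rate."""
    return tokens * rate_per_million / 1_000_000

for model, rate in OUTPUT_RATES.items():
    print(f"{model:>12}: ${answer_cost(10_000, rate):.3f}")
```

At these rates the o1 answer costs 6x the GPT-4o one and 100x the Gemini Flash one, before any hidden “thinking” tokens are counted.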

Even before we look at deep research, it’s important to understand that the fundamental underpinnings of these models still haven’t changed: every step in the “thinking” process requires taking the output of the model and feeding it back into the model to produce the next step. Presumably every one of those intermediate steps consumes both input and output tokens, both of which cost you money. In my experimentation with DeepSeek and other reasoning models, you can create scenarios where the model gets stuck in a thinking loop, going around and around for several minutes without ever reaching an answer. Every pass through the loop burns more tokens. o1 and o3 already do this, as the key “innovation” of these reasoning models is effectively just feeding the output back through the model a few times to double-check that it’s really sure about the answer. All of that costs tokens.
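The feedback loop described above is what makes the cost compound: each step re-reads everything generated so far as input, then adds its own output on top. Here’s a rough sketch of that mechanic using o1’s published per-token prices; the prompt size and per-step output size are my own assumptions, purely for illustration.

```python
# Rough cost sketch for an iterative "reasoning" loop, where each step's
# output is appended to the context and fed back in as input.
# Prices are o1's published rates; the per-step token counts are assumptions.

INPUT_PRICE = 15 / 1_000_000   # dollars per input token
OUTPUT_PRICE = 60 / 1_000_000  # dollars per output token

def loop_cost(steps: int, prompt_tokens: int = 2_000,
              output_per_step: int = 3_000) -> float:
    """Estimate cost when every step re-reads the growing transcript."""
    context = prompt_tokens
    total = 0.0
    for _ in range(steps):
        total += context * INPUT_PRICE            # re-read everything so far
        total += output_per_step * OUTPUT_PRICE   # new (often hidden) tokens
        context += output_per_step                # output becomes next input
    return total

for n in (1, 5, 20):
    print(f"{n:>2} steps: ${loop_cost(n):.2f}")
```

Under these assumed sizes, twenty steps costs dozens of times more than one, because the input side grows quadratically with the number of passes. The provider picks the step count and the output length, so the customer controls neither term.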

Now enter deep research. It seems what they’re doing here is effectively the same thing as “reasoning”, combined with their new Operator system, which allows the model to make web searches. Combine that ability with feeding the results back into the model, and what you have is a token-consuming and token-producing machine that they can let run for 5-30 minutes to come up with some kind of answer. Put another way, they made a machine that charges you over and over for a number of input and output tokens that they control (since they control how many tokens the model produces), for a duration of time that they also control. Since the models are non-deterministic by nature and hard to explain, they’ve effectively written themselves a blank check to charge you however much they want. “Oh whoops, we accidentally increased the average output token size by 5%! Sorry about your bill!” They can let this thing chew away at your credit card for half an hour with no guarantee that the result is useful. It’s a McKinsey consultant’s wet dream of a product.

If they actually had good ideas on how to improve their products or make something truly revolutionary, they’d do that instead of just selling you “the same thing, but more of it and for more money”. The truth is they plateaued long ago with GPT-3 and haven’t had anything to sell you other than the same thing in years. Sure, the models do better on various benchmarks (which they’re specifically trained on), and Deep Research is no exception. But these benchmarks are meaningless. What do they really have to offer? Nothing new. They’re scared of DeepSeek knocking the bottom out of the industry and are trying to cash out while they still can.