AI forecasting retrospective: you're (probably) over-confident


Late last year, I published an article asking readers to make 30 forecasts about the future of AI in 2027 and 2030---from whether or not you could buy a robot to do your laundry, to predicting the valuation of leading AI labs, to estimating the likelihood of an AI-caused catastrophe.

But I did something slightly different with these questions. Instead of asking "how many math theorems will an AI solve?" I asked for a 90% confidence interval on the number of theorems solved. In this post I'll analyze the distributions of people's answers, because it's still too early to grade their correctness. Here's an example question, and its answer distribution:

“The best off-the-shelf AI system will have scored better than between X% and Y% of all other participants (with a 90% confidence interval) in a widely recognized competitive programming contest”

Now what you'll notice is that, regardless of what the answer ends up being in 2027, AT MOST HALF OF PEOPLE CAN BE RIGHT. Despite the fact that I asked people to give 90% confidence intervals, and anyone could have just given the range [0, 100], people decided to instead give really narrow ranges. (To say nothing of OpenAI's recent o3 results that likely score 90%+.)
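To make that claim concrete: for any candidate resolution value, you can count the fraction of submitted 90% intervals that contain it; if no value is covered by more than half of the intervals, then however the question resolves, at most half of the respondents can be right. Here's a minimal sketch of that check in Python, using made-up intervals rather than the actual survey data:

```python
def coverage(intervals, value):
    """Fraction of [low, high] intervals that contain `value`."""
    return sum(low <= value <= high for low, high in intervals) / len(intervals)

# Hypothetical 90% intervals for "percentile of humans beaten in a contest".
# These are illustrative only, not the real survey responses.
intervals = [(20, 40), (50, 60), (70, 80), (85, 95), (90, 100)]

# Best-case coverage over every possible resolution value 0..100:
best_value = max(range(101), key=lambda v: coverage(intervals, v))
print(best_value, coverage(intervals, best_value))  # -> 90 0.4
```

With intervals this narrow and this spread out, no possible outcome lands inside more than 40% of them.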

This overconfidence is consistent across all questions, and it really highlights a growing worry I have: almost everyone (even people who read my website!) appears overconfident about where things are going---be it those saying AGI is coming and we're all going to be unemployed, or those who say that LLMs will never be able to do anything useful and this whole AI thing is an NFT-style fad.

In the remainder of this article I am going to talk about the specific details of the questions in the forecast, and how people's answers compare. I also gave people the opportunity to write a short explanation of their reasoning (and asked permission to share this reasoning), and so I'll share some of the most interesting responses here.

But first! If you have not yet taken the survey, I think you would get a lot more out of this post if you answer at least some of the questions on the forecast before continuing. Forecasting is the best way to check your beliefs, and I think future-you will appreciate having a record of what 2025-you thought. The remainder of this article will assume you have done this.

A question-by-question breakdown

I asked thirty questions, and will now go through them in turn and highlight a few of the more interesting answer distributions and responses people gave. Because it's been just a few months since I asked these questions, I'm not going to actually try and give resolutions to them yet. Maybe in 2026 I'll start to resolve questions early if the answers are clear.

Let's begin!

AGI Lab Valuation

The most valuable "AI Lab" (e.g., OpenAI, Anthropic, Mistral, DeepMind if it were spun out of Google) will be between $X and $Y billion USD (90% confidence interval)

This question has a lot of variance in the answers. (Note the log scale on the y-axis.) A few people are willing to put some probability on 20 trillion dollar valuations, and others on ~zero.

I was very surprised by how narrow many of the ranges were for this question. At least a few people seem confident enough in their predictions that they are willing to say they have 90% confidence that the most valuable lab will be worth between exactly 300 and 500 billion dollars. But like: this seems a very narrow range? (Although I do like whoever put a range of 2 billion to 20 trillion.)

I thought it was interesting that the "middle" answer that falls within most people's confidence intervals for 2030 is actually lower than the middle answer for 2027. I can't explain that.

One of the more common comments from people who gave low answers was that "the tech [will] become commoditized"; others just think that we'll "have AI takeover by 2030" and so the labs will be worth nothing.

AI Lab Collapse

There is an X% chance that at least one of OpenAI, Anthropic, or DeepMind will be functionally dead

It was interesting to see that people basically don't think that anything will change much by 2027, but by 2030 people are much more confident something will go wrong with at least one of the labs. I tend to agree here. But there were several comments of the form: "railroads also failed competition will cause some to die", which seems basically right to me? But it's hard to say. 2027 is not far off, and dying is often slow (until it's not).

AI Workers

The revenue generated by "AI workers" across all jobs (e.g., programmers, lawyers, accountants, cooks, plumbers, teachers, etc.) in the prior year is between $X billion and $Y billion USD (90% confidence interval)

This question had a bunch of tighter guesses that surprised me. I was expecting even wider ranges in part because I have no idea how I'm going to even be able to get a good estimate of this in 2027.

Also perhaps regrettable is that I capped the upper bound of the question at $3 trillion, and by 2030 a good number of people thought it would be at (or exceed) that value. I think this is not unreasonable, and in retrospect I probably should have made the upper bound higher. I don't remember what my prediction here was (that'll be the focus of my next article, and I don't want to spoil it for myself).

AI Revenue Model

There is an X% chance that a highly-popular state-of-the-art AI system is supported by advertising

It's fascinating how people are confident this won't happen before 2027, but most people agree it will by 2030. I have nothing new to offer here. My favorite commentary on this was the person who said “god i really hope not”.

Recursive Self Improvement

There is a X% chance that most of the improvements in the best AI systems will be a direct result of the prior generation of AI systems (and not due to human researchers)

I'm surprised how many people think this won't happen, but also think that AI labs will be valued at tens of trillions of dollars. In order for a company to be worth 10x more than any company is at this moment, and for that to happen in two years, I'd imagine some kind of self-improvement would be a necessary prerequisite.

Proving Theorems

Between X and Y famous unproven mathematical conjectures (inclusive, 90% confidence interval) have been proven by an AI system

Most people seem skeptical here, and many respondents seem to think that "zero" is a likely answer. I think this is probably right, and would put at least 10% probability on LLMs not advancing much more in the next two years.

This seems to be captured in the comments people gave here, saying "Barring a shift in how AI "reasons" I don't think this will happen. It will be interesting to be wrong here." and "LLMs are great at predicting what we already know and these things are by and large not great at novel reasoning."

Competitive Programming

The best off-the-shelf AI system will have scored better than between X% and Y% of all other participants (with a 90% confidence interval) in a widely recognized competitive programming contest

This is the question we're closest to resolving already. OpenAI's o3 model is now rated in the top 1% of humans on Codeforces. We haven't seen it compete yet, but I would be more surprised by it not scoring first place in 2027 than by anything else. (But I also wouldn't be surprised if it was "only" top 5%. You should have wide error bars!)

But maybe what surprised me most were the people whose guesses, at the high end, were below the top 50%. Even when I launched these forecasting questions, I think o1 would probably have been in the top 50% of some competitive programming contest. So they were probably just (confidently) wrong even as of when they made the prediction.

And to some extent I can't fault them. Things change so fast that it can be really hard to keep track of what's going on. If the last time you looked at this AI thing was early 2023, when you played around with GPT-3.5, you'd (rightly) believe these models were pretty bad at programming.

Displacing Jobs

The creation of "AI Employees" causes mass job displacement, resulting in an unemployment rate of between X% and Y% in the United States (90% confidence interval)

Putting answers around 3% basically means things stay as they are, and not much changes. Even in 2030, people put numbers basically at the same value. On one hand it doesn't surprise me that this is roughly where the median is: by and large, "things will stay the same as they are" is a good guess. But there are a lot of people for whom 4% is outside of their 90% confidence interval. If any of these people were to be right, with 10%+ unemployment, that would be a huge deal.

Self Driving Trips

Self-driving cars have made between X million and Y million (90% confidence interval) trips in the prior year

This one I find fascinating. In particular, I expected more people to put "zero" within their confidence intervals, which seems entirely possible to me if something bad happens before 2027 and governments step in and make self-driving cars illegal. I think there's a fairly low probability of this happening, but surely it's more than 5% likely?

Someone also left my favorite comment of the entire survey on this one: "Maybe no self-driving cars in 2029 because [we're all] dead". Which, if you're someone who believes in Doom, I guess is a reasonable reason to put zero as an answer. (I suppose this does also tell you something about my audience.)

Robot Helper

There is a X% chance that I will be able to purchase an off-the-shelf robot for under USD 100,000 that can perform at least one household task that requires the ability to grasp and manipulate objects (e.g., wash dishes, make a meal, fold laundry)

It's at first maybe surprising that almost no one thinks we'll have this by 2027, but half of the people who answered this forecast said 10% unemployment in the US is possible. The commentary, unsurprisingly, explains why: "Reality is hard!" and "Robotics is really hard and making it cheap is even harder." Maybe this is true, but it feels like if we can solve novel math problems, we can just use this to solve robotics? I don't know.

Cost per word

An AI system that benchmarks at least as well as current state-of-the-art systems costs $X USD to $Y USD (90% confidence interval) to write a million output words (~10 books)

I think people here were far too pessimistic. The recent DeepSeek v3 (which doesn't quite reach OpenAI's 4o or Anthropic's 3.5-sonnet levels) costs just $1 per million tokens, so I think anyone who didn't include $1 in their answer is probably going to be wrong. I'll just quote someone who gave the answer I'd give here: "Costs of training and using a LLM has fallen DRAMATICALLY and capabilities have increased DRAMATICALLY in the last 2 years. I think it'll be rill [sic] cheap."

I basically agree with this, and I'm very surprised people's guesses weren't maybe 5-10x lower on average.
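As a rough sanity check on the order of magnitude, here is a back-of-the-envelope sketch. The tokens-per-word ratio and the price are my own approximations (roughly 1.3 tokens per English word, and DeepSeek-v3-class output pricing of about $1 per million tokens), not figures from the survey:

```python
# Rough cost to generate a million output words (~10 books).
# Assumptions (mine, approximate): ~1.3 tokens per English word,
# and an output price of about $1 per million tokens.
words = 1_000_000
tokens_per_word = 1.3
usd_per_million_tokens = 1.00

cost = words * tokens_per_word / 1_000_000 * usd_per_million_tokens
print(f"${cost:.2f}")  # -> $1.30
```

The exact ratio and price vary by model and tokenizer, but the order of magnitude is the point.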

Reviewing AI Work

Between X% and Y% of humans are employed in jobs that mostly consist of reviewing/supervising the output of AI systems

Most people who put numbers on the lower end were again pointing to non-white-collar jobs, which, paired with the fact that people think robotics will be hard, explains the generally low answers. But also: look at these people confident in 25%+!

This question will, I think, be hard for me to evaluate accurately. I expected that uncertainty alone would cause people to widen their margins of error.

Expert Performance

There is a X% chance that the best AI systems will out-perform PhDs and top experts in most problem-solving tasks

I think here I'll just highlight two conflicting comments that I think explain the main debate: one person said "AI is as smart as a dumb person; which I don't think smart people are fundamentally different from." and another said "2027 is right around the corner, and problem solving (expansive thinking) is still lagging far behind summarization or referencing materials (internal thinking)."

Open Source Release

The delay from the best "closed-source" model release to it being reproduced in an "open source" model will be between X and Y months (90% confidence interval)

This is, I think, one of the hardest questions to predict given the number of factors at play that could impact what happens. And it shows: the error bars here are generally pretty wide.

People's reasoning behind their predictions also varied wildly, ranging from "It will catch up because the closed source race is slowing down", to "Larger and larger resources are needed to train models, including increasingly rarefied training data..." to "If inference is the way forward then open weight LLMs become harder to create and not easier bc they require more compute."

Playing Board Games

There is a X% chance that an off-the-shelf AI system can, if provided the rules to an arbitrary turn-based board game, play roughly as well as a casual player of the game

I found the bimodal distribution of answers here interesting. Most people are either very confident that it won't happen or very confident that it will. The commentary reflects the different reasons you could have here, with some people commenting on the fact that it's "hard in practice to extend AI to new domains" but others just saying "This probably already can happen." (You also get a mix of people saying "Most casual players are bad at games LOL even though I don't really trust LLM ability at novel challenges")

Agents

There is a X% chance that the majority of consumer AI applications will operate autonomously on multi-step tasks rather than by just answering user questions as chatbots

Seems like people don't really believe in the "year of the agent" as being pushed by the large tech companies. I don't blame them, and taking the human out of the loop seems super dangerous to me. But when has the potential harm of a technology ever stopped it from being developed? So maybe it'll happen to everyone's dismay.

Power Consumption

Between X% and Y% (90% confidence interval) of all power generation in the United States will be dedicated to AI training/inference

I am surprised how few people put 1% in their confidence interval. If this whole AI thing doesn't take off, and the labs either fail or people just stop using LLMs as much, then anything above 1% seems excessively high for a lower bound. And I think that seems like a reasonable scenario.

Training Cost

Between X billion and Y billion US dollars (90% confidence interval) will be spent to train a new state-of-the-art AI system

This one again has a bunch of possible things that could impact the cost of training a new model. Maybe we find a much better training algorithm. Maybe the AI bubble pops and we stop spending so much money on it. What I found most interesting in the distribution of answers is that people who believed the costs could become very large (e.g., tens of billions of dollars) tended to have much wider margins of error compared to people who were very confident no training run would reach above a billion dollars.

Transformers are all you need?

There is a X% chance that the best AI system will be recognizable as (a slight modification of) the transformer-based Large Language Models (LLMs) we use in 2024

Personally I think this distribution is one of the better calibrated ones. Transformers feel like they'll stick around for a while longer; I wouldn't say with 100% probability, but relatively high.

What I found most interesting about this question is how much it flips when you go from 2027 to 2030: all of a sudden people go from almost certain that transformers will be the dominant architecture to highly confident that they won't be. Three years isn't that long of a time for this change to happen in, but then again, transformers are only a handful of years old so maybe it makes sense.

AI Damage

An AI system will cause between X thousand and Y thousand deaths (90% confidence interval), or between X and Y billion dollars (also 90% confidence interval) in damage, within a 3-day period

This question is again really interesting in how it seems like everyone who was willing to assume there could be a million deaths or a trillion dollars in damage was also willing to believe there could be zero deaths or zero dollars in damage.

When people explained why their answers were so low, a lot of people focused on saying something like "AI systems will not be directly plugged into large critical infrastructure in less than 6 years...". I really hope they're right! But I'm not sure I believe it.

Perhaps unsurprisingly, the damage numbers go up quite a bit when you move from 2027 to 2030, but even still a bunch of people basically believe there won't be any catastrophic events caused by AI. Again, I really hope they're right.

CBRN Risk

There is a X% chance that an AI system exists that can meaningfully improve the ability of non-expert humans to perform sophisticated cyber attacks, develop biological or nuclear weapons, or otherwise cause severe harm

I found this one surprising to compare to the prior question. The majority of people here seem to think that AI systems won't be able to significantly improve the ability of non-experts to cause harm, but then in the prior question the majority of people had damage numbers above a few billion dollars. So I guess these people believe experts will cause damage with AI systems? I'd be curious to talk to someone who believes this to understand where they're coming from.

Concentration of Power

There is a X% chance that the impact of AI has significantly, and discontinuously, increased the concentration of power or wealth

This is actually the risk I'm most worried about and believe will happen with the highest likelihood. But some people just flat out commented "No way." as their only explanation.

Trust in AI

There is a X% chance that people will regularly ask AI systems answers to questions, or plans to achieve some goal, and even if the answer seems unreasonable, believe it because they assume the AI system "knows best"

I think the discrepancy between 2027 and 2030 is interesting here, too: almost everyone thinks that by 2030 most people will trust AI systems to give them the best answer, but in 2027 it's a lot more split.

AI Fiction

An AI system has 'authored' between X and Y (inclusive, 90% confidence interval) high-quality fiction books that have appeared on the New York Times best seller list

This question is interesting to compare to the one about how many math theorems will be proven for the first time by an AI system: the distributions seem surprisingly similar? I would have expected that people might be optimistic either on the case that they could be useful for doing artsy-writing, or that they could be useful for doing math, but not both.

AI Influencers

The largest AI social media account will have between X% and Y% (90% confidence interval) the views/followers of the largest human on that platform

About a third of people think that by 2030 the most popular influencer will be an AI system. That would be a truly wild world to live in.

Movie Production

There is a X% chance that an AI system could produce a high-quality live-action hour-long movie given a rough script

Interestingly, people seem to think that this is harder than writing fiction. Maybe that's because models are currently known to be good at text? But I'd say long-term coherence is the challenging part for both, and it feels like less coherence is needed to make a good movie than to write a book from scratch.

On the other hand, there's probably less economic incentive to make movies with AI than to make an AI system that's fantastic at working with text. So maybe this explains things completely.

Hallucinations

There is a X% chance that state-of-the-art AI systems still regularly "hallucinate" incorrect solutions to problems

I'll highlight two comments here. First, someone who said "This might be the single most important thing for AI to get right.", which I agree with. And second, the person who said "to quote karpathy: "I always struggle a bit with I'm asked about the "hallucination problem" in LLMs. Because, in some sense, hallucination is all LLMs do. They are dream machines."" who I also agree with. Taken together: seems bad?

Jailbreaks

There is a X% chance that an open-weight state-of-the-art AI system exists that can reliably withstand jailbreaking and "prompt injection" attacks by both human and automated adversaries

Again people are pessimistic about progress in these kinds of hard problems. I tend to agree. (I appreciate the one person whose entire response was the word pliny.)

Pause

There is a X% chance that there has been a 6+ month pause on AI development where companies are prevented from training more capable models

Well, it seems like most people don't think this will happen, but a few people have put nearly-100% probabilities on it happening by 2030.

There's a whole Pause AI movement, which I guess people are pessimistic about?

Manhattan Project

There is a X% chance that the US or China has created a public AI "Manhattan Project" to pursue AI, AGI, AI Safety, or other AI-related goals

Again look at both 2027 and 2030. It seems like people are very confident that we'll try to do this by 2030, presumably because governments are slow and it takes a while to get things going. On the other hand, if the AGI thing doesn't work out, we almost definitely won't see this happen. So the high degree of confidence in the answer "yes" is surprising to me.

(To clarify: the recent Stargate announcement isn't government money, and so does not count.)

Concerns about AI

In a national poll, between X% and Y% of the US population (with a 90% confidence interval) ranks "concerns about AI" as a top societal issue

I'll begin by quoting the person who said "Historically the most important is basically always below 20, so it would have to be a huge crisis to get much higher. Also plausible there are no polls." I agree: I think getting to 20% would be enormous, but it seems like most people actually think this will happen! I'd be curious to hear why they think so, and whether it's because of job loss or something else.

Useless Questions

Between X and Y (inclusive, 90% confidence interval) of the questions in this forecast are so misguided that their resolution is uninteresting

The final question was what fraction of the questions I asked were just bad questions. I'm glad to see the number is relatively small, so maybe I didn't waste everyone's time by asking so many. On the other hand, some people seem to think I may have done a truly terrible job with these! I guess that may be possible, who knows.

Also: I appreciate whoever wrote as the comment "Haha".


Concluding Thoughts

This is part 2.5 in a 3-part series I decided to write about "AI". In the first post I wrote about how I'm using these recent models to advance my own research. Then, last time, I asked people to forecast how they think the future of AI will go. This post, as you just saw, was a summary of those predictions. Next time I plan to finish the series by talking about how I think the future of AI will go and what risks I'm worried about.

If you haven't made your own forecasts yet, I honestly believe this is ~the most useful thing you can spend thirty minutes on today. And not because my particular questions are fantastic, but because this field is changing so fast that I think it is extremely important to be willing to humble yourself and make falsifiable predictions. So if you don't like the questions above, I'd encourage you to (publicly!) make as many falsifiable predictions as you can, so that your future self can keep your past self honest.

And with that said, I'll see you next time to talk about my own predictions. (And, in a year, I'll hopefully be able to actually start to resolve some of these questions one way or the other.)