The $15,000 AI Bill. Your $20 Subscription is a DELUSION [video]

youtube.com

27 points by Vasniktel 15 hours ago · 58 comments

cranium 14 hours ago

I don't understand why simonw's comment is dead, because he mentions a real counterpoint to the video: API token prices are NOT the raw costs for any provider. I'd even say that inference needs to have quite a juicy margin to cover all the other costs. It would make no business sense to sell API tokens at a loss: nobody knows yet how to price intelligence, so why start in the red when it's the only source of revenue?

It's a different story for subscriptions. According to my rough computation (N=1), a $200 Claude Max 20x plan gives you access to around $8k worth of tokens per month at API prices (but those tokens don't cost Anthropic $8k!), so I think they make a loss on every token maxxer, which may or may not be offset by subscriptions that go unused. But that's not the end of the subscription story.

Once you are "enterprise", you pay for token use and there is no way around it: Anthropic does it and so does OpenAI. The subscription is the gateway drug to token maxxing. When people are hired into an enterprise job, they bring along their habit of using AI for any and every task.

All this to say: yes, AI labs are bleeding money, but on everything else (datacenters, model training, talent, ...).

  • lelanthran 12 hours ago

    > It would make no business sense to sell API tokens at a loss: nobody knows yet how to price intelligence, so why start in the red when it's the only source of revenue?

    All VC-funded businesses start out selling in the red. What makes you think these ones are going to be different?

  • simonw 12 hours ago

    Huh, I had not realized my comment was [dead] - it's here https://news.ycombinator.com/item?id=48492313 (I see it as not-dead, which I guess is how the dead system works) (UPDATE: it's no longer dead)

    I was calling out the video for starting with:

    > If you paid for that usage through a standard API, those 10 billion tokens would cost you around $15,000 a year. That is the real unsubsidized price. No discounts, no incentives, just the raw compute costs.

    "Raw compute costs" is an entirely misleading way to describe API pricing.

  • beAbU 12 hours ago

    > It would make no business sense to sell API tokens at a loss: nobody knows yet how to price intelligence, so why start in the red when it's the only source of revenue?

    The only consistent bullet point in the playbook of venture capital-driven startups for the last 30 years is to sell $1 bills for $0.50 and then hope to make it up in volume.

  • IceDane 11 hours ago

    > According to my rough computation (N=1), a Claude Max 20x at $200 gives you access to around $8k

    According to my own `cc-usage` script, I'm just about to hit $15k in the past 30 days, split roughly half on the 5x plan and half on the 20x plan. And I'm not someone running openclaw or letting my agents spin around 24/7; this is just very active agentic coding, where I'm constantly involved.

CuriouslyC 14 hours ago

Models like Cursor's Composer 2.5 show that you can get real work done without the crazy costs just by focusing on a domain. AGI is silly in part because models are spiky: in addition to generality making the model more expensive for every query, you can't easily tell a priori what the model will be good at. A smaller, focused model is cheaper to run, and if you ask a biology/chemistry model a coding question (or vice versa), it's user error rather than ignorance of the underlying training data distribution.

  • ACCount37 14 hours ago

    "Focusing on a domain" has a hard ceiling.

    A model's capability is a function of model size, and you can only push a small, overspecialized "idiot savant" model so far before its crippling lack of size starts to bite you.

    You can make a model like Composer 2.5. But Mythos 5 will beat it on capability, both at coding and at everything else. And the world is always hungry for more capabilities.

    If you're running high on agentic AI and low on human oversight, paying 2x to go from 5% faults to 2% faults is a good deal.
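    The 2x-for-fewer-faults trade-off above can be made concrete with a toy expected-cost model. A minimal sketch, assuming (hypothetically) that every fault costs a fixed amount of cleanup, measured in units of one cheap-model run; the 5%/2% fault rates and the 2x price come from the comment, while `fault_cost` and the function names are illustrative.

```python
from __future__ import annotations


def expected_cost(run_cost: float, fault_rate: float, fault_cost: float) -> float:
    """Expected cost of one task: the model run plus the expected cost of fixing a fault."""
    return run_cost + fault_rate * fault_cost


def break_even_fault_cost(cheap: tuple[float, float], pricey: tuple[float, float]) -> float:
    """Fault cost above which the pricier, more reliable model is cheaper overall.

    Each argument is (run_cost, fault_rate); solves c1 + p1*F == c2 + p2*F for F.
    """
    (c1, p1), (c2, p2) = cheap, pricey
    return (c2 - c1) / (p1 - p2)


CHEAP = (1.0, 0.05)   # baseline: 1 cost unit per run, 5% faults
PRICEY = (2.0, 0.02)  # 2x the price, 2% faults

print(f"{break_even_fault_cost(CHEAP, PRICEY):.2f}")  # 33.33
for fault_cost in (10.0, 100.0):
    print(fault_cost, expected_cost(*CHEAP, fault_cost), expected_cost(*PRICEY, fault_cost))
```

    Under these made-up units, the 2x model comes out ahead once a fault costs more than about 33 cheap runs' worth to catch and fix, which is the low-oversight scenario the comment describes.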

    • jermaustin1 13 hours ago

      I'm not a very smart person, so take what I say with a grain of salt.

      I think the path forward will be agents that each use models specialized for individual tasks (some might use a bigger model, some smaller ones), plus orchestrators that are good at knowing when to use which agent type.

      I've played around with this in my own tiny coding agents, for TTRPG NPCs, and even a small experiment where LLMs controlled a MUD client as an NPC that played the game with you (only 5 rooms in the experiment).

      Basically, break the tasks down into chunks so you don't have to use generalist models for everything and can choose the right model for the job.

      I'm also running all of this locally, where a generalist foundation model isn't practical and heavily quantized models don't perform well on every task, so with an unlimited token budget my solution is probably overkill.

      • ACCount37 13 hours ago

        "Orchestrator" pattern, "only use a big model to do big thinking, use smaller models to do grunt work" is probably what the field would converge to, eventually. Perhaps in form of "dynamic sparsity" - i.e. a family of closely related models allowing inference to transition from 1B class to 100T class on a dime, complete with something like joint KV cache.

        But it's a hard pattern to pull off, so I'm not sure how soon we'll see it in action.

    • anthonypasq 13 hours ago

      Mythos is 20x more expensive though

      • ACCount37 13 hours ago

        Fable 5 is listed at merely 2x the price of Opus 4.8 on OpenRouter: $10/$50 per 1M input/output tokens, vs $5/$25.

        Now, Fable 5 is currently borderline unusable because of asinine filters. But I assume they'll fix this shit eventually.

tonymelony 15 hours ago

This rumor is not demonstrably true. The subscription prices are competitive and for heavy users even cheap compared to API rates, but there is no evidence that they are structurally priced below cost.

  • ktzar 14 hours ago

    A good way to think about it is to work out how much it would cost to buy and run a GPU that serves a model at around 100 tokens/s ("thinking" agents are not viable otherwise).

    The figure mentioned in the video is not far off.

  • KumaBear 14 hours ago

    Wait for the price fixing that will eventually come after the horse race. Just like with internet, phone, TV, etc., prices get raised universally and in tandem.

  • dataflow 15 hours ago

    Are you including capex when you say "cost"? Or are you just looking at inference costs?

    • simonw 14 hours ago

      It doesn't make sense to include the capex cost to train a model in this kind of discussion, because that cost is fixed.

      Consider a model that costs $100m to train.

      If the vendor then prices it such that each inference token has a margin of 10% over the variable costs to serve (power + server costs), whether or not they cover their costs is based entirely on how many tokens they can sell.

      If they sell less than $1bn of tokens, they lose money: the break-even point is roughly 10 x $100m = $1bn.

      If they sell $10bn of tokens they make a ton of money.

      This also means you can't credibly calculate how much of the fixed training expense is covered by your token spend, because until the model is retired and you can account for how much inference it ran you don't know what percentage of the training cost each sold token was responsible for.
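      The break-even arithmetic above fits in a few lines. A minimal sketch using the comment's hypothetical numbers ($100m training cost, a 10% margin over variable serving cost); the function names are illustrative. Because a 10% markup over variable cost is about a 9.1% margin on revenue, the exact break-even lands at $1.1bn, close to the rounded $1bn.

```python
def break_even_revenue(fixed_cost: float, markup: float) -> float:
    """Token revenue needed before gross profit covers a fixed (training) cost.

    markup is the margin over variable serving cost: price = variable_cost * (1 + markup),
    so each dollar of revenue leaves markup / (1 + markup) dollars of gross profit.
    """
    profit_per_revenue_dollar = markup / (1 + markup)
    return fixed_cost / profit_per_revenue_dollar


def net_profit(revenue: float, fixed_cost: float, markup: float) -> float:
    """Profit after variable serving costs and the fixed training cost."""
    return revenue * markup / (1 + markup) - fixed_cost


TRAINING_COST = 100e6  # $100m, the example from the comment
MARKUP = 0.10          # 10% over power + server costs

print(f"break-even: ${break_even_revenue(TRAINING_COST, MARKUP) / 1e9:.2f}bn")  # break-even: $1.10bn
for revenue in (1e9, 10e9):
    print(f"${revenue / 1e9:.0f}bn sold -> ${net_profit(revenue, TRAINING_COST, MARKUP) / 1e6:,.0f}m")
```

      Note that `net_profit` only becomes knowable per token once total lifetime revenue is known, which is the comment's point about not being able to attribute training cost to any single token.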

      • vb-8448 14 hours ago

        Cost is fixed if you train a model once every several years; if you have to train 3-4 times per year to stay competitive, training cost matters.

        You also have to include failed training runs and experiments in the math.

        There are no official figures, but given how fast new models are rolled out, I wouldn't be surprised if neither Anthropic nor OAI manages to cover the full cost of their models.

      • frotaur 14 hours ago

        I think treating the capex as fixed assumes you can just stop training the next model. But it's not clear that you can afford to do that and keep selling tokens.

        And if capabilities plateau such that training the next one is useless, then the margins will drop fast due to competition.

        • ACCount37 14 hours ago

          The inference-to-training compute ratio for frontier models is estimated to be over 10:1 now.

          Driven mostly by just how much inference they sell nowadays - but also by things like base model reuse.

  • Someone1234 14 hours ago

    > This rumor is not demonstrably true.

    OpenAI, Anthropic, and Microsoft/Meta/Google are all at a net negative on AI (i.e. they're "demonstrably" losing money). So it is objectively true. If everyone is losing money, and nobody is profitable, then it is a demonstrable fact.

    As far as I know, the only "AI" venture currently in the green is Nvidia, and they're selling shovels to gold miners.

    • BosunoB 14 hours ago

      They are losing money because they are training new models and building new data centers. The claim of the video is that they're losing money just serving current AI models. There's just no evidence of that.

      • Someone1234 13 hours ago

        > They are losing money because they are training new models and building new data centers.

        Neither of which ever goes away. These aren't short term costs, they're the costs of running their business, and it isn't profitable.

        > The claim of the video is that they're losing money just serving current AI models.

        Which is true. Everyone is losing money; none are profitable. They're losing money serving current AI models.

        > There's just no evidence of that.

        Their own profit/loss statements are "evidence of that." According to these companies themselves, they're at a net loss every quarter. So it isn't clear what more "evidence" people need or expect.

  • moralestapia 14 hours ago

    It is demonstrably true.

    Grab gpt-oss-120b, run it continuously and see how far 20 dollars worth of that gets you. People definitely use much more than that in a month, not just power users but regular ones, and they're using models that are more expensive to run (plus the "cloud" markup).

    • anthonypasq 14 hours ago

      I mean, this is difficult to calculate because of prompt caching, the ratio of input to output tokens, etc., but if you just do some napkin math, I find it hard to believe people are getting this many tokens on a $20 plan.

      Here's some napkin math.

      gpt-oss-120b is priced at $0.039/$0.18 per million input/output tokens on OpenRouter. Here are some assumptions:

      1. The input/output ratio is about 25:1 (coding is mostly grep, with fairly low output).

      2. You are getting 75% prompt cache reads.

      Case B: 50% prompt caching discount (standard provider rate), at 75% prompt caching. Total tokens obtained: 658,749,010 (approx. 659 million tokens)

      Input: ~633 mil

      ~475 mil cached at 50% input pricing = ~$9.25

      ~158 mil uncached = ~$6.15

      Output: ~25 mil tokens (~$4.5)

      This doesn't even account for profit margins on inference providers, or the fact that OpenAI probably has a much more efficient inference stack.

      It's really hard to know what these companies are actually paying, but from everything I'm hearing, people are reporting that API inference pricing carries a 50% margin.
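      The $20 napkin math above can be reproduced as a short script. A sketch that takes the comment's assumptions at face value: $0.039/$0.18 per 1M input/output tokens for gpt-oss-120b, a 25:1 input/output ratio, 75% of input served from cache, and cached input billed at 50% of the normal rate. The class, function, and parameter names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Split:
    cached_in_m: float    # cached input tokens, millions
    uncached_in_m: float  # uncached input tokens, millions
    out_m: float          # output tokens, millions

    @property
    def total_m(self) -> float:
        return self.cached_in_m + self.uncached_in_m + self.out_m


def tokens_for_budget(budget: float, in_price: float = 0.039, out_price: float = 0.18,
                      in_out_ratio: float = 25, cache_hit: float = 0.75,
                      cache_discount: float = 0.5) -> Split:
    """How many tokens a dollar budget buys at per-1M-token prices."""
    # Effective price of 1M input tokens, blending discounted cache reads and full-price reads.
    eff_in_price = in_price * (cache_hit * cache_discount + (1 - cache_hit))
    # A "bundle" is 1M output tokens plus the in_out_ratio million input tokens that go with it.
    bundle_cost = in_out_ratio * eff_in_price + out_price
    out_m = budget / bundle_cost
    in_m = out_m * in_out_ratio
    return Split(cached_in_m=in_m * cache_hit, uncached_in_m=in_m * (1 - cache_hit), out_m=out_m)


s = tokens_for_budget(20.0)
print(f"total ~{s.total_m:.0f}M tokens")  # total ~659M tokens
print(f"cached input   {s.cached_in_m:.0f}M = ${s.cached_in_m * 0.039 * 0.5:.2f}")
print(f"uncached input {s.uncached_in_m:.0f}M = ${s.uncached_in_m * 0.039:.2f}")
print(f"output         {s.out_m:.0f}M = ${s.out_m * 0.18:.2f}")
```

      Varying `cache_hit` or `in_out_ratio` shows how sensitive the ~659M figure is to those two guesses.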

      • moralestapia 13 hours ago

        I didn't say "use OpenRouter", as you might end up using subsidized resources; part of the argument is to avoid that and reach the true capital cost of inference per token (or something like that).

        I meant, buy/lease the hardware that lets you run this model, run gpt-oss-120b and measure. I did this once and it was like 10x more expensive than any hosted alternative, and $20 wouldn't get you far there.

        • anthonypasq 11 hours ago

          Here's the creator of opencode explaining how you are wrong:

          https://youtu.be/1VqKUrxR2C8?si=uOAs_4XNXtTyTwCP&t=2195

          • moralestapia 10 hours ago

            He's either incompetent or lying.

            An H100 today costs $2.95 an hour on vast.ai[1], which is already a good deal.

            gpt-oss-120b on an H100 gives you ~200-250 tokens per second. I will be generous and say you can get a million tokens an hour out of it.

            OpenCode Go (which I gladly pay for, partly because of this) is $10 a month. That's about three hours of H100 use, and the models you get there are more expensive to run than gpt-oss-120b. Sure, they have "scale" (although that doesn't apply to AI inference, but whatever) and this and that, but they're still pricing it 20-30x below their minimum capital-expense threshold.

            Apples to apples: they sell you GLM 5.1 at $4.40 per million tokens, and at ~50 tps on an H100 (being generous), it costs ~$16 to generate a million tokens.

            The math is simple and clear, they lose money.

            1: https://vast.ai/pricing
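            The per-token cost estimate above boils down to the hourly GPU price divided by tokens generated per hour. A minimal sketch using the comment's figures ($2.95/hour for an H100, ~50 tokens/s for GLM 5.1, ~200-250 tokens/s for gpt-oss-120b). Like the comment, it assumes a single request stream per GPU; serving many concurrent requests in a batch raises aggregate throughput, which is the main assumption that would move these numbers. The function name is illustrative.

```python
def cost_per_million_tokens(gpu_dollars_per_hour: float, tokens_per_second: float) -> float:
    """Dollar cost to generate 1M tokens on one rented GPU at a fixed throughput."""
    tokens_per_hour = tokens_per_second * 3600
    hours_per_million = 1_000_000 / tokens_per_hour
    return gpu_dollars_per_hour * hours_per_million


H100_HOURLY = 2.95  # vast.ai H100 price quoted in the comment

for model, tps in [("gpt-oss-120b", 250), ("GLM 5.1", 50)]:
    print(f"{model}: ${cost_per_million_tokens(H100_HOURLY, tps):.2f} per 1M tokens")

# How many H100-hours a $10/month plan would buy at that price.
print(f"{10 / H100_HOURLY:.1f} H100-hours for $10")  # 3.4 H100-hours for $10
```

            At 50 tokens/s this reproduces the comment's ~$16 per million tokens for GLM 5.1, against the $4.40 list price.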

  • 6stringmerc 14 hours ago

    Okay then provide a link to a Dropbox PDF or official documentation “demonstrating” the premise is “untrue” please. Or admit you’re blinded by faith. Or financially interested in the public believing in a hypothetical like your second sentence.

    In short, citation needed or shens bruh.

    • ACCount37 14 hours ago

      If you're claiming "AI inference is sold at a loss", it's on you to prove it.

      All we have actual evidence of is: some users use enough AI that the subscription is sold at a loss to them (up to degenerate cases: usage maxed out at all times), if billed by API metrics, while some other users are, by the same metrics, profitable (down to degenerate cases: a forgotten subscription with $20 a month and 0 usage).

      We don't know how API prices relate to costs; we only have estimates. And we certainly don't know how much inference an average subscription user consumes.

      If you have some sort of information that would decisively prove that the aggregate is "AI company N is losing money on subscriptions", then show it.

      Or is it you who's blinded by faith? Like some sort of AI bubble cultist? The bubble is real, you just have to believe in it?

      • BosunoB 14 hours ago

        Very well said. People are making a lot of claims when very little knowledge of the financials is public. If you actually look at the numbers, there are plenty of ways in which API revenue and forgotten subs could more than make up the difference for power users. Even if power users are getting 10-20x their sub fee in tokens, the math could still work out. Personally, I doubt more than 5% of Claude subs even approach max usage, because that requires having so many agents running all of the time.

        I imagine we'll know in a few months when these companies go public.
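        The claim that light and forgotten subscriptions can offset power users is a weighted average. A minimal sketch with entirely hypothetical inputs: a $200/month plan, 5% of subscribers (the comment's guess) consuming 15x the fee in API-priced tokens, everyone else consuming half the fee's worth, and serving cost assumed to be 50% of API list price (echoing the unverified "50% margin" figure reported upthread). None of these are real Anthropic figures, and the function names are illustrative.

```python
from __future__ import annotations


def monthly_margin_per_subscriber(fee: float, cost_ratio: float,
                                  segments: list[tuple[float, float]]) -> float:
    """Average monthly profit per subscriber.

    segments: (share_of_subscribers, usage_in_api_dollars) pairs; shares must sum to 1.
    cost_ratio: assumed cost to serve, as a fraction of API list price (hypothetical).
    """
    if abs(sum(share for share, _ in segments) - 1.0) > 1e-9:
        raise ValueError("segment shares must sum to 1")
    avg_api_usage = sum(share * usage for share, usage in segments)
    return fee - cost_ratio * avg_api_usage


def break_even_power_share(fee: float, cost_ratio: float,
                           heavy_usage: float, light_usage: float) -> float:
    """Share of power users at which the average margin hits zero.

    Solves fee == cost_ratio * (s * heavy_usage + (1 - s) * light_usage) for s.
    """
    return (fee / cost_ratio - light_usage) / (heavy_usage - light_usage)


FEE = 200.0        # $/month plan
HEAVY = 15 * FEE   # hypothetical: power users burn 15x the fee in API-priced tokens
LIGHT = 0.5 * FEE  # hypothetical: everyone else, idle subscriptions included
COST_RATIO = 0.5   # hypothetical: serving cost is half of API list price

margin = monthly_margin_per_subscriber(FEE, COST_RATIO, [(0.05, HEAVY), (0.95, LIGHT)])
print(f"${margin:.2f} per subscriber per month")  # $77.50 per subscriber per month
share = break_even_power_share(FEE, COST_RATIO, HEAVY, LIGHT)
print(f"break-even power-user share: {share:.1%}")  # break-even power-user share: 10.3%
```

        The output is only as good as `COST_RATIO` and the usage mix, which are exactly the numbers nobody outside the labs has.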

draygonia 15 hours ago

So the choice becomes to either 1) Use the AI tools as much as you can before they increase prices/tighten usage or 2) Stop using them so you won't be compelled to pay more later when the price inevitably goes up/your regular plan gets downgraded?

I would think companies would set the usage low and increase it with capacity rather than subsidizing the power users and going into the red. Maybe my strategy wouldn't be aggressive enough to capture the market, which I'm sure the major AI companies are trying to do.

  • UncleOxidant 14 hours ago

    or 3) start running local models like Qwen3.6-27B, which are now competent for most tasks, and only use frontier models when the local model gets stuck.

    • Someone1234 14 hours ago

      Many of us would love to, and the models are there, but we're constrained by heavily inflated hardware costs.

      If big AI does crash out, it would be an absolute gold mine for local LLMs. Cheap, efficient Nvidia GPUs and RAM that can run the best already-available local models would be a real boon.

      PS - And as great as Qwen3.6-27B is, how large you can scale it (i.e. how big of a context/project) is mostly hardware constrained.

  • j45 13 hours ago

    There's another option too:

    In the short term, resource management can affect prices and allocation, especially when it's being figured out on the go.

    Treating today's costs as permanent assumes the technology will not improve.

    That means assuming the software won't get more efficient in its use of hardware, the hardware won't become more power efficient, etc. Open/self-hosted models are a real-world example of where efficiency gains are happening.

    Thinking technology won't become more efficient is like imagining that cell phones will still have the poor battery life of the 1990s.

  • spwa4 14 hours ago

    Or 3) a company builds a token producing machine and uses open models that are competitive.

TrackerFF 14 hours ago

I'm not yet dependent enough on any AI to shell out anything more than $15-$25/month.

If I lose it, it will not be the end of the world. I'll probably start digging into local models.

I suspect there are many like me. Far more than there are totally dependent users. I also suspect that the AI economy is some sort of "whale economy", where a minority is footing the bill, by paying outrageous amounts to Anthropic/Open AI/Google.

  • cyanydeez 14 hours ago

    I just hooked up local LLMs. It feels much slower, but more controllable, and it doesn't change unless I choose.

    If I were in business, the idea that my employees would lose skills and become dependent on a third party that controls both price and quality, with zero feedback, would be insane.

mpeg 14 hours ago

This is true for a lot of subscriptions though: some people use 100% of their mobile plan data, some people use 10% – the price accounts for this.

  • dannersy 14 hours ago

    Uhhh, at this order of magnitude? No way.

    • Matticus_Rex 14 hours ago

      You don't know the actual margins -- you only know the API rate. If their API rate has huge margins and the average subscriber isn't coming anywhere close to their limits, the subscription can be very profitable. If they're only near peak capacity in peak working hours (when API traffic is most active) and subscription 5h limits help them redistribute a lot of use outside peak hours when they've got spare capacity, that alone could make a massive difference in profitability.

    • mpeg 14 hours ago

      I think there are unknowns in the calculation:

      - What are the margins of Anthropic over their API pricing? Without this, all we're saying is the API is more expensive for heavy usage

      - How have their price margins changed over time? I imagine built into their commercial model is the expectation for inference to get cheaper over time

      - They have tighter usage windows than they used to, now with both 5-hour and weekly limits, and they seem to experiment with usage limits quite often. This probably affects users' average utilisation. Do they have any other levers they can pull here? E.g. how does changing to an 8-hour window affect it, or limiting certain models to API-only usage based on capacity, like Fable?

      I know there is a certain level of subsidised usage built into their subscriptions, the VC-funded company playbook, but I don't think anyone from the outside knows for sure how much it is and I imagine it's lower than most people think, and reducing over time.
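      The 5-hour plus weekly limits mentioned above behave like two rolling-window quotas that must both have headroom. A minimal sketch of that idea, assuming usage is counted in abstract "units" and that both windows roll continuously; Anthropic has not published how its limits are actually computed, so the class name and quota values are hypothetical.

```python
from collections import deque


class DualWindowLimiter:
    """Allow a request only if both a short and a long rolling window have room."""

    def __init__(self, short_window_s: float, short_quota: float,
                 long_window_s: float, long_quota: float):
        self.windows = [(short_window_s, short_quota), (long_window_s, long_quota)]
        self.events = deque()  # (timestamp, units) pairs, oldest first

    def _used(self, now: float, window_s: float) -> float:
        return sum(units for ts, units in self.events if now - ts < window_s)

    def try_consume(self, now: float, units: float) -> bool:
        # Drop events older than the longest window; they can no longer count against anything.
        longest = max(w for w, _ in self.windows)
        while self.events and now - self.events[0][0] >= longest:
            self.events.popleft()
        if any(self._used(now, w) + units > quota for w, quota in self.windows):
            return False
        self.events.append((now, units))
        return True


HOUR, WEEK = 3600, 7 * 24 * 3600
limiter = DualWindowLimiter(5 * HOUR, 100, WEEK, 400)  # hypothetical quotas
print(limiter.try_consume(0, 100))         # True: uses the whole 5-hour budget
print(limiter.try_consume(HOUR, 1))        # False: 5-hour window is full
print(limiter.try_consume(5 * HOUR, 100))  # True: first event aged out of the 5-hour window
```

      With these numbers, four full 5-hour windows exhaust the weekly quota, after which requests are refused even though the current 5-hour window is empty; tuning the two window lengths is one of the levers the comment asks about.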

jassyr 14 hours ago

Frankly, these tools should be priced per query. One doesn't need Mythos to get a French toast recipe or add a few numbers together, but the flat-rate subscription hides the inefficiency. Maybe a routing filter at the beginning of the query could choose the right model for it?
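A routing filter like the one suggested above could be as simple as a cheap classifier in front of a price table. A minimal sketch under loudly hypothetical assumptions: the tier names, per-1M-token prices, and keyword heuristic are all made up for illustration, and a real router would more likely use a small model to classify the query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    usd_per_million_tokens: float


# Hypothetical tiers and prices, ordered from cheapest to most capable.
TIERS = [
    Tier("small-local", 0.10),
    Tier("mid", 1.00),
    Tier("frontier", 25.00),
]

HARD_HINTS = ("refactor", "prove", "debug", "architecture")  # hypothetical heuristic


def route(query: str) -> Tier:
    """Pick the cheapest tier that is plausibly good enough for the query."""
    q = query.lower()
    if any(hint in q for hint in HARD_HINTS):
        return TIERS[2]
    if len(q.split()) > 50:  # long, detailed requests get the middle tier
        return TIERS[1]
    return TIERS[0]


def query_price(query: str, expected_tokens: int) -> float:
    """Per-query price under the tier the router picked."""
    return route(query).usd_per_million_tokens * expected_tokens / 1_000_000


print(route("french toast recipe").name)                        # small-local
print(route("debug this race condition in my scheduler").name)  # frontier
```

Pricing per query then falls out of `query_price`, so the recipe request is billed at the small tier's rate instead of being averaged into a flat fee.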

bethekidyouwant 14 hours ago

Is it ironic that people are watching AI videos and taking them as gospel on AI topics?

bilater 14 hours ago

I’m so sick of this scarcity mindset.

We will have better and cheaper intelligence in the future than we have now. This is not Uber. Inference is profitable for these companies. Looking at API pricing and assuming that reflects the cost basis is dumb.

It costs less for OpenAI to serve GPT-5.5 than it did to serve GPT-4. An H100 is more valuable today than it was five years ago because it can serve more intelligence per token.

Jevons paradox and short-term crunches may cause some swings, but the value of a token keeps increasing while the average token price decreases.

Chinese models are already a fraction of the cost, and we will have a mythos/fable-level open-source model by the end of the year. There is no “gotcha” where every AI company rugs you in unison.

Stop trying to figure out how this screws you. Start figuring out what cool shit you can build with it.

mystraline 14 hours ago

The underlying capitalist problem is that dumping is not just permitted, but expected as a business strategy. Except dumping is usually exporting to another country to kill their industry.

Instead, this dumping exports "thinking" to destroy humans' innate thoughts, get them hooked, then rugpull for 3x the cost. Just over a year of LLMs can take a developer who could reverse engineer things and leave them needing help to construct a for loop.

That's why I run my own LLMs. It's hard to rugpull what you own and control. And that's also why I focus on questions not of "do this", but "explain this". I seek to use LLMs to learn more effectively, so I end up needing them less and less.

  • Matticus_Rex 14 hours ago

    Generating huge consumer surpluses as a business strategy? Awesome if true.

    • dag100 14 hours ago

      Err, yes, until the surplus kills off all other competition and allows the supplier to jack prices up sky high, or otherwise bend consumers to their will. There's a reason most countries will stop foreign firms from doing this to them.

      • Matticus_Rex 11 hours ago

        Except there's lots of competition for creating that surplus, including from open-source, locally hosted LLMs, and while they're behind the frontier, they're not that far behind.

        The dumping -> non-competitive price increases playbook is historically very, very rare, and relies on a monopoly (or in a few cases oligopoly) with large externally-enforced barriers to entry. The oligopoly case is highly unstable and doesn't last, and besides we don't have notable barriers to entry; we have both market competitors and locally-hosted imperfect substitute goods.

        There's essentially no reason to believe the dynamic you're predicting could succeed here, because we lack all the conditions that make it more likely to succeed, and it's very rare anyway.

      • mystraline 12 hours ago

        In this case, it's not competition in the form of a foreign company.

        It's the competition of engineers, scientists, and intellectual labor being atrophied due to overuse of LLMs.

        Pricing 'thinking' at 1/100 of its cost gets intellectual labor people hooked and dependent. Then, when costs go up 300x, it leaves people dumber and less capable of doing things on their own.

        LLM companies, by not accurately charging for services, are directly dumping on world-level society and devaluing and addicting people to outsourced thinking. That's the problem.

        In reality, all subsidized and 'free' services do exactly this. LLM token vendors are making a play against human thought.

spwa4 14 hours ago

If only every government had competition departments who had essentially ONE job: prevent companies from getting away with this ...

Oh wait.

  • smallmancontrov 14 hours ago

    The Reagan / Bork "Consumer Welfare Standard" intentionally crippled anti-trust 40 years ago. By legislating robber-baron talking points, it succeeded in transforming the US business sector into what you see today: big moats, high profits, low competition.

    The good news is that, after 40 years of Democrats not prioritizing opposition to the Reagan/Bork CWS, the issue is back on the ballots. Lina Khan picked up the torch that Louis Brandeis picked up a century ago (the arguments are identical, in this aspect time is a circle). Unfortunately, her team lost the last election, but just remember for the next one: this issue is now on the ballots.

    • spwa4 13 hours ago

      Okay ... and for all of Europe?

      Also, this does not explain why, for example, the New York City government didn't stop them in New York. After all, they sold licenses ("taxi medallions"), which came with the explicit government promise that their position would be protected against competition. That was the deal. In fact, a lot of city governments did this.

      They just abandoned it (without, of course, giving anyone their money back, but it's not like that would have helped)

simonw 14 hours ago

This is not a credible presentation:

> If you paid for that usage through a standard API, those 10 billion tokens would cost you around $15,000 a year. That is the real unsubsidized price. No discounts, no incentives, just the raw compute costs.

The standard API pricing is not the raw compute cost. Making that claim in the first minute of the video discredits the entire thing.

(Here's the full MacWhisper transcript: https://gist.github.com/simonw/991dde81b95fa4436f46517c3c1a4... )

And yeah, if you work your way through the whole thing it's mostly breathless hype based around shaky premises.

empath75 14 hours ago

The Uber analogy is not a good one. Uber is constrained by the cost of cars, fuel and drivers, which do not go down over time. That's its floor: it can't charge less than that and be profitable. The cost of chips and inferencing _will_ go down over time as new chips come out and the software gets more efficient.

joshstrange 12 hours ago

Are frontier labs providing subscriptions at discount vs API token costs? Yes

However this entire video is slop. I don't know if it's actually AI slop but it's intellectual slop for sure.

Just the title alone is 100% disqualifying. They are comparing a monthly cost to a yearly [0] cost, and also treating the API token price as the actual cost to the providers (which it's not: the actual cost is lower, and from everything we know, the APIs are _not_ being operated at a loss).

This whole video is a waste of your time. It's true that we are probably in the Uber phase of LLMs; however, a massive difference this time around is local inference (and other "open" models). If (US) frontier labs raise their prices, people can reach for open models, either locally or through other cloud inference. And all of this assumes a moat, which so far doesn't seem to exist. The open models trail SOTA, but not by more than a year or so (I've heard 6mo thrown around a lot).

Either the SOTA models will be priced too high to be worth it and we will move to open/lower-cost (local or otherwise) models, or they will continue to provide a benefit over the lower-cost/free models and be worth paying for.

[0] With _zero_ evidence to back up that number
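The monthly-vs-yearly complaint about the title is easiest to see once both numbers are on the same time base. A minimal sketch using only the figures quoted in the thread: the video's $15,000/year for 10 billion tokens and the title's $20/month subscription. It annualizes the subscription and computes the implied blended API rate; it says nothing about what serving those tokens actually costs, which is the thread's other objection.

```python
API_BILL_PER_YEAR = 15_000.0  # the video's figure
TOKENS_PER_YEAR = 10e9        # "those 10 billion tokens"
SUB_PER_MONTH = 20.0          # the title's subscription price

api_bill_per_month = API_BILL_PER_YEAR / 12
sub_per_year = SUB_PER_MONTH * 12
implied_rate = API_BILL_PER_YEAR / (TOKENS_PER_YEAR / 1e6)  # $ per 1M tokens

print(f"API bill: ${api_bill_per_month:,.0f}/month vs subscription ${SUB_PER_MONTH:.0f}/month")
print(f"like-for-like ratio: {API_BILL_PER_YEAR / sub_per_year:.1f}x")  # like-for-like ratio: 62.5x
print(f"implied blended API rate: ${implied_rate:.2f} per 1M tokens")  # implied blended API rate: $1.50 per 1M tokens
```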
