It is a bit silly to me that people compare the different AI models (e.g. Gemini Pro vs Opus vs 5.4) and say “X is better” or “Y is better”.
The reality is that all of these models are probably at the same rough level, but subscription users don’t have access to the raw versions of these models; we only have access to the harnessed versions of them.
The game all the big AI companies are playing is to have a flagship model that is expensive to run, and to make the subscription plans that are supposed to give access to that flagship model actually give access to a harnessed version, one that tries to match each prompt with the most efficient level of intelligence needed.
In other words, intelligence scales with compute and compute costs money, so the big companies label their end products as flagship models while actually delivering lower-tier models via model cascading or routing layers, which use less compute and cost less to run.
The GPT and Gemini web apps are the most obvious examples. For Gemini, notice how the models are “fast, thinking, pro”? These are not models; they are harnesses that direct each question to the cheapest model that can sufficiently solve it. Yes, there is a relative scale of intelligence across these harnesses, but no, it is not the same as the model performance card they so proudly present every release.
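To make the routing idea concrete, here is a toy sketch of what a harness layer might do. Everything in it is invented for illustration: the model names, the cost numbers, and the difficulty heuristic are placeholders, not how any vendor actually implements routing (real routers are typically learned classifiers, not keyword checks).

```python
# Toy sketch of a routing harness: send each prompt to the cheapest
# model judged sufficient for it. All model names, costs, and the
# difficulty heuristic are invented for illustration.

# Models ordered cheapest to most expensive, with a capability score.
MODELS = [
    {"name": "flash", "cost_per_1k_tokens": 0.0001, "capability": 1},
    {"name": "thinking", "cost_per_1k_tokens": 0.002, "capability": 2},
    {"name": "pro", "cost_per_1k_tokens": 0.02, "capability": 3},
]

def estimate_difficulty(prompt: str) -> int:
    """Crude stand-in for a learned router: long prompts and
    math/code-looking prompts are treated as harder."""
    score = 1
    if len(prompt) > 500:
        score += 1
    if any(kw in prompt.lower() for kw in ("prove", "debug", "optimize")):
        score += 1
    return score

def route(prompt: str) -> str:
    """Return the cheapest model whose capability covers the estimate."""
    needed = estimate_difficulty(prompt)
    for model in MODELS:  # cheapest first
        if model["capability"] >= needed:
            return model["name"]
    return MODELS[-1]["name"]  # fall back to the flagship

print(route("What's the capital of France?"))  # → flash
print(route("Prove this loop invariant and debug the off-by-one."))  # → thinking
```

The point of the sketch is the shape of the incentive: the flagship only gets invoked when the router decides nothing cheaper will do, so most traffic lands on the cheap tiers regardless of what the product page says.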
All AI companies are doing this. OpenAI started with ChatGPT 5. Google’s last non-harnessed model was probably 2.5 Pro. Anthropic is a bit late to the table but probably started it en masse around March 8, 2026, when Claude users began complaining of a noticeable degradation in problem solving.
Business-strategy-wise, model access is a commodity business with near-zero switching costs. This means there are two levers companies can pull: perceived quality of goods (intelligence) and cost (compute). And there is a direct relationship between the two, which dictates the most important problem of making money in the AI ecosystem: efficient intelligence, not raw intelligence.
Winning is not just having the smartest model, because it will also be the most expensive to run. Yes, having the smartest model will net you the most users, but you will lose money on each user because your model is so freaking expensive to run relative to what users want to pay. So you either implement hard limits (I can only talk to Opus 4 times per week) or lose money while gaining user share, and those users will leave you as soon as they get a cheaper alternative because of the near-zero switching costs.
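The lose-money-per-user dynamic is easiest to see with back-of-the-envelope arithmetic. Every number below is hypothetical, chosen only to show the shape of the problem, not any vendor’s real serving costs or real usage.

```python
# Back-of-the-envelope unit economics for a $20/month subscription.
# All numbers are hypothetical illustrations.

subscription_price = 20.00          # $/month, the consumer price point
flagship_cost_per_1k_tokens = 0.05  # $ (hypothetical flagship serving cost)
cheap_cost_per_1k_tokens = 0.001    # $ (hypothetical routed/cheap model)
tokens_per_month = 2_000_000        # a heavy user's monthly token volume

flagship_bill = tokens_per_month / 1000 * flagship_cost_per_1k_tokens
cheap_bill = tokens_per_month / 1000 * cheap_cost_per_1k_tokens

print(f"flagship-only cost: ${flagship_bill:.2f}")  # $100.00 → lose $80/user
print(f"routed cost:        ${cheap_bill:.2f}")     # $2.00  → make $18/user
```

With made-up numbers like these, serving every query on the flagship loses $80 per heavy user per month, while routing most traffic to a cheap model flips the same user profitable, which is exactly why the harness exists.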
These are market forces, so no AI company is above this. They cannot give out free intelligence to all forever, because intelligence scales with compute; they cannot give free compute to all, because compute costs money; and they are not in the business of giving free money to all.
After a few years of AI wars, it seems the profit-making strategy, which all companies will eventually have to head towards, is creating the perception of having the smartest model while under the hood lowering costs (compute) as much as possible to fit the consumer price-point appetite, which seems to be $20/month.
So the era of subscription-based model access is soon to be dead. Subscriptions now buy model-harness access, and all the companies are spending a ton of effort pushing the intelligence-to-compute efficiency ratio so they can make money at the price point consumer users have settled on.
Don’t like your harness? Want the actual model? There’s an easy way to do that: just use the API and eat the API cost.
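For the curious, going direct looks roughly like this. The payload follows the widely used OpenAI-style chat-completions shape; the model name is illustrative, and the key thing to notice is that *you* pin the model, so no router gets to downgrade you.

```python
# Sketch of direct API access: you name the exact model and pay per
# token instead of a flat subscription. Payload follows the common
# OpenAI-style chat-completions shape; the model name is illustrative.
import json

payload = {
    "model": "gpt-4o",  # you pin the model; no harness decides for you
    "messages": [
        {"role": "user", "content": "Explain model routing in one line."}
    ],
}

# With an API key, this would be POSTed to the provider's endpoint, e.g.:
#   POST https://api.openai.com/v1/chat/completions
#   Authorization: Bearer $OPENAI_API_KEY
print(json.dumps(payload, indent=2))
```

The trade is transparent: per-token billing instead of a flat $20/month, which for heavy use can cost far more than the subscription, which is the whole point of this essay.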
But before you lament too much over the loss of GPT 4o, 2.5 Pro, or Opus 4.6 before the nerfs: intelligence is getting exponentially cheaper by the week, which is super awesome for consumers. Smaller models are growing in intelligence at a frightening rate, and while using smaller models may not get you cutting-edge AI intelligence, we are about 6-12 months out from a small model that can run on your laptop as your daily AI driver. That’s crazy.
These smaller models are probably going to be the ones we use the most, not because they are super intelligent but because they are free, and probably smart enough.
The market for model use seems to be segmenting itself: the cheap daily driver which will be small local models, subscriptions to model harnesses for cheap super intelligence, and direct SOTA model access for cutting edge super intelligence.
On the cheap-models front, yes, Chinese AI companies have been doing a great job, but the sleeper in the room is Google, which has AI answers on practically every search result. They are probably leading the pack on cheap models, given that they run AI at such a large scale without breaking the compute bank. Also, Gemma 4.
For model harnesses, it is hard to know who is doing the best because while users can feel the engineers tinkering with the model intelligence, we don’t have access to their internal compute costs per answer. I would bet that ChatGPT is forced to drive costs down a lot as their scaling seems to have hit a wall and they need to fight for a sustainable future. But Google might just be the strongest positioned, with their TPUs and cost effective Flash model line.
For SOTA model access, Anthropic leads the pack, not only because they have the best model but because they’ve signaled commitment to SOTA model access the longest. You know you’re not getting degraded performance, and you also know you’re paying the highest prices for tokens out there. But users have recently experienced intelligence degradation, which suggests that SOTA model access might not be a large sustainable business at the current price point / market demand. At the very least, it implies that in order to continue to hit their growth targets, Anthropic is forced to reach into the pro consumer market for more subscriptions.
But it is good to be reminded that the biggest winners of the AI wars are the users. We got good years of subsidized SOTA model intelligence, and we are also going to get super capable $20/month models for the foreseeable future.
Missing out on SOTA intelligence for a generation or two will be plenty fine once good enough models are actually good enough, and that time is pretty much almost here.