People who are really serious about software should make their own hardware. — Alan Kay
The economics of frontier AI are already visibly distorted. Major LLM providers are selling access to scarce, capital-intensive infrastructure via subscriptions that seem affordable to end users. I understood that in theory. But running a model locally made the distortion concrete.
My setup is a MacBook Pro M5 Max with 48 GB of RAM and 40 GPU cores. The model I can possibly run is Gemma 4 (31B parameters, Q8). To deliver a single-feature, end-to-end SaaS product of mediocre quality, scoring around 6/10, the machine had to work uninterrupted for 3.5 hours.
By comparison, state-of-the-art models such as Opus 4.7 and GPT-5.5 deliver the same scope at close to 9/10 quality in about 20 minutes.
But the point is not to compare a local model with a paid SOTA model. That would be a mistaken comparison. No one expects a 31B model running on a laptop to approach the quality of a frontier system operated on specialized infrastructure.
The point is economic.
The MacBook costs around $5,000, while a Claude Max 20x subscription costs $200 per month. The value of the hardware is equivalent to approximately two years of subscription.
On that basis, amortizing the $5,000 over 24 months and assuming continuous 24-hour-a-day usage, the 3.5 hours of local processing cost approximately $1.
That cost produced a mediocre result, with quality around 6/10, and an execution time incompatible with any real productive workflow.
Meanwhile, a $200-per-month subscription, used for 8 hours per day due to recent quota limits, costs about $0.83 per hour. If a SOTA model delivers the same scope in 20 minutes, the cost is approximately $0.28, with quality close to 9/10.
That is the central anomaly: for the end user, the frontier service is cheaper, faster, and much better than the local alternative, even when that alternative runs on expensive, modern hardware.
This does not prove that the real marginal cost for these companies is necessarily higher than the subscription price. But it does indicate that the price charged to the user is disconnected from the operational complexity required to deliver that result. The subscription turns an extremely expensive, scarce, capital-intensive infrastructure into an apparently cheap flat fee.
That’s where the calculations start to look artificial and alarming - OpenAI and Anthropic are charging a third of the cost tops.
The inevitable question is: will there be a contraction in capacity? Will these companies keep burning capital indefinitely?
As long as there is real competition — with China releasing open-source models and multiple players competing for market share — prices tend to remain artificially low. These providers do not have full freedom to pass the full cost of operations on to consumers because users can migrate to alternatives that are good enough and cheaper.
We have already observed what happens when a marginal advantage appears. Anthropic had only a small competitive edge over the past year with Claude Code and already adopted restrictive measures: it limited Opus usage, introduced daily and weekly quotas, and increased effective token consumption. It took only a minimum level of competitive confidence to begin that movement.
With six months of real advantage, price increases of 3x to 5x would be plausible. What costs $200 today could become $600 or $1,000 — and still be worth it for many intensive users. After all, few would go back to manual programming once they’ve incorporated that level of productivity into their workflow.
But there is a structural paradox. When one company takes the lead, it needs to raise prices to cover the cost of the technological frontier. When it raises prices, part of the user base migrates to the second or third-place provider, which is still operating below the ideal economic price. That second player gains traction, market share, visibility, and access to capital. With that, it invests heavily to reach the frontier. Once it gets there, it faces the same problem: it needs to raise prices to sustain the operation. The cycle begins again.
This loop prevents sustainable profit as long as the main competitive differentiator is frontier computing capacity.
Top-tier hardware does not become cheap the moment it becomes a commodity; it simply stops being the frontier. The frontier moves to the next generation of hardware, once again scarce, expensive, and geopolitical.
As long as hardware is the central bottleneck of competition, the cost of operation will remain prohibitive for every relevant player.
The discovery of LLMs is equivalent to realizing that there is juice inside the orange.
The difference between the available models — Claude, GPT, Gemini, Llama, DeepSeek, and others — is increasingly close to the difference between tools for extracting that juice: a better squeezer, an attached saw, a pre-perforation, a more efficient pressure.
There is also an improvement in cultivation methods. Larger, sweeter oranges are produced for specific purposes: one for desserts, another for juice, another more citrusy, another less acidic.
This corresponds to training: dataset curation, refinement of the raw material, and specialization of the data that feeds the model.
But the orange is the same. You will hardly be able to distinguish the orange, or the method used to extract its juice, from the product sold on the shelf.
The fundamental discovery has already been made. What remains, to a large extent, is optimization. And optimization, however sophisticated it may be, ceases to be a pure scientific frontier and becomes engineering. Engineering tends toward standardization. And what becomes standardized eventually becomes commoditized.
Given this scenario, I offer a prediction: OpenAI, Anthropic, and similar companies will increasingly serve as operators of data centers and processing facilities, potentially losing up to 90% of their current market value.
The AI model will migrate into the physical domain. AI will no longer be primarily an abstract logical layer, loaded into memory and executed by software. Instead, it will become part of the computational infrastructure itself.
The current direction of these companies already points to this movement. OpenAI and Anthropic are converging toward companion products: programming assistants, operational productivity tools, image and video generation software, everyday-use interfaces, and conversational agents. This is the typical behavior of companies that turn a discovery into a product, because the research frontier no longer provides differentiators proportional to the capital invested.
In the coming decades, the frontier will no longer be the model loaded into RAM, executed by a software layer, and interfaced through the computer as it is today. The leap will be physical: transforming the model’s logic into chip architecture.
Instead of a file of weights — the LLM — being loaded, moved, and processed by a generalist software stack, central parts of inference will be incorporated directly into the hardware’s logic circuits: transistors, logic gates, dedicated arrays, memory close to the processor, and electrical pathways specifically designed to execute AI operations.
In this scenario, the model ceases to be merely software running on silicon.
Software becomes the silicon itself.
The advantage will not be only speed. It will be cost, latency, energy efficiency, and scale. Processing will occur much closer to the physical limits of signal propagation inside the chip — that is, the speed at which electricity and light travel through circuits — instead of continuously depending on data movement between memory, processor, and software layers.
NVIDIA will begin selling factory-integrated models: not merely chips that run AI, but chips whose architecture already incorporates part of the AI’s logic into its physical structure. This does not eliminate all software, but it drastically reduces dependence on the generalist logical stack that today makes everything more complex, slower, and more expensive.
When this happens, LLM scientists will migrate to hardware companies as the scientific frontier shifts toward them.
What will remain for OpenAI and Anthropic is to reposition themselves on two fronts: as makers of harnesses — orchestration, interface, and packaging layers, equivalent to the Word and Excel of previous decades — and as data center operators distributing this processing through cloud services.
Microsoft and analogous companies will follow a similar trajectory, becoming increasingly infrastructure companies. Windows and the Office Suite will lose relevance in an environment where the user simply expresses what they want and the system executes it — without windows, menus, tabs, or intermediary applications. The interface stops being primarily visual and becomes intentional.
The absorption of OpenAI by Azure is a likely outcome, with both operating as providers of processing services. Microsoft, in turn, could also lose up to 90% of its current market value.
The protagonist of the coming decades will be hardware, not software.
The reason is simple: software tends to become logic embedded into circuits, unbeatable in speed and cost. You request; the hardware executes. No complex interface. No intermediary design. No unnecessary friction.
It will be up to the developer ecosystem to build interfaces on top of the cloud processing provided by these hardware systems. One person will create a specialized voice assistant. Another will integrate conversational agents into messaging platforms. Another will launch a social network based on autonomous agents.
Creativity decentralizes. Infrastructure centralizes.
It is worth clarifying a common confusion: Claude Code does not represent a technological frontier. Any competent engineering team can develop an equivalent system in a short time. The examples are abundant: Pi, OpenCode, and Claude Code itself, whose code recently leaked. Open-source software has become sexy and enjoyable again; that combination is unbeatable against companies that provide proprietary, closed solutions.
All software that generates wealth will be copied in two weeks.
I use Pi to run open-source models with quality comparable to Claude Code. Building harnesses is not frontier work. It is a commodity in formation.
NVIDIA, TSMC, AMD, Intel, Huawei, ASML, and similar companies define the future of computing.
That is the central thesis: IBM was right. Jobs was right. Gates was wrong.
The era of software as the protagonist has come to an end. Software becomes engineering. Engineering becomes a commodity. Commodity becomes market price. And market price does not sustain trillion-dollar valuations.
What remains of real value is the physical barrier: silicon, semiconductor fabs, the lithography supply chain, electrons.
Everything that cannot be copied in two weeks by a competent team.
Everything that requires decades of capital, applied science, and geopolitics to build.
