It makes no sense that multiplying two numbers can be faster than adding them.
Addition is simpler. It requires fewer operations. It should logically, fundamentally, always be cheaper. The fact that it sometimes isn’t feels like a glitch in the matrix, a sign that something foundational has shifted under our feet.
“It ain’t what you don’t know that gets you into trouble. It’s what you know for sure that just ain’t So.”
For decades, computer science rested on a small set of assumptions that felt permanent. We believed that time and space trade off in predictable, intuitive ways. We believed that humans write algorithms and other humans review them. We trusted that hardware gets faster along a predictable curve. And we held as gospel that if two algorithms have a clear pecking order on one generation of machines, that asymptotic order will hold on the next.
These were the truths of our discipline, but abstractions are compressed bets about reality, that allow us to ignore microscopic detail and still make rational decisions in the macro world. When the underlying reality shifts, a leaky abstraction can remain culturally familiar long after it stops being the right thing to trust. The map is not the terrain.
In this essay, we’ll peek at five developments, covering mathematical theorems, industrial benchmarks, a landmark episode in human-machine collaboration, and a speculative hardware prototype that multiplies faster than it adds. All five are pulling on the very same thread, showing that the abstractions we inherited as bedrock are starting to look more like era-specific interface contracts that are rapidly expiring.
When computer scientists reason about computation in its elemental form, they use something called a Turing machine. It is the stripped-down, thought-experiment version of a computer: there’s no cache hierarchy, no clever compiler tricks, no pipelining, and no instruction set quirks. Just a read/write head doing work on an infinite tape. This is how we understand what computation itself does, in this theoretical sandbox.
For a long time, we carried an intuition about how time and memory relate on these theoretical machines. If a computation takes a lot of time, simulating it will require proportionally a lot of scratch paper (memory). The trade-off felt stable, almost physical, governed by something like the laws of thermodynamics. Like packing for a trip: the longer you’re going, the bigger the suitcase you inevitably need.
In Simulating Time With Square-Root Space (2025), Ryan Williams proved that every multitape Turing machine running in time t can be simulated in O(sqrt(t log t)) space.
To understand why this is a BFD, let’s unpack the history, because it changes a picture most of us hold without realising it.
This intuition was formalized in 1975 by Hopcroft, Paul, and Valiant. They established the O(t / \log t) bound for simulating time in space. This boundary was a masterpiece of theoretical computer science, relying on intricate pebble games on directed acyclic graphs. In the fifty years since, smart people threw themselves at this bound, trying to compress the required space further, without success.
The 1975 bound felt like an impenetrable wall, and the assumption sat stagnant, ossifying into an accepted truth about the universe of computation.
Along came Williams in 2025, upending a 50-year-old assumption by a massive polynomial factor. Williams showed the suitcase can be dramatically smaller than anyone thought (the relationship follows a square root). That’s the theoretical equivalent of discovering we can pack for a year-long expedition in a carry-on bag.
He also spells out concrete consequences, including bounded-fan-in circuit evaluation and an O(n)-space versus n^(2-epsilon)-time result.
While nobody in the real world builds on Turing machines, the reason to care is that theoretical limits quietly train our metaphors. A massive amount of practical, day-to-day engineering judgment starts with a theoretical story playing in the back of our heads.
One of the oldest, most foundational mental models about what memory buys us, and what time costs, turned out to be merely a map.
That map no longer accurately reflects the terrain… it just ain’t so.
When Herr Doktor Professor Donald Knuth published Claude’s Cycles on 28 February 2026, the flashy headline practically wrote itself. An 88-year-old legend of computer science (who famously doesn’t use email), a stubborn open mathematical problem, a large language model producing an unexpected, elegant construction.
The meme machine was obviously going to have a fantastic time with this narrative.
But the real story is significantly more profound and complicated, than the headline.
Knuth had been working on a combinatorial problem involving certain highly specific cycle structures. Claude (the model) produced a construction for the odd-m case that Knuth had never seen before. Then Knuth did the thing that makes mathematics mathematics: he proved it. He took the model’s alien suggestion and showed step by step, that it worked.
The model generated a promising candidate, the human verified it was real. And the loop ran faster than it ever could have run historically.
But how did it really happen? Claude didn’t just “produce” a construction. Knuth’s colleague, Filip Stappers, had to heavily and aggressively steer Claude through its 31 distinct explorations. This was not a passive request to an omniscient oracle; it was a heavily “cyborg” workflow. Stappers was forcing the model to continuously document its progress in a plan.md file so its amnesic context window wouldn’t collapse under the weight of the search space (or - that damned RAG problem - the “wrong” context poisoning the search).
The human isn’t just the passive filter at the end of the loop, waiting for the machine to dispense wisdom. This is an altogether different, interactive cyborg synthesis: the human provides the executive architectural steering over an amnesic, highly capable search engine, and then the human rigorously verifies the mathematical leaf that the search engine eventually uncovers.
A model can throw out and synthesize candidates exponentially faster than a person can. A human with taste, experience, and standards can design the search, decide which branches are promising, and discard the (apparent) nonsense. Formal tools can then help lock the results down, and other people can pick up the thread, test nearby edge cases, and write up formal extensions.
The historical loop between conjecture, exploration, proof, and formalization just got drastically shorter. And once the loop shortens, the old, romanticized picture of mathematical discovery as “one lone genius, one chalkboard, one unpredictable lightning strike of inspiration” starts looking like a comforting fiction we told ourselves. The real process was always messier and more collaborative, but now the collaborator operates at machine speed.
Software engineering already lives inside a similar pattern: generate architectural options, evaluate their trade-offs, verify their viability, and tighten the deployment loop.
What’s new is seeing this high-speed, iterative search pattern succeed in a part of intellectual life that people still desperately want to imagine as purely individual and artisanal.
We’re not going back to doing maths (or physics, or computer science, or …) the old way.
If Knuth’s workflow represents the human-in-the-loop frontier, then DeepMind’s AlphaEvolve represents the strongest industrial evidence of full automation in this essay. It deserves our attention because DeepMind rigorously paired benchmark wins with heavily reported, massively scaled production deployments in the accompanying technical paper.
AlphaEvolve takes an algorithmic problem, uses a large model (Gemini) to generate thousands of candidate solutions, and evaluates them strictly against a sandboxed verifier. The model explores far more of the search space than any human team has the time for. The verifier clinically decides what compiles and what actually executes faster. The system then mutates, learns, and iterates.
According to DeepMind’s write-up, AlphaEvolve was unleashed on more than 50 established open algorithmic problems. It rediscovered state-of-the-art solutions in roughly 75% of those cases, and astonishingly, it improved upon the best known human solution in 20% of them.
Those are good, solid numbers. But the specific examples are what demand our absolute, undivided attention.
One breakthrough is a 48-scalar-multiplication algorithm for 4x4 complex matrix multiplication. Another is a Borg scheduling heuristic that DeepMind reports has been running silently in production for over a year, recovering an average of 0.7% of Google’s worldwide compute resources. That number, when translated into megawatts and silicon lifecycle costs, is economically staggering. The paper also reports an average 23% kernel speed-up across various operations and roughly a 1% reduction in Gemini’s own training time derived directly from a deployed, AI-discovered heuristic.
Algorithm design is the foundational layer where the highest scientific honors (like the Turing Award) have been granted. Volker Strassen’s matrix multiplication, Peter Shor’s quantum factoring algorithm, Cooley and Tukey’s Fast Fourier Transform (FFT). All generational insights from individual human geniuses staring at a problem long enough to see something that nobody else saw.
Consider Volker Strassen’s legendary 1969 record of 49 multiplications for a 4x4 matrix. Strassen’s algorithm is elegant, and importantly, it works over any commutative ring. AlphaEvolve’s AI-generated 48-multiplication method, however, is profoundly different. It explicitly relies on the complex plane, leveraging non-commutative rings admitting division by 2 and multiplication by the imaginary unit i, to find bizarre, magical tensor cancellations. The AI took a ridiculously counterintuitive, computationally intensive path that human mathematicians, constrained by traditional algebraic elegance, would almost certainly never attempt.
It generated alien maths.
I reckon much of the future equivalent of this highest-honor work won’t come from a lone genius in academia, but from a relentless optimization process running quietly on some kid’s laptop.
The human job moves up a level of abstraction: from writing the algorithm by hand, to building and tuning the system that discovers it. We still need brilliant people to define the objective functions, design the tightest possible evaluators, deeply understand deployment risks, and rigorously decide what “better” actually means in a business context.
But the search itself, will be handed over to machine-heavy brute force guided by learned heuristics. And now we have undeniable industrial evidence that this structural shift can be worth hundreds of millions of dollars.
Now we turn to the metal, because Big-O notation can stay mathematically flawless and simultaneously become economically worthless (or worse: misguided). This happens when the underlying hardware quietly changes what’s cheap and what’s expensive.
For decades, software engineers operated with two massive, reliable tailwinds that made “hardware gets faster” a safe assumption to bet careers on.
Moore’s Law observed that the transistor count on a dense integrated circuit doubled roughly every two years. More transistors equaled more raw compute capability. You could plan software roadmaps around it with total confidence.
Dennard scaling said that as transistors shrunk in physical size, their power density stayed roughly constant. So we got more compute without a proportional rise in heat, so chips got faster, and you can still hold your phone with your bare hands.
Dennard scaling broke down around 2006, and since then, single-thread clock speeds have flatlined. The last 2 decades of performance gains have instead come from specialized hardware: massive multicore processors, GPUs, TPUs, and a wild array of custom ASICs and accelerators.
Plus cą change. We’re returning to an era reminiscent of the 1950s, where the physical wiring of the specific machine fundamentally dictates the viability of the algorithm.
The Simons Institute programme on emerging computing technologies at Berkeley gets right to it: the mathematical models we use to reason about efficient computation are dangerously out of date and desperately need updating.
In the realm of video games, real-time ray tracing was once considered impossibly expensive. Tracing the bounce path of millions of individual photons through a complex 3D scene took vastly more compute than any GPU could ever deliver in 16 milliseconds. So NVIDIA changed the rules, by baking dedicated RT (Ray Tracing) cores directly into the silicon and paired them with AI denoising algorithms that hallucinate the missing data gaps. The result feels like an O(1) oracle from theoretical complexity theory: we can’t actually look up arbitrary physics answers in constant time, but if we build the right physical structure into the hardware, an operation that was once prohibitively exponential becomes functionally instantaneous. We’re even seeing startups bake LLMs into ASICs for previously unimaginable inference speeds.
Or consider the recent breakthroughs in precision analog in-memory computing: the field’s defining goal is to make matrix operations happen precisely where the memory already lives, bypassing the massive energy and time tax required to shuttle data back and forth through the traditional von Neumann bottleneck. Recent Nature Computational Science papers on in-memory attention mechanisms report up to 70,000x lower energy consumption and 100x higher operational speed in experimental settings. Closed-loop in-memory accelerator papers are pushing aggressively on inverse operations and native linear-system solving.
The caveats here matter : this academic literature is full of signal noise, hard precision limits, non-idealities, massive Analog-to-Digital (ADC) overhead, and some profoundly grotesque engineering workarounds. But that’s where most of the actual labor happens (analog is hard 🤷🏾♂️). And the papers are upfront about this; they’re directly concerned with how to live with, and engineer around, those analog problems.
Then there’s the wild frontier of neuromorphic computing that entirely abandons the neat, predictable clock cycles that conventional processors rely upon. They fire asynchronous signals only when a state actually changes, behaving much more like biological neurons than rigid, synchronized spreadsheets. That makes them remarkably energy-efficient for specific, event-driven workloads, and it makes them remarkably awkward to reason about using our usual discrete mathematical tools.
Once the underlying cost model shifts this violently, the statement “this algorithm is asymptotically better” can be mathematically true and yet still be the entirely wrong answer in production. Our Big-O analysis may be perfectly, rigorously correct about an abstract machine that no longer reflects the true economics of the silicon we’re actually running on.
This brings us to the speculative fringe, and directly to the paradox that opened this essay. The nCPU repository is perhaps the weakest empirical evidence cited here, and certainly the strangest. I can’t oversell it, but I can’t unsee it either, so I want to share what’s captured my attention.
Most computers function by executing discrete, sequential instructions: add this register, store that value, jump to this memory address. Machine learning systems operate on an entirely different plane: they learn continuous, differentiable functions through backpropagation and gradient descent. Historically, these have been two completely separate, non-overlapping worlds. nCPU is a radical attempt to forcibly merge them. It claims exact, learned 32-bit arithmetic, a fully differentiable stack starting from basic arithmetic upward, and instead of calculating the answers to arithmetic questions “the old way”, it uses a trained neural network to perform them instead.
Most of us carry a very clean, sanitized boundary in our heads between these two paradigms. nCPU smears that boundary operationally.
Within the nCPU architecture, the author documents a literal “Performance Inversion.” Because nCPU models basic mathematical addition via a sequential carry-propagation network (specifically, a Kogge-Stone adder structure), and models multiplication via an O(1) batched tensor lookup, neural multiplication is literally benchmarked at 12x faster than neural addition.
When the silicon and the software stack are architected explicitly around tensor operations, the fundamental pecking order of arithmetic itself flips. The mathematics of arithmetic hasn’t changed. But the literal economics of the underlying hardware substrate has mutated so profoundly that our oldest, most deeply ingrained abstractions, the idea that adding is always faster and cheaper than multiplying, have simply failed to catch up.
Even if the nCPU project ultimately turns out to be a dead end in practical terms, it’s a useful intellectual probe. It shows us where our hidden assumptions were doing heavy conceptual work that we hadn’t even noticed.
Some ideas are incredibly valuable long before they’re reliable, because they tell us which walls in our mental models were load-bearing, and what happens when you knock them down.
We’re observing five wildly different kinds of pressure, originating from completely different domains of research, all pushing in the same direction:
A formal theorem that reshapes our intuition about time and space bounds.
A profound workflow change that shortens the human-machine loop in pure mathematical discovery.
An industrial AI that automates the search for counterintuitive algorithmic breakthroughs.
A hardware landscape that’s aggressively rewriting the economic cost of basic operations.
A weird, speculative prototype that smears the line between discrete logic execution and continuous learned functions.
The practical takeaway for professionals is the hard work of sorting. Which of these claims are immutable theorems? Which are verified production facts saving millions of dollars? Which are promising but unreplicated experiments? Which are purely cost-model shifts in the silicon? And which are simply strange enough to watch closely while refusing to pretend they’re settled?
That’s the grown-up version of this conversation. We often witness brilliant engineers dismiss this entire wave of change completely just because one item in the news is overly speculative. It’s equally common to witness evangelists use one real, proven theorem as a blank cheque to smuggle in ten wildly unproven, stronger claims.
Mes amis, most of the old abstractions are still useful (no algorithms textbook burnings are called for, whatever salve for your college days that might be). We just need to hold those abstractions with much better, stricter scope conditions, and change how we operate.
First, our background intuitions aren’t immutable laws of nature. The phrase “Everyone knows that time and space trade off like this” is the sort of sentence that should trigger a pause and re-evaluation.
Second, evals (frameworks for formal verification of outputs) in both pure mathematical proof workflows and applied algorithm search are a radically powerful tool for algorithmic development. I haven’t nearly thought of all the places to apply this insight.
Third, we should think about hardware as an economic object first, with its own physical structure that influences what optimisation means. Sometimes a faster chip is just a faster chip, but sometimes it’s a fundamentally different machine that demands an entirely new approach to algorithmic design.
Fourth, speculative prototypes deserve a more disciplined, nuanced kind of attention. We need rigorous labels: this is interesting, this is entirely unproven, this might change the world if replicated, and this specific claim doesn’t get to borrow unearned credibility from stronger, proven items sitting nearby.
Computer science is currently watching a few of its oldest, most reliable interface contracts get renegotiated all at once.
If we build systems for a living, the ultimate thing we’re really choosing is which abstractions we trust, and exactly how far we trust them.
We’ve got to become a lot less willing to mistake “worked efficiently for decades” for “a fundamental law of reality.”
If multiplying two numbers can legitimately become faster than adding them, friends, perhaps we should look very, very carefully at the rest of the things we thought we knew for sure.
About the Author
Sutha Kamal is a Canadian technologist and builder with nearly 30 years of experience operating at the frontier of emergent systems. He has led product at billion-dollar companies, founded venture-backed startups like Massive Health, and currently builds AI platforms from the ground up as Head of AI at Kimono.
Based in the San Francisco Bay Area, Sutha goes deep across multiple disciplines to build the infrastructure mapping our new computational reality.







