One of the maintainers of Claude Code said something extremely funny on the internet.

I’m not here to dunk on this person who is clearly earnest and well-intentioned. I have thought and said far worse, far more confidently.
But…how silly is it? Clearly, comparing Claude Code to Grand Theft Auto 6 is not fair. The author did specify a small game engine, and even the most brittle, sun-bleached of straw men would concede that that’s not what they meant.
But as I walked down the hierarchy of modern games, nothing felt like a fair comparison. Modern games are just too complex to be intelligibly compared to a TUI. There must be a game that makes the comparison of Claude Code to a game engine more apt.
how about super mario 64?

Super Mario 64. A classic in 1996 as much as it is in 2026, and a staple of the speedrunning community thanks to its tight controls, its just-right number of glitches and exploits, and its timeless level design. And, more relevant to our purposes, it ran on a fucking N64. That’s right, baby. Check out these specs:
| Component | Spec |
|---|---|
| CPU (VR4300) | 93.75 MHz, ~125 MIPS |
| RSP (geometry) | 62.5 MHz, 8-lane SIMD (8x16-bit), ~500M fixed-point ops/sec theoretical peak |
| RDP (rasterizer) | 62.5 MHz, ~30M pixels/sec fill rate (with Z-buffer) |
| RAM | 4 MB unified RDRAM, ~562 MB/s peak bandwidth, ~640ns latency |
| Texture cache (TMEM) | 4 KB |
| Framebuffer | 320x240x16-bit color + 16-bit Z = ~300 KB |
The worlds within the paintings in Peach’s Castle were bouncy, lit with life and color, and all on these meager specs. This felt like a fairer shake. Thus, our question.
Is Claude Code doing more work in producing and rendering one of its diffed frames than Super Mario 64 did to produce and render a frame of its gorgeous-but-primitive 3D world? If so, what the heck is it doing?
apples to ubermensch
Let’s get a few things out of the way. This comparison isn’t as much apples-to-apples as it is apples-to-theoretical-space-marine-ubermensch. In other words, not even the same species. The hardware these two programs run on could not be less alike. One kind of hardware can emulate the other, in software, like a gape-jawed matryoshka doll.
And yet, von Neumann sired them both, so we can make some ill-advised comparisons. More importantly, it feels like these programs should be in some way comparable:
- SM64 renders natively to 320x240, which is in the same ballpark as a terminal on my 27 inch 2K monitor
- SM64 has a much simpler graphics pipeline than any modern game; for our purposes, we can almost treat the whole darn thing as one compute unit and compare to CC’s CPU rendering1
- Most important, they feel vaguely comparable to me, in an abstract how-hard-is-this way.
Both of these programs render at similar cadences; SM64 at 30 frames per second, CC at 60. This could be misleading, though; my last published Steam game, Deep Copy2, locks itself to 60FPS, but most of that time is spent idling to sync with the GPU. CC has to lay out a bunch of components, diff them, and render. SM64 has to lay out a scene graph and render. CC has to call out to the network, but SM64 has to deal with constantly swapping things in and out of limited VRAM. And so perhaps we have found our example; our small game engine which is more like Claude Code than, say, sl.
But…this doesn’t feel right. CC makes the fans on my water cooled3 9950X scream, scream more than emulating an entire N64. I don’t know what it’s doing, but it sure as hell isn’t idling. And thus our first question.
what the heck is claude code doing?
I told Claude to run perf stat -p PID -- sleep 3 on itself, so that it’d be chugging away while it was measuring. An apt level of rigor. It spit out these numbers:
| Metric | Value |
|---|---|
| instructions | 4,686,618,787 |
| cycles | 2,319,469,653 |
| IPC | 2.02 |
| CPUs utilized | 0.260 |
| branch-misses | 1.38% |
OK. It’s using the CPU 26% of the time – not sleeping, not pegged. “It’s waiting for IO” is a reasonable theory for something you’d expect to be a couple of terminal control codes bolted onto a sophisticated HTTP API.
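As a quick sanity check, perf’s derived IPC falls straight out of the two raw counters in the table:

```python
# Reproducing perf's derived IPC number from the raw counters above.
instructions = 4_686_618_787
cycles = 2_319_469_653

ipc = instructions / cycles
print(f"IPC = {ipc:.2f}")  # matches perf's reported 2.02
```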
We can run strace to see what syscalls it’s making; for the uninitiated, the humble syscall isn’t much more than the API of the operating system. Since your program runs within the OS, everything it does that isn’t raw computation eventually boils down to a syscall. We’ll be able to see what it’s actually doing.
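To make “the API of the operating system” concrete, here’s a sketch that makes a syscall by hand, next to the friendly wrapper everyone actually uses. The raw syscall number is an assumption: 39 is SYS_getpid on x86-64 Linux specifically, and it differs per architecture.

```python
import ctypes
import os

# libc's syscall(2) trampoline lets us invoke a kernel entry point directly.
libc = ctypes.CDLL(None, use_errno=True)
SYS_getpid = 39  # x86-64 Linux only -- an assumption for this sketch

raw_pid = libc.syscall(SYS_getpid)

# os.getpid() is a thin wrapper over the exact same kernel entry point.
assert raw_pid == os.getpid()
print(f"pid via raw syscall: {raw_pid}, via os.getpid(): {os.getpid()}")
```

Everything strace shows us — futex, sched_yield, epoll_pwait2 — is a program going through this same door, just with different numbers.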
the aptly named futex
Next up: sudo strace -f -c -p PID for a reasonable period of time.
| % time | seconds | calls | syscall |
|---|---|---|---|
| 69.72 | 4.231 | 50,937 | futex |
| 14.16 | 0.859 | 17 | restart_syscall |
| 8.49 | 0.515 | 88,976 | sched_yield |
| 7.00 | 0.425 | 1,264 | epoll_pwait2 |
| 0.24 | 0.014 | 3 | wait4 |
| 0.16 | 0.009 | 3,603 | madvise |
Ah…of course. futex. The building block of locking on Linux; it’s what you’d use to write a mutex, one layer lower. And Claude Code is spending 70% of its time waiting on one, like Tantalus, one must imagine, having the data in sight but pulled back at the last moment.
Waiting on futex isn’t inherently bad, though. The real problem is the third entry in the table. We called sched_yield 89,000 times. Those threads aren’t waiting, they’re spinning, and when what they want isn’t ready, they’re all trying to throw the hot potato back to each other.
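The difference between the two strategies is easy to reproduce. Here’s a sketch of both in Python: `threading.Condition` parks its waiter in futex_wait on Linux, while `os.sched_yield` is the very syscall filling CC’s strace table.

```python
import os
import threading
import time

# Two ways for a thread to wait on a condition another thread will set.
state = {"ready": False, "spin_yields": 0}
cond = threading.Condition()

def wait_spinning():
    # The hot-potato strategy: one sched_yield syscall per lap, CPU stays busy.
    while not state["ready"]:
        os.sched_yield()
        state["spin_yields"] += 1

def wait_blocking():
    # The futex_wait strategy: park in the kernel, burn zero CPU until notified.
    with cond:
        while not state["ready"]:
            cond.wait()

spinner = threading.Thread(target=wait_spinning)
blocker = threading.Thread(target=wait_blocking)
spinner.start()
blocker.start()

time.sleep(0.05)  # let both threads "wait" for a bit
with cond:
    state["ready"] = True
    cond.notify_all()
spinner.join()
blocker.join()

print(f"spinner made {state['spin_yields']} sched_yield calls; blocker made none")
```

Run this under strace and you’ll see a miniature of CC’s table: one thread rattling off thousands of sched_yield calls, the other issuing a single futex and going quiet.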
the right way to wait
At this point, something’s clearly wrong. We can use ps (of ps aux | grep whatever fame) to see a flattened process tree:
| TID | %CPU | WCHAN | Name |
|---|---|---|---|
| 710594 | 2.5 | do_epoll_wait | claude (main) |
| 710606 | 0.0 | futex_wait | Bun Pool 0 |
| 710607 | 0.0 | futex_wait | Bun Pool 1 |
| 710608 | 0.0 | futex_wait | Bun Pool 2 |
| 710609 | 0.0 | do_epoll_wait | HTTP Client |
| 710627 | 0.0 | futex_wait | Bun Pool 3 |
| 710628 | 0.0 | futex_wait | Bun Pool 4 |
| 710629 | 0.0 | futex_wait | Bun Pool 5 |
| 710667 | 0.0 | wait_woken | File Watcher |
| 715155 | 0.1 | futex_wait | claude |
| 715156 | 0.6 | futex_wait | HeapHelper |
| 715157 | 0.6 | futex_wait | HeapHelper |
| 715158 | 0.6 | futex_wait | HeapHelper |
| 715159 | 0.6 | futex_wait | HeapHelper |
| 715160 | 0.6 | futex_wait | HeapHelper |
| 715161 | 0.6 | futex_wait | HeapHelper |
| 715162 | 0.6 | futex_wait | HeapHelper |
| 715897 | 0.6 | futex_wait | JITWorker |
| 715898 | 0.0 | futex_wait | JITWorker |
| 715912 | 0.0 | futex_wait | t Helper Thread |
| 715913 | 0.0 | futex_wait | JITWorker |
Notice the main thread and the HTTP client thread sitting in do_epoll_wait. This is what you’d expect from a correctly IO-bound program: you do something, it’s not ready yet, so you use an OS primitive to say you’d like to wait until it is. There are a bunch of different flavors of this, each solving a limitation of the previous as computers scaled (select, poll, epoll, io_uring).
But CC is yielding. 89,000 times. We know this is wrong because:
- A thread doing work doesn’t yield, because it’s doing work.
- A thread that’s waiting doesn’t yield either, because it uses one of those OS primitives.
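A minimal sketch of that “correct” shape, using Python’s `select.epoll` (Linux-only) with a pipe standing in for a socket to the API server:

```python
import os
import select

# Register an fd with epoll, then block until the kernel says it's readable --
# the do_epoll_wait state the main and HTTP threads sit in.
r, w = os.pipe()  # stand-in for a socket
ep = select.epoll()
ep.register(r, select.EPOLLIN)

# Nothing to read yet: a zero-timeout poll returns immediately with no events.
assert ep.poll(timeout=0) == []

os.write(w, b"data is ready")  # the "IO" completes...
events = ep.poll(timeout=1)    # ...and the wait wakes with exactly one event
assert events == [(r, select.EPOLLIN)]
print(os.read(r, 64))

ep.close()
os.close(r)
os.close(w)
```

The shape scales: register many fds once, then one wakeup per batch of readiness replaces tens of thousands of yields.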
is this just javascript?
Let’s be fair to CC, though. The main thread looks like it’s doing the right thing. You might think that this is an ecosystem thing; some problem in the design of Node, or Bun’s code, or V8, or whatever. I fired up amp to compare:
| % time | seconds | calls | syscall |
|---|---|---|---|
| 89.17 | 0.003 | 22 | futex |
| 8.12 | 0.000 | 26 | epoll_pwait |
| 0.82 | 0.000 | 1 | close |
| 0.65 | 0.000 | 5 | read |
| 0.59 | 0.000 | 5 | statx |
| 0.46 | 0.000 | 5 | write |
| 0.20 | 0.000 | 1 | epoll_ctl |
| TID | %CPU | WCHAN | Name |
|---|---|---|---|
| 715811 | 2.7 | do_epoll_wait | MainThread |
| 715812 | 0.0 | do_epoll_wait | DelayedTaskSche |
| 715813 | 0.3 | futex_wait | V8Worker |
| 715814 | 0.3 | futex_wait | V8Worker |
| 715815 | 0.3 | futex_wait | V8Worker |
| 715816 | 0.3 | futex_wait | V8Worker |
| 715817 | 0.0 | futex_wait | SignalInspector |
| 715819 | 0.0 | futex_wait | libuv-worker |
| 715820 | 0.0 | futex_wait | libuv-worker |
| 715821 | 0.0 | futex_wait | libuv-worker |
| 715822 | 0.0 | futex_wait | libuv-worker |
Now that’s how you wait! Just look at the sheer difference in the number of calls. That alone tells you all you need to know, setting aside the correctness of those calls.
wasn’t this about super mario 64?
Right, right. We buried the lede a bit trying to figure out what CC was doing with the 4.6 billion instructions that were executed in its name; we forgot that the 4.6 billion was the number we were after in the first place.
This is the number of instructions that CC uses to:
- Monitor the output of a process
- Re-render, diff, and draw its component tree
- Stream a response over HTTP
- Pipe god knows what telemetry, feature flags, A/B tests4
How many instructions does it take SM64 to render some representative frame? Well, because I’m not Kaze Emanuar5, we’re just going to ballpark it. Take the aforementioned N64 specs, and let’s crank out something reasonable based on maxing out the hardware:
| Component | Clock | Cycles/frame | Effective ops/frame |
|---|---|---|---|
| VR4300 CPU | 93.75 MHz | 3.1M | ~3-4M instructions |
| RSP (geometry) | 62.5 MHz, 8-lane SIMD | 2.1M | ~5-8M scalar-equivalent ops |
| RDP (rasterizer) | 62.5 MHz | 2.1M | ~150-230K pixels |
| Total | | ~7.3M | ~8-12M effective ops |
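For transparency, the cycles/frame column is nothing fancier than clock speed divided by frame rate — a sketch of the arithmetic, not a cycle-accurate claim:

```python
# Cycles available to each unit in one 30 FPS frame, from the clocks alone.
FPS = 30
vr4300_hz = 93.75e6  # CPU clock
rcp_hz = 62.5e6      # the RSP and RDP both run off the 62.5 MHz RCP clock

cpu_cycles_per_frame = vr4300_hz / FPS  # ~3.1M
rcp_cycles_per_frame = rcp_hz / FPS     # ~2.1M

print(f"CPU: {cpu_cycles_per_frame / 1e6:.1f}M cycles/frame")
print(f"RSP/RDP: {rcp_cycles_per_frame / 1e6:.1f}M cycles/frame")
```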
These numbers are surely very incorrect; SM64’s frame time is dominated (ironically) by IO. That 4 MB of unified RAM is hotly contested, and it even has to fit the framebuffer itself. That means a lot of swapping things in and out of memory, which means a lot of waiting around for the swaps to finish.
For a real estimate, we might cut that to a quarter, or even an eighth.
but we’re not going to do that
Because at this point, the numbers are flat out embarrassing. CC is using an order of magnitude more instructions on a 33ms “frame” basis than SM64 did to render a 3D world. This is about the best apples-to-apples comparison I could come up with; note that “number of instructions” in some sense is independent of the power of the hardware. An add is an add, after all.
This is, of course, blatantly untrue. Modern processors do stuff like executing instructions out of order, guessing which branch your code will take to get a head start, or pipelining your code. The amount of “work” that one instruction does on a modern processor is, again, almost incomparable to the N64’s.
And yet, it tells us something.
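For the record, here’s the arithmetic behind that order-of-magnitude claim, assuming the 4.6 billion instructions from perf were spread over the full 3-second measurement window:

```python
# CC's instruction count, rescaled to SM64's 33 ms frame budget.
cc_instructions = 4_686_618_787  # from perf stat, over a 3 s window
window_s = 3.0
frame_s = 1 / 30                 # one SM64 frame

cc_per_frame = cc_instructions / window_s * frame_s  # ~52M
sm64_peak = 10e6                 # midpoint of the ~8-12M estimate
sm64_realistic = sm64_peak / 8   # the IO-dominated "cut to an eighth" case

print(f"CC: ~{cc_per_frame / 1e6:.0f}M instructions per 33 ms frame")
print(f"vs SM64 at theoretical peak: {cc_per_frame / sm64_peak:.0f}x")
print(f"vs SM64, realistically: {cc_per_frame / sm64_realistic:.0f}x")
```

Against the impossible everything-maxed-out SM64, CC is merely several times more expensive; against the realistic, IO-starved SM64, it’s a factor of forty.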
no hugging, no learning
Claude Code is churning out an order of magnitude more instructions than SM64 did, to…diff a terminal panel’s worth of text and draw it. Of course, I get it; that simple diff is being fed by content from an HTTP server, running on top of the most sophisticated JIT ever created, running inside not a terminal but a terminal emulator, and, and, and…
I get it. The embarrassing part is “why has no one looked into the nearly 30,000 erroneous syscalls being issued every second” in one of the most valuable programs in the world. There has clearly been a loss in systems programming; knowledge once possessed and now not. I think this is because of money. There’s a lot of money in systems programming, in genuine innovation powered by a deep understanding of how the computer beneath you works.
There is a lot more money in application programming. A lot. Why make a modestly successful business improving on some systems-level tech when you could moonshot a frontier model and make literally a trillion dollars? Or take a quick exit on ai.pets.com and buy a lake house for your just-retired folks?
I pass no judgment upon Anthropic for being application developers. Besides, that’s what you do with all the money. You pay a billion dollars for the excellent folks at Bun to come on and fix your systems-level woes6.
It’s fine to say you don’t know why your TUI is so slow; it’s fine to prioritize making a trillion dollars over doing good quality systems programming. I say that without a hint of irony; I would do the same.
But…don’t pretend you’re building a game engine7.