One of the maintainers of Claude Code said something extremely funny on the internet.

I’m not here to dunk on this person who is clearly earnest and well-intentioned. I have thought and said far worse, far more confidently.
But…how silly is it? Clearly, comparing Claude Code to Grand Theft Auto 6 is not fair. The author did specify a small game engine, and even the most brittle, sun-bleached of straw men would concede that that’s not what they meant.
But as I walked down the hierarchy of modern games, nothing felt like a fair comparison. Modern games are just too complex to be intelligibly compared to a TUI. There must be a game that makes the comparison of Claude Code to a game engine more apt.
how about super mario 64?

Super Mario 64. A classic in 1996 as much as it is in 2026, and a staple of the speedrunning community thanks to its tight controls, its just-right number of glitches and exploits, and its timeless level design. And, more relevant to our purposes, it ran on a fucking N64. That’s right, baby. Check out these specs:
| Component | Spec |
|---|---|
| CPU (VR4300) | 93.75 MHz, ~125 MIPS |
| RSP (geometry) | 62.5 MHz, 8-lane SIMD (8x16-bit), ~500M fixed-point ops/sec theoretical peak |
| RDP (rasterizer) | 62.5 MHz, ~30M pixels/sec fill rate (with Z-buffer) |
| RAM | 4 MB unified RDRAM, ~562 MB/s peak bandwidth, ~640ns latency |
| Texture cache (TMEM) | 4 KB |
| Framebuffer | 320x240x16-bit color + 16-bit Z = ~300 KB |
The worlds within the paintings in Peach’s Castle were bouncy, lit with life and color, and all on these meager specs. This felt like a fairer shake. Thus, our question.
Is Claude Code doing more work in producing and rendering one of its diffed frames than Super Mario 64 did to produce and render a frame of its gorgeous-but-primitive 3D world? If so, what the heck is it doing?
apples to ubermensch
Let’s get a few things out of the way. This comparison isn’t as much apples-to-apples as it is apples-to-theoretical-space-marine-ubermensch. In other words, not even the same species. The hardware these two programs run on could not be less alike. One kind of hardware can emulate the other, in software, like a gape-jawed matryoshka doll.
And yet, von Neumann sired them both, so we can make some ill-advised comparisons. More importantly, it feels like these programs should be in some way comparable:
- SM64 renders natively to 320x240, which is in the same ballpark as a terminal on my 27 inch 2K monitor
- SM64 has a much simpler graphics pipeline than any modern game; for our purposes, we can almost treat the whole darn thing as one compute unit and compare to CC’s CPU rendering1
- Most important, they feel vaguely comparable to me, in an abstract how-hard-is-this way.
Both of these programs render at similar cadences; SM64 at 30 frames per second, CC at 60. This could be misleading, though; my last published Steam game, Deep Copy2, locks itself to 60FPS, but most of that time is spent idling to sync with the GPU. CC has to lay out a bunch of components, diff them, and render. SM64 has to lay out a scene graph and render. CC has to call out to the network, but SM64 has to deal with constantly swapping things in and out of limited VRAM. And so perhaps we have found our example; our small game engine which is more like Claude Code than, say, sl.
But…this doesn’t feel right. CC makes the fans on my water cooled3 9950X scream, scream more than emulating an entire N64. I don’t know what it’s doing, but it sure as hell isn’t idling. And thus our first question.
what the heck is claude code doing?
I told Claude to run perf stat -p PID -- sleep 3 on itself, so that it’d be chugging away while it was measuring. An apt level of rigor. It spit out these numbers:
| Metric | Value |
|---|---|
| instructions | 4,686,618,787 |
| cycles | 2,319,469,653 |
| IPC | 2.02 |
| CPUs utilized | 0.260 |
| branch-misses | 1.38% |
OK. It’s using the CPU 26% of the time – not sleeping, not pegged. “It’s waiting for IO” is a reasonable theory for something you’d expect to be a couple of terminal control codes bolted onto a sophisticated HTTP API.
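As a quick sanity check, perf’s derived IPC falls straight out of the two raw counters in the table:

```python
# Reproducing perf's derived IPC number from the raw counters above.
instructions = 4_686_618_787
cycles = 2_319_469_653

ipc = instructions / cycles
print(f"IPC = {ipc:.2f}")  # matches perf's reported 2.02
```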
We can run strace to see what syscalls it’s making; for the uninitiated, the humble syscall isn’t much more than the API of the operating system. Since your program runs within the OS, everything it does that isn’t raw computation eventually boils down to a syscall. We’ll be able to see what it’s actually doing.
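To make “the API of the operating system” concrete, here’s a sketch that makes a syscall by hand, next to the friendly wrapper everyone actually uses. The raw syscall number is an assumption: 39 is SYS_getpid on x86-64 Linux specifically, and it differs per architecture.

```python
import ctypes
import os

# libc's syscall(2) trampoline lets us invoke a kernel entry point directly.
libc = ctypes.CDLL(None, use_errno=True)
SYS_getpid = 39  # x86-64 Linux only -- an assumption for this sketch

raw_pid = libc.syscall(SYS_getpid)

# os.getpid() is a thin wrapper over the exact same kernel entry point.
assert raw_pid == os.getpid()
print(f"pid via raw syscall: {raw_pid}, via os.getpid(): {os.getpid()}")
```

Everything strace shows us — futex, sched_yield, epoll_pwait2 — is a program going through this same door, just with different numbers.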
the aptly named futex
Next up: sudo strace -f -c -p PID for a reasonable period of time.
| % time | seconds | calls | syscall |
|---|---|---|---|
| 69.72 | 4.231 | 50,937 | futex |
| 14.16 | 0.859 | 17 | restart_syscall |
| 8.49 | 0.515 | 88,976 | sched_yield |
| 7.00 | 0.425 | 1,264 | epoll_pwait2 |
| 0.24 | 0.014 | 3 | wait4 |
| 0.16 | 0.009 | 3,603 | madvise |
Ah…of course. futex. The building block of locking on Linux; it’s what you’d use to write a mutex, one layer lower. And Claude Code is spending 70% of its time waiting on one, like Tantalus, one must imagine, having the data in sight but pulled back at the last moment.
Waiting on futex isn’t inherently bad, though. The real problem is the third entry in the table. We called sched_yield 89,000 times. Those threads aren’t waiting, they’re spinning, and when what they want isn’t ready, they’re all trying to throw the hot potato back to each other.
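The difference between the two strategies is easy to reproduce. Here’s a sketch of both in Python: `threading.Condition` parks its waiter in futex_wait on Linux, while `os.sched_yield` is the very syscall filling CC’s strace table.

```python
import os
import threading
import time

# Two ways for a thread to wait on a condition another thread will set.
state = {"ready": False, "spin_yields": 0}
cond = threading.Condition()

def wait_spinning():
    # The hot-potato strategy: one sched_yield syscall per lap, CPU stays busy.
    while not state["ready"]:
        os.sched_yield()
        state["spin_yields"] += 1

def wait_blocking():
    # The futex_wait strategy: park in the kernel, burn zero CPU until notified.
    with cond:
        while not state["ready"]:
            cond.wait()

spinner = threading.Thread(target=wait_spinning)
blocker = threading.Thread(target=wait_blocking)
spinner.start()
blocker.start()

time.sleep(0.05)  # let both threads "wait" for a bit
with cond:
    state["ready"] = True
    cond.notify_all()
spinner.join()
blocker.join()

print(f"spinner made {state['spin_yields']} sched_yield calls; blocker made none")
```

Run this under strace and you’ll see a miniature of CC’s table: one thread rattling off thousands of sched_yield calls, the other issuing a single futex and going quiet.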
the right way to wait
At this point, something’s clearly wrong. We can use ps (of ps aux | grep whatever fame) to see a flattened process tree:
| TID | %CPU | WCHAN | Name |
|---|---|---|---|
| 710594 | 2.5 | do_epoll_wait | claude (main) |
| 710606 | 0.0 | futex_wait | Bun Pool 0 |
| 710607 | 0.0 | futex_wait | Bun Pool 1 |
| 710608 | 0.0 | futex_wait | Bun Pool 2 |
| 710609 | 0.0 | do_epoll_wait | HTTP Client |
| 710627 | 0.0 | futex_wait | Bun Pool 3 |
| 710628 | 0.0 | futex_wait | Bun Pool 4 |
| 710629 | 0.0 | futex_wait | Bun Pool 5 |
| 710667 | 0.0 | wait_woken | File Watcher |
| 715155 | 0.1 | futex_wait | claude |
| 715156 | 0.6 | futex_wait | HeapHelper |
| 715157 | 0.6 | futex_wait | HeapHelper |
| 715158 | 0.6 | futex_wait | HeapHelper |
| 715159 | 0.6 | futex_wait | HeapHelper |
| 715160 | 0.6 | futex_wait | HeapHelper |
| 715161 | 0.6 | futex_wait | HeapHelper |
| 715162 | 0.6 | futex_wait | HeapHelper |
| 715897 | 0.6 | futex_wait | JITWorker |
| 715898 | 0.0 | futex_wait | JITWorker |
| 715912 | 0.0 | futex_wait | t Helper Thread |
| 715913 | 0.0 | futex_wait | JITWorker |
Notice the main thread and the HTTP client thread sitting in do_epoll_wait. This is what you’d expect from a correctly IO-bound program: you do something, it’s not ready yet, so you use an OS primitive to say you’d like to wait until it is. There are a bunch of different flavors of this, each solving a limitation of the previous as computers scaled (select, poll, epoll, io_uring).
But CC is yielding. 89,000 times. We know this is wrong because:
- A thread doing work doesn’t yield, because it’s doing work.
- A thread that’s waiting doesn’t yield either, because it uses one of those OS primitives.
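A minimal sketch of that “correct” shape, using Python’s `select.epoll` (Linux-only) with a pipe standing in for a socket to the API server:

```python
import os
import select

# Register an fd with epoll, then block until the kernel says it's readable --
# the do_epoll_wait state the main and HTTP threads sit in.
r, w = os.pipe()  # stand-in for a socket
ep = select.epoll()
ep.register(r, select.EPOLLIN)

# Nothing to read yet: a zero-timeout poll returns immediately with no events.
assert ep.poll(timeout=0) == []

os.write(w, b"data is ready")  # the "IO" completes...
events = ep.poll(timeout=1)    # ...and the wait wakes with exactly one event
assert events == [(r, select.EPOLLIN)]
print(os.read(r, 64))

ep.close()
os.close(r)
os.close(w)
```

The shape scales: register many fds once, then one wakeup per batch of readiness replaces tens of thousands of yields.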
is this just javascript?
Let’s be fair to CC, though. The main thread looks like it’s doing the right thing. You might think that this is an ecosystem thing; some problem in the design of Node, or Bun’s code, or V8, or whatever. I fired up amp to compare:
| % time | seconds | calls | syscall |
|---|---|---|---|
| 89.17 | 0.003 | 22 | futex |
| 8.12 | 0.000 | 26 | epoll_pwait |
| 0.82 | 0.000 | 1 | close |
| 0.65 | 0.000 | 5 | read |
| 0.59 | 0.000 | 5 | statx |
| 0.46 | 0.000 | 5 | write |
| 0.20 | 0.000 | 1 | epoll_ctl |
| TID | %CPU | WCHAN | Name |
|---|---|---|---|
| 715811 | 2.7 | do_epoll_wait | MainThread |
| 715812 | 0.0 | do_epoll_wait | DelayedTaskSche |
| 715813 | 0.3 | futex_wait | V8Worker |
| 715814 | 0.3 | futex_wait | V8Worker |
| 715815 | 0.3 | futex_wait | V8Worker |
| 715816 | 0.3 | futex_wait | V8Worker |
| 715817 | 0.0 | futex_wait | SignalInspector |
| 715819 | 0.0 | futex_wait | libuv-worker |
| 715820 | 0.0 | futex_wait | libuv-worker |
| 715821 | 0.0 | futex_wait | libuv-worker |
| 715822 | 0.0 | futex_wait | libuv-worker |
Now that’s how you wait! Just look at the sheer difference in the number of calls. That alone tells you all you need to know, setting aside the correctness of those calls.
wasn’t this about super mario 64?
Right, right. We buried the lede a bit trying to figure out what CC was doing with the 4.6 billion instructions that were executed in its name; we forgot that the 4.6 billion was the number we were after in the first place.
This is the number of instructions that CC uses to:
- Monitor the output of a process
- Re-render, diff, and draw its component tree
- Stream a response over HTTP
- Pipe god knows what telemetry, feature flags, A/B tests4
How many instructions does it take SM64 to render some representative frame? Well, because I’m not Kaze Emanuar5, we’re just going to ballpark it. Take the aforementioned N64 specs, and let’s crank out something reasonable based on maxing out the hardware:
| Component | Clock | Cycles/frame | Effective ops/frame |
|---|---|---|---|
| VR4300 CPU | 93.75 MHz | 3.1M | ~3-4M instructions |
| RSP (geometry) | 62.5 MHz, 8-lane SIMD | 2.1M | ~5-8M scalar-equivalent ops |
| RDP (rasterizer) | 62.5 MHz | 2.1M | ~150-230K pixels |
| Total | | ~7.3M | ~8-12M effective ops |
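For transparency, the cycles/frame column is nothing fancier than clock speed divided by frame rate — a sketch of the arithmetic, not a cycle-accurate claim:

```python
# Cycles available to each unit in one 30 FPS frame, from the clocks alone.
FPS = 30
vr4300_hz = 93.75e6  # CPU clock
rcp_hz = 62.5e6      # the RSP and RDP both run off the 62.5 MHz RCP clock

cpu_cycles_per_frame = vr4300_hz / FPS  # ~3.1M
rcp_cycles_per_frame = rcp_hz / FPS     # ~2.1M

print(f"CPU: {cpu_cycles_per_frame / 1e6:.1f}M cycles/frame")
print(f"RSP/RDP: {rcp_cycles_per_frame / 1e6:.1f}M cycles/frame")
```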
These numbers are surely very incorrect; SM64’s frame time is dominated (ironically) by IO. That 4 MB of unified RAM is hotly contested, and it even has to fit the framebuffer itself. That means a lot of swapping things in and out of memory, which means a lot of waiting around for the swaps to finish.
For a real estimate, we might cut that to a quarter, or even an eighth.
but we’re not going to do that
Because at this point, the numbers are flat out embarrassing. CC is using an order of magnitude more instructions on a 33ms “frame” basis than SM64 did to render a 3D world. This is about the best apples-to-apples comparison I could come up with; note that “number of instructions” in some sense is independent of the power of the hardware. An add is an add, after all.
This is, of course, blatantly untrue. Modern processors do stuff like executing instructions out of order, guessing which branch your code will take to get a head start, or pipelining your code. The amount of “work” that one instruction does on a modern processor is, again, almost incomparable to the N64’s.
And yet, it tells us something.
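For the record, here’s the arithmetic behind that order-of-magnitude claim, assuming the 4.6 billion instructions from perf were spread over the full 3-second measurement window:

```python
# CC's instruction count, rescaled to SM64's 33 ms frame budget.
cc_instructions = 4_686_618_787  # from perf stat, over a 3 s window
window_s = 3.0
frame_s = 1 / 30                 # one SM64 frame

cc_per_frame = cc_instructions / window_s * frame_s  # ~52M
sm64_peak = 10e6                 # midpoint of the ~8-12M estimate
sm64_realistic = sm64_peak / 8   # the IO-dominated "cut to an eighth" case

print(f"CC: ~{cc_per_frame / 1e6:.0f}M instructions per 33 ms frame")
print(f"vs SM64 at theoretical peak: {cc_per_frame / sm64_peak:.0f}x")
print(f"vs SM64, realistically: {cc_per_frame / sm64_realistic:.0f}x")
```

Against the impossible everything-maxed-out SM64, CC is merely several times more expensive; against the realistic, IO-starved SM64, it’s a factor of forty.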
no hugging, no learning
Claude Code is churning out an order of magnitude more instructions than SM64 did, to…diff a terminal panel’s worth of text and draw it. Of course, I get it; that simple diff is being fed by content from an HTTP server, running on top of the most sophisticated JIT ever created, running inside not a terminal but a terminal emulator, and, and, and…
I get it. The embarrassing part is “why has no one looked into the nearly 30,000 erroneous syscalls being issued every second” in one of the most valuable programs in the world. There has clearly been a loss in systems programming; knowledge once possessed and now not. I think this is because of money. There’s a lot of money in systems programming, in genuine innovation powered by a deep understanding of how the computer beneath you works.
There is a lot more money in application programming. A lot. Why make a modestly successful business improving on some systems-level tech when you could moonshot a frontier model and make literally a trillion dollars? Or take a quick exit on ai.pets.com and buy a lake house for your just-retired folks?
I pass no judgment upon Anthropic for being application developers. Besides, that’s what you do with all the money. You pay a billion dollars for the excellent folks at Bun to come on and fix your systems-level woes6.
It’s fine to say you don’t know why your TUI is so slow; it’s fine to prioritize making a trillion dollars over doing good quality systems programming. I say that without a hint of irony; I would do the same.
But…don’t pretend you’re building a game engine7.