Implementing a Z80 / ZX Spectrum emulator with Claude Code

148 points by antirez 3 days ago · 76 comments

jaen a day ago

There isn't any attempt to falsify the "clean room" claim in the article - a rational approach would be to not provide any documents about the Z80 and the Spectrum, and just ask it to one-shot an emulator and compare the outputs...

If the one-shot output resembles anything working (and I am betting it will), then obviously this isn't clean room at all.

the_af 19 hours ago

Even without internet access, probably everything there is to say about Z80/Speccy emulators was already in its training set.

measurablefunc 19 hours ago

That the author just trusts the agent to not use the internet b/c he wrote so in the instructions should tell you all you need to know. It's great he managed to prompt it w/ the right specification for writing yet another emulator, but I don't think he understands how LLMs actually work, so most of the commentary on what's going on with the "psychology" of the LLM should be ignored.

antirezOP a day ago

You didn't read the full article. The last paragraph talks about this specifically.
- tredre3 19 hours ago
  
  In the last paragraph you handwave that all the Z80 and ZX Spectrum documentation is likely already in the model anyway... Choosing to not provide the documents/websites might then require more prompting to finish the emulator, but the knowledge is there. You can't clean room with a large LLM. That's delusion!
- jaen 8 hours ago
  
  I mean, for an article that's titled "clean room", that would be the first thing to do, not as a "maybe follow up in the future"...
  (I do think the article could have stood on its own without mentioning anything about "clean room", which is a very high standard.)
  For the handwavy point about the x86 assembler, I am quite sure that the LLM will remember the entirety of the x86 instruction set without any reference, it's more of a problem of having a very well-tuned agentic loop with no context pollution to extract it. (which you won't get by YOLOing Claude, because LLMs aren't that meta-RLed yet to be able to correct their own context/prompt-engineering problems)
  Or alternatively, to exploit context pollution, take half of an open-source project and let the LLM fill in the rest (try to imagine the synthetic "prompt" it was given when training on this repo) and see how far it is from the actual version.

stevekemp a day ago

I grew up with the Spectrum, and wrote a CP/M emulator a while back. I'd be curious to see how complete it would get.

I struggled a lot with some complex software, which worked on some emulators and failed on others (and mine).

For example one bug I had, which is still outstanding, relates to the Hisoft C compiler:

https://github.com/skx/cpmulator/issues/250

But I see that my cpm-dist repository is referenced in the download script so that made me happy!

It's great to see people still using CP/M, writing software for it, and sharing the knowledge. Though I do think the choice to implement the CCP in C, rather than using a genuine one, is an interesting one, and a bit of a cheat. It means that you cannot use "SUBMIT" and other common-place binaries/utilities.

antirezOP a day ago

Thank you for your work on CP/M, Steve!

tasty_freeze 16 hours ago

Knowing nothing about your code, I'd suggest checking if the code uses the DAA instruction. It is by far the trickiest thing to get right. Don't assume well behaved code -- what happens if A=0x5C and B=0xF4 and you execute "add b; daa"? That is, if you attempt to correct a sum which didn't start with valid decimal digits.
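
To make the DAA test case above concrete, here is a minimal C sketch of `add b; daa` that follows the commonly documented Z80 adjustment rules. The `daa_cpu` struct and the `sketch_add` / `sketch_daa` names are hypothetical and are not taken from cpmulator or any specific core, and only the carry, half-carry and N flags are modelled (S, Z and P/V are left out).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal CPU state for this sketch: only A and the three
 * flags that DAA reads or writes here (S, Z, P/V are omitted). */
typedef struct {
    uint8_t a;  /* accumulator */
    bool cf;    /* carry flag */
    bool hf;    /* half-carry flag */
    bool nf;    /* add/subtract flag: false after ADD, true after SUB */
} daa_cpu;

/* 8-bit ADD A,v: sets carry and half-carry, clears N. */
static void sketch_add(daa_cpu *cpu, uint8_t v) {
    assert(cpu != NULL);
    unsigned sum = (unsigned)cpu->a + v;
    cpu->hf = ((cpu->a & 0x0F) + (v & 0x0F)) > 0x0F;
    cpu->cf = sum > 0xFF;
    cpu->nf = false;
    cpu->a = (uint8_t)sum;
}

/* DAA: the correction is chosen from the flags and the current value of A,
 * with no assumption that A holds valid BCD digits. */
static void sketch_daa(daa_cpu *cpu) {
    assert(cpu != NULL);
    uint8_t a = cpu->a;
    uint8_t correction = 0;
    bool carry = cpu->cf;

    if (cpu->hf || (a & 0x0F) > 9) correction |= 0x06;
    if (cpu->cf || a > 0x99) { correction |= 0x60; carry = true; }

    if (cpu->nf) {
        /* After a subtraction the correction is subtracted. */
        cpu->hf = cpu->hf && (a & 0x0F) < 6;
        a = (uint8_t)(a - correction);
    } else {
        /* After an addition the correction is added. */
        cpu->hf = (a & 0x0F) > 9;
        a = (uint8_t)(a + correction);
    }
    cpu->cf = carry;
    cpu->a = a;
}
```

Under this sketch, starting from A=0x5C and B=0xF4, `sketch_add` leaves A=0x50 with carry and half-carry set, and `sketch_daa` then adds 0x66 to give A=0xB6 with carry still set. A core that assumes valid BCD input may produce something different, which is the kind of divergence worth checking against real hardware or a trusted test suite.
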
- stevekemp 9 hours ago
  
  Interesting point, I'll take a look.
  (The z80 emulation was the only thing I didn't write myself; though it does pass the standard test programs, I've not looked at how complex/complete their testing is.)

alnwlsn 12 hours ago

I happen to be working on an RP2350 TRS-80 emulator, and I've been using this one [0] as the z80 core (because I figured this had been done enough times and why reinvent the wheel?).

Looking at the two z80.c side by side, it definitely doesn't look like a copy-paste job (at least compared with [0], I'm sure it was trained on many others). The AI version is a lot better commented, although [0] is probably closer to how I would have structured it had I written one.

The interfacing between the two is very similar though, so I was curious to try the AI version to see if there was any cycle efficiency difference. Running the same short loop in Level II BASIC, I find the AI version to be just about 1.5% slower. Make of that what you will.

On one RP2350 core, I figure these versions top out about equal to a 6 MHz Z80. I do wonder what you would get if you asked for a version optimized for ARM Cortex-M33.

0 - https://github.com/superzazu/z80

geraneum 18 hours ago

> In short: the implementation was performed in a very similar way to how a human programmer would do it, and not outputting a complete implementation from scratch “uncompressing” it from the weights.

> Instead, different classes of instructions were implemented incrementally, and there were bugs that were fixed…

Not sure the author fully grasps how and why LLM agents work this way. There’s a leap of logic here: the agent runs in a loop where command outputs get fed back as context for further token generation, which is what produces the incremental human like process he’s observing. It’s still that “decompression” from the weights, still the LLM’s unique way of extracting and blending patterns from training data, that’s doing the actual work. The agentic scaffolding just lets it happen in many small steps against real feedback instead of all at once. So the novel output is real, but he’s crediting the wrong thing for it.

avadodin a day ago

So what you're saying is that it's not just the machine-readable documentation built over decades of the officially undocumented behavior of Z80 opcodes—often provided under restrictive licenses—it's also the "known techniques and patterns" of emulator code—often provided under restrictive licenses.

pohl 14 hours ago

Very fun. I wonder, though: can one ever do “clean room” anything using these Plagiarism Laundering machines?

cbolton a day ago

I asked Gemini to reproduce the poem "The Road Not Taken". I got it in full (as far as I can tell without Gemini fetching anything from the web). I didn't provide any verse of the poem so I guess that counts as a clean room "implementation"?

visarga 11 hours ago

Reimplementing functionality is not the same as wholesale regurgitation.

ilaksh 15 hours ago

For the skeptical, I guess we need to create a truly novel virtual machine and instruction set and then test the implementation of that.

But anything is going to be in some way similar to one degree or another to one or more real projects. So skeptics will claim that it doesn't prove anything.

But that's just the nature of reality and problem solving, and such an exercise would prove it could create a compiler for a novel platform.

It would be great if there were a website where all of these skeptics could register in solidarity their assuredness that "it's not real AI and can't be creative" and pledge not to use it.

itomato a day ago

All the design hints required for this or any other type of agentic "set it and forget it" development are interesting to me, because they enable the result but also lock in less-than-desirable results that exhibit a miss "like simulating a 2Mhz clock".

What if Agents were hip enough to recognize that they have navigated into a specialized area and need additional hinting? "I'm set up for CP/M development, but what I really need now is Z80 memory management technique. Let me swap my tool head for the low-level Z80 unit..."

We can throw RAGs on the pile and hope the context window includes the relevant tokens, but what if there were pointers instead?

hoc 20 hours ago

Great project and write-up. I wonder whether most of those "hints" are really needed, though, as you are already using Claude CODE. Aren't things like "simple" and "clean" assumed to be part of its system prompt already? (Individual documentation style etc. can't be, of course.) While they were useful when using a general LLM for coding, I would think that they are now part of the overall setup of any coding agent. These days I run more into problems with language and API version drift, even when specified beforehand.

ralferoo a day ago

The problem is that it will have been trained on multiple open source spectrum emulators. Even "don't access the internet" isn't going to help much if it can parrot someone else's emulator verbatim just from training.

Maybe a more sensible challenge would be to describe a system that hasn't been emulated before (or hasn't had an emulator source released publicly, as far as you can tell from the internet) and then try it.

For fun, try using obscure CPUs giving it the same level of specification as you needed for this, or even try an imagined Z80-like but swapping the order of the bits in the encodings and different orderings for the ALU instructions and see how it manages it.

throwa356262 a day ago

I think you are onto something here.
I tried creating an emulator for a CPU that is very well known but lacks working open source emulators.
Claude, Codex and Gemini were very good at starting something that looked great but all failed to reach a working product. They all ended up in a loop where fixing one issue caused something else to break and could never get out of it.
- trollbridge 13 hours ago
  
  I’ve been trying to do the same thing as a hobby project to just imagine some “what ifs” with some slight changes to the original 8086 and the 80286.
  It just never produces an actually working result without a lot of intervention on my part. (My change was merely changing the paragraph size from 4 bits to 8 bits on the 8086.)
- stuaxo a day ago
  
  When they get stuck, I find adding debug output that the model can access helps. Sometimes you also need to add something to the prompt to tell it to avoid some approach at a point.
- antirezOP a day ago
  
  Please tell me what CPU it is. I would give it a try. I doubt strongly a very well documented CPU can't be emulated by writing the code with modern AIs.
- dboreham 19 hours ago
  
  Interesting. When I had Claude write a language transpiler it always checked that tests passed before declaring a feature ready for PR. There was never a case where it gave up on achieving that goal.
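
For readers unfamiliar with the 8086 "paragraph size" change trollbridge mentions in this subthread, the following C sketch shows the segment:offset address computation with the shift made a parameter. The function name `seg_to_phys`, the parameter names, and the 24-bit bus width used for the 8-bit-paragraph variant are assumptions for illustration; only the 4-bit shift with a 20-bit bus corresponds to the real 8086.

```c
#include <assert.h>
#include <stdint.h>

/* Physical address = (segment << paragraph_shift) + offset, wrapped to the
 * width of the address bus. paragraph_shift = 4 with a 20-bit bus is the
 * real 8086; other values are hypothetical "what if" variants. */
static uint32_t seg_to_phys(uint16_t segment, uint16_t offset,
                            unsigned paragraph_shift, unsigned addr_bits) {
    assert(paragraph_shift <= 8 && addr_bits >= 16 && addr_bits <= 24);
    uint32_t mask = (UINT32_C(1) << addr_bits) - 1;
    uint32_t linear = ((uint32_t)segment << paragraph_shift) + offset;
    return linear & mask;  /* addresses past the top of memory wrap around */
}
```

With the real 8086 parameters, `seg_to_phys(0x1234, 0x5678, 4, 20)` gives 0x179B8. With an 8-bit paragraph and an assumed 24-bit bus, the same segment:offset pair gives 0x128A78, so every far pointer, segment-arithmetic trick and wraparound assumption in existing code changes meaning, which hints at why a seemingly small change is hard to get right.
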
PontifexMinimus a day ago

> try using obscure CPUs
Better still, invent a CPU instruction set, and get it to write an emulator for that instruction set in C.
Then invent a C-like HLL and get it to write a compiler from your HLL to your instruction set.

abainbridge a day ago

> try using obscure CPUs
I tried asking Gemini and ChatGPT, "What opcode has the value 0x3c on the Intel 8048?"
They were both wrong. The datasheet with the correct encodings is easily found online. And there are several correct open source emulators, eg MAME.
- bsoles a day ago
  
  Even on a specific STM microcontroller (STM32G031), the LLM tools invent non-existent registers and then apologize when I point it out. And conversely, they write code for an entire algorithm (CRC, for example) when hardware support already exists on the chip.
- stuaxo a day ago
  
  Think of the answer to "What opcode has the value 0x3c on the Intel 8048" as a PNG image, but the LLM as a very compressed JPEG. It will only get a very approximate answer. But you can give it explicit tools to look things up.
- yomismoaqui a day ago
  
  If the LLM doesn't have a websearch tool your test doesn't make any sense.
  An LLM by itself is like a lossy image of all text on the internet.
  - deniska a day ago
    
    Just some more parameters, and it would overfit that specific PDF too.

kamranjon a day ago

I thought this part of the write-up was interesting:
"This is, I think, in contradiction with the idea that LLMs are memorizing the whole training set and uncompress what they have seen. LLMs can memorize certain over-represented documents and code, but while they can extract such verbatim parts of the code if prompted to do so, they don’t have a copy of everything they saw during the training set, nor they spontaneously emit copies of already seen code, in their normal operation."
Can't things basically get baked into the weights when trained on enough iterations, and isn't this the basis for a lot of plagiarism issues we saw with regards to code and literature? It seems like this is maybe downplaying the unattributed use of open source code when training these models.

dist-epoch a day ago

If you did that, comments would be "it's just a bit shuffle of the encodings, of course it can manage that, but how about we do totally random encodings..."
- ralferoo a day ago
  
  That's true, but I still think it'd be an interesting experiment to see how much it actually follows the specification vs how much it hallucinates by plagiarising from existing code.
  Probably bonus points for telling it that you're emulating the well known ZX Spectrum and then describing something entirely different, to see whether it just treats that name as an arbitrary label or whether it significantly influences its code generation.
  But you're right of course, instruction decoding is such a relatively small portion of a CPU that the differences would be quite limited if all the other details remained the same. That's why a completely hypothetical system is better.

rjh29 a day ago

No Carmack or Stallman. Just the right person at the right time.

ontouchstart 20 hours ago

Is it possible to build a full OS emulator on top of MMIX?

> The above tools could theoretically be used to compile, build, and bootstrap an entire FreeBSD, Linux, or other similar operating system kernel onto MMIX hardware, were such hardware to exist.

https://en.wikipedia.org/wiki/MMIX

le-mark a day ago

Who else had AI implement an emulator? Raises hand. A 6502 emulator in JavaScript was my first Gemini experiment.

kazinator a day ago

What's a "clear room"? A clean room, but with plagiarized code, laundered through an LLM?

muyuu 12 hours ago

how much of a clean room can you claim when you don't know exactly what code your LLM has looked at?

love the project of course, but LLMs are a huge caveat to such claims, which will be very hard to make credibly in the future for anything not entirely novel

dist-epoch a day ago

> I believe automatic programming to be already super-human, not in the sense it is currently capable of producing code that humans can’t produce, but in the concurrent usage of different programming languages, system programming techniques, DSP stuff, operating system tricks, math, and everything needed to reach the result in the most immediate way.

As HN likes to say, only an amateur vibe-coder could believe this.

Zafira a day ago

It is really quite something how many people that have earned credibility designing well-loved tools seem to be true believers in the AI codswallop.
- jlarcombe a day ago
  
  it's fascinating / astonishing

themafia a day ago

in spectrum.c

> Address bits for pixel (x, y):
> * 010 Y7 Y6 Y2 Y1 Y0 | Y5 Y4 Y3 X7 X6 X5 X4 X3

Which is wrong. It's x4-x0. Comment does not match the code below.

> static inline uint16_t zx_pixel_addr(int y, int col) {

It computes a pixel address with 0x4000 added to it only to always subtract 0x4000 from it later. The ZX apparently has ROM at 0x0000..0x3fff necessitating the shift in general but not in this case in particular.
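
For reference, the interleaved bitmap layout under discussion can be sketched in a few lines of C. This is an illustration of the standard ZX Spectrum screen addressing, not the code from the repository: the name `sketch_pixel_addr` is hypothetical, and it takes a byte column (0..31), so the low five address bits are written C4..C0 here, which is the same thing as X7..X3 of a 0..255 pixel coordinate.

```c
#include <assert.h>
#include <stdint.h>

/* Address of the bitmap byte for scanline y (0..191) and byte column
 * col (0..31) in the standard ZX Spectrum screen layout:
 *   010 Y7 Y6 Y2 Y1 Y0 | Y5 Y4 Y3 C4 C3 C2 C1 C0
 */
static uint16_t sketch_pixel_addr(int y, int col) {
    assert(y >= 0 && y < 192 && col >= 0 && col < 32);
    return (uint16_t)(0x4000
        | ((y & 0xC0) << 5)   /* Y7..Y6 -> address bits 12..11 */
        | ((y & 0x07) << 8)   /* Y2..Y0 -> address bits 10..8  */
        | ((y & 0x38) << 2)   /* Y5..Y3 -> address bits 7..5   */
        | col);               /* column -> address bits 4..0   */
}
```

A caller working in pixel coordinates would pass `x >> 3` as the column. Dropping the `0x4000` term yields the offset into video RAM instead of the absolute address, which is the design choice being debated here.
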

This and the other inline function next to it for attributes are only ever used once.

> During the 192 display scanlines, the ULA fetches screen data for 128 T-states per line.

Yep.. but..

> Instead of a 69,888-byte lookup table

How does that follow? The description completely forgets to mention that a frame is (192 scan lines + 64 + 56 border lines) * 224 T-states = 69,888 T-states.
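
The alternative to a 69,888-entry table is to derive the delay from the T-state counter with a division and a modulo. The C sketch below illustrates that idea only and is not the `zx_contend_delay()` from the repository: the function name is hypothetical, and the contention start at T-state 14335 and the 6,5,4,3,2,1,0,0 delay pattern are the values commonly cited for the 48K model, taken here as assumptions.

```c
#include <assert.h>

enum {
    SKETCH_TSTATES_PER_LINE = 224,
    SKETCH_LINES_PER_FRAME  = 192 + 64 + 56,  /* display + border lines */
    SKETCH_FRAME_TSTATES    = SKETCH_LINES_PER_FRAME * SKETCH_TSTATES_PER_LINE,
    SKETCH_CONTENTION_START = 14335,          /* assumed 48K value */
    SKETCH_DISPLAY_LINES    = 192,
    SKETCH_FETCH_TSTATES    = 128             /* ULA fetch window per line */
};

/* Extra T-states added to a contended memory access that starts at
 * T-state t of the current frame, computed without a lookup table. */
static int sketch_contend_delay(int t) {
    static const int pattern[8] = { 6, 5, 4, 3, 2, 1, 0, 0 };
    assert(t >= 0 && t < SKETCH_FRAME_TSTATES);

    int rel = t - SKETCH_CONTENTION_START;
    if (rel < 0) return 0;                        /* before the display area */

    int line = rel / SKETCH_TSTATES_PER_LINE;
    if (line >= SKETCH_DISPLAY_LINES) return 0;   /* bottom border */

    int pos = rel % SKETCH_TSTATES_PER_LINE;
    if (pos >= SKETCH_FETCH_TSTATES) return 0;    /* side border / retrace */

    return pattern[pos & 7];
}
```

The trade-off is one division and one modulo per contended access in exchange for not holding a per-T-state table in memory, which matters most on small targets.
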

I'm bored. This is a pretty muddy implementation. It reminds me of the way children play with Duplo blocks.

antirezOP a day ago

What happened with the wrong pixel layout is that the specification was wrong (the problem is that sub agents spawned recently by Claude Code are Haiku sessions, their weakest model -- you can see the broken specification under spectrum-specs); it entered the code and caused a bug that Claude later fixed without updating the comment. This actually somewhat shows that even under adversarial documentation it can fix the problem.
IMHO zx_pixel_addr() is not bad, makes sense in this case. I'm a lot more unhappy with the actual implementation of the screen -> RGB conversion that uses this function, which is not as fast as it could be. For instance my own zx2040 emulator video RAM to ST77xx display conversion (written by hand, also on GitHub) is more optimized in this case. But providing the absolute address in the video memory, instead of the offset, is OK. Just design.
> This and the other inline function next to it for attributes are only ever used once.
I agree with that but honestly 90% of developers work in this way. And LLMs have such style for this reason. A style I dislike as well...
About the lookup table, the code that it uses in the end was a hint I provided to it, in zx_contend_delay(). The old code was correct but extremely memory wasteful (there are emulators really taking this path of the huge lookup table, maybe to avoid the division for maximum speed), and there was the full comment about the T-states, but after the code was changed this half-comment is bad and totally useless indeed. In the Spectrum emulator I provided a few hints. In the Z80, no hint at all.
If you check the code in general, the Z80 implementation for instance, it is solid work on average. Normally after using automatic programming in this way, I would ask the agent (and likely Codex as well) to check that the comments match the documentation. Here, since it is an experiment, I did zero refinements, to show what is the actual raw output you get. And it is not bad, I believe.
P.S. I see your comment greyed out, I didn't downvote you.

dang 20 hours ago

> It reminds me of the way children play with Duplo blocks.
WTF? I appreciate your technical expertise but you can't be aggressive like this on HN, and we've had to ask you this before: https://news.ycombinator.com/item?id=45663563.
If you'd please review https://news.ycombinator.com/newsguidelines.html and stick to the rules when posting here, we'd appreciate it.
- themafia 20 hours ago
  
  > you can't be aggressive
  I disagree that this is "aggressive." It's certainly opinionated. I think the AI does a bad job here and I'm attempting to express that in a humorous and qualified way.
  > WTF?
  You don't consider this to be "aggressive?"
  > stick to the rules when posting here
  Do you genuinely think I'm trying to be disruptive?
  - dang 16 hours ago
    
    Ah - I interpreted the putdown as being about what antirez did, not what Claude did. It sounds like I misread that, and I apologize.

nz 19 hours ago

Even though I understand your sentiment, and think it is sincere, I think this is intellectually dishonest. Even though I have been programming since I was 16 (20 years), I still program like a child playing with Duplo blocks, when using a novel or otherwise unfamiliar technology. I bet that you do too. I also think that every programmer should play with their computers once in a while. Explore. Discover. Even if it means allowing yourself to be alienated from your means of production.

bornfreddy 10 hours ago

What does "clean room" even mean in the case of agents? Weren't they trained on the existing code, probably including code performing the same tasks they are solving? It is not a clean room implementation of a C compiler if the engineer read the GCC source code, even if the task is to do it in Rust.

rasz 13 hours ago

That's nothing, I had Claude _clean room_ Carmack's Fast inverse square root, with comments and copyright!

bitwize 17 hours ago

Claude Code isn't smart in its own right. LLM "intelligence" is really just other people's intelligence ground up into a thought-slurry and piped through your PC. The irony is, things like emulators for popular systems are absurdly good targets for LLMs because they've been done so many times before that there's lots of training data for such a thing to draw from—but an emulator is also the sort of project which, because it's been done so many times before, the biggest benefit a programmer will get from writing a new one is the learning experience of doing it on their own.

xcf_seetan a day ago

I had Claude make a quad-core 32-bit Z80 just for fun.

<https://pastebin.com/Z2b82LHG>

klelatti a day ago

Fascinating, but I'm not sure how these are consistent?
> - Based on classic Z80 architecture by Zilog
> - Inspired by modern RISC designs (ARM, RISC-V, MIPS)
- HarHarVeryFunny a day ago
  
  The Z80 itself was "inspired" by the 8080, notably having dual 8080 register sets. It might be regarded as a "clear" (sic) room reimplementation/enhancement of the 8080 given that it was the same 8080 designers who left Intel to found Zilog and create the Z80.
- throwa356262 a day ago
  
  Z80 is CISC. This looks like a MIPS.
  Funny enough, there is a 32-bit version of Z80 called Z380.

UltraSane a day ago

It is "clean room"

sylware 18 hours ago

Anybody: can I test claude code without a WHATWG cartel web engine? A web API using curl with a "public" token? Anything else?

I am itching to test its ability to code assembly.

paxys 20 hours ago

What is "clear room"? If he means clean room, no, this doesn't qualify.

I wish people would stop using this phrase altogether for LLM-assisted coding. It has a specific legal and cultural meaning, and the giant amount of proprietary IP that has been (illegally?) fed to the model during training completely disqualifies any LLM output from claiming this status.

airza a day ago

You use clean room everywhere in the article and clear room in the title. Is this on purpose?

lazide a day ago

Literally nothing about it is either, either.
- rustyhancock a day ago
  
  Yes for a moment I thought clear room might mean something else for LLMs.
  Essentially they can't do clean room anything!
  You might as well hire the entire former mid level of a business's programming team and claim it's clean room work.
  - steve1977 a day ago
    
    Windows NT is not VMS! Trust me!
    
    - rustyhancock a day ago
      
      Had to Google this but I do love a deep cut reference!
      https://www.itprotoday.com/server-virtualization/windows-nt-...

HarHarVeryFunny a day ago

At first I thought it was a brain slip in the HN title, then I saw TFA also said "clear", so thought it was perhaps a sarcastic jab at the original "clean" room story it is commenting on, but maybe in the end just an error?
In any case, an interesting experiment.
- HarHarVeryFunny a day ago
  
  It would also be interesting to see how well the best open weights models such as Kimi K2.5 can do on a task like this with the same prompting to first gather specs, etc, etc.
  In fact this would make for an interesting benchmark - writing entire non-trivial apps based on the same prompt. Each model might be expected to write and use its own test cases, but then all could be judged based on a common set of test cases provided as part of the benchmark suite.

jlarcombe a day ago

How on earth does this count as "clean room" in any way, when many open-source Z80 emulators will without doubt have been part of its training data?

HarHarVeryFunny a day ago

Perhaps why the title said "clear" room?

ggaughan a day ago

Wow
