Show HN: Codex builds a working NES Emulator in one hour

6 points by zi2zi-jit 4 months ago · 4 comments · 2 min read


Hi folks! I know NES emulators have been implemented countless times, in practically every language imaginable.

However, having an LLM fully replicate the spec purely from memory—without referencing existing code—is still a significant challenge. It requires the underlying model to have strong anti-hallucination capabilities and solid long-term planning to keep from going astray. Because of this, building an NES emulator makes for an excellent LLM stress test.

Here is how the emulator was built:

Data Gathering: I asked Codex to download the necessary developer manuals and test suites. It was strictly prohibited from searching for reference implementations online.

Development: I instructed Codex to build the emulator until all test suites passed. This process was mostly hands-free; I only chimed in to encourage it to continue when it paused.
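The post doesn't say which test suites were used. Many NES test ROMs (blargg's, for example) report their outcome through cartridge RAM at $6000: a status byte ($80 while running, $81 when a reset is requested, otherwise a final result code where 0 means pass), the signature `DE B0 61` at $6001-$6003, and a zero-terminated ASCII message starting at $6004. Assuming that convention, the pass/fail check behind a "keep going until every test passes" loop might look like the sketch below. `read_result`, `TestResult`, and the RAM snapshot layout are hypothetical, not taken from the author's code.

```python
from dataclasses import dataclass
from typing import Optional

SIGNATURE = bytes([0xDE, 0xB0, 0x61])  # marks valid test output at $6001-$6003


@dataclass
class TestResult:
    status: int   # raw byte from $6000
    message: str  # text the ROM wrote starting at $6004

    @property
    def finished(self) -> bool:
        # $80 = still running, $81 = ROM asks for a reset; below $80 is a final code.
        return self.status < 0x80

    @property
    def passed(self) -> bool:
        return self.status == 0


def read_result(ram: bytes) -> Optional[TestResult]:
    """Parse a blargg-style result block.

    Assumption: ram[0] corresponds to CPU address $6000.
    Returns None if the signature is absent (test has not started reporting).
    """
    if ram[1:4] != SIGNATURE:
        return None
    end = ram.find(0, 4)  # message is zero-terminated
    if end == -1:
        end = len(ram)
    text = ram[4:end].decode("ascii", errors="replace")
    return TestResult(ram[0], text)
```

A test runner would step the emulator until `read_result(...)` reports `finished`, then record `passed` and `message` for each ROM.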

First Draft: After just 4-5 prompts, Codex delivered a functional, pure-Python emulator—though it ran at a sluggish 7 FPS.
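For scale, an NTSC NES produces about 60.1 frames per second (roughly 60.0988 Hz), so 7 FPS is only about 12% of real-time speed. The post doesn't say how FPS was measured; a minimal way to do it, assuming a hypothetical `step_frame()` callable that emulates exactly one video frame, is:

```python
import time
from typing import Callable

NTSC_NES_FPS = 60.0988  # approximate NTSC NES frame rate


def measure_fps(step_frame: Callable[[], None], n_frames: int = 300) -> float:
    """Run step_frame n_frames times and return achieved frames per second."""
    start = time.perf_counter()
    for _ in range(n_frames):
        step_frame()
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against zero
    return n_frames / elapsed


def realtime_ratio(fps: float) -> float:
    """Fraction of real NES speed (7 FPS is about 0.12)."""
    return fps / NTSC_NES_FPS
```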

Optimization: Asking Codex to optimize the app completely on its own didn't work this time. Instead, I had it generate a flamegraph, which identified the PPU update as the bottleneck. I then instructed Codex to rewrite the PPU in Cython without breaking the passing tests.
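The post doesn't show the PPU code. As an illustration of the kind of inner loop that tends to dominate a pure-Python PPU, and of optimizing it "without breaking the passing tests", here is a sketch that decodes one row of an 8x8 pattern-table tile two ways. In NES pattern tables each tile is 16 bytes: bytes 0-7 hold the low bit plane, bytes 8-15 the high bit plane, and bit 7 is the leftmost pixel. Note that this swaps in a different technique (a precomputed lookup table in plain Python) rather than the author's Cython rewrite, and all function names are hypothetical; the point is that the slow reference version doubles as the oracle for the fast one.

```python
from typing import List, Tuple


def decode_row_reference(lo: int, hi: int) -> List[int]:
    """Return 8 two-bit color indices for one tile row, leftmost pixel first."""
    return [(((hi >> bit) & 1) << 1) | ((lo >> bit) & 1) for bit in range(7, -1, -1)]


# Precompute every (lo, hi) byte pair once, indexed by (hi << 8) | lo, so that
# rendering replaces eight shift/mask steps per row with a single list lookup.
ROW_TABLE: List[Tuple[int, ...]] = [
    tuple(decode_row_reference(lo, hi)) for hi in range(256) for lo in range(256)
]


def decode_row_fast(lo: int, hi: int) -> Tuple[int, ...]:
    return ROW_TABLE[(hi << 8) | lo]


def decode_tile(tile: bytes) -> List[Tuple[int, ...]]:
    """Decode a 16-byte tile into 8 rows of 8 color indices."""
    return [decode_row_fast(tile[y], tile[y + 8]) for y in range(8)]
```

Whatever form the optimized PPU takes (Cython, lookup tables, or both), comparing it against the reference on many inputs catches regressions before the full test ROMs are re-run.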

Overall, I'm incredibly impressed by Codex. I already knew it was capable of the task, but the speed was astonishing. It finished the project in under an hour, using merely 2% of my weekly Pro quota.

While the NES might be a relatively easy system to emulate, I think emulation could serve as a fantastic benchmark for testing future LLMs.

qsera 4 months ago

Can you try to vibe code an AI shill detector next?

nunobrito 4 months ago

Quite amazing. This opens the door to many other emulators, because it can now replicate the expected output quite well.

zi2zi-jit (OP) 4 months ago

Totally agree. I'm looking to build something more complex next as a test, like a PS1 emulator in a different language. That would require significantly more effort, but given how quickly the models are improving, I'm optimistic.
- nunobrito 4 months ago
  
  The hardest part seems to be automating the performance optimizations.
  For example: "I've run this task on real hardware and it took 5 seconds; keep optimizing and iterating until you achieve similar numbers."
  I'd love to see a Linux emulator running on Dart, simply because it removes the need for platform-specific dependencies.
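nunobrito's suggestion amounts to giving the agent a measurable stopping criterion. A minimal sketch of such a check, assuming a hypothetical workload `task` and a target time taken from real hardware (the 5-second figure is nunobrito's example; the 10% tolerance is an arbitrary assumption):

```python
import timeit
from typing import Callable


def best_time(task: Callable[[], None], repeat: int = 5) -> float:
    """Best-of-N wall time in seconds; the minimum is least affected by noise."""
    return min(timeit.repeat(task, number=1, repeat=repeat))


def meets_target(measured_s: float, target_s: float, tolerance: float = 0.10) -> bool:
    """True if the measured time is within `tolerance` of (or faster than) the target."""
    return measured_s <= target_s * (1.0 + tolerance)
```

An agent loop would re-run `best_time` after each change, keep the change only if the test suite still passes, and stop once `meets_target(best_time(task), 5.0)` returns True.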
