How you probably will find Satoshi

Sorry for not blogging in a while: I started posting while furloughed during a government shutdown, but as soon as that ended, I found myself much less interested in blogging :)

A brilliant security researcher on Twitter named tmctmt and I have been curious about the identity of Satoshi Nakamoto, the inventor of Bitcoin. Following what I believe to be an incorrect article published in The New York Times, my interest has grown substantially. So, when I stumbled on a blog post by Sergio Lerner titled “How you will not uncover Satoshi”, I took note.

The premise of that article is simple: Satoshi Nakamoto exported his Bitcoin whitepapers using OpenOffice’s PDF exporter.

The PDF exporter includes, as metadata, a hash of, among other things, where the PDF was saved. Therefore, if you can guess where Satoshi saved the PDF — say, C:\Users\Satoshi’s Real Name Here\Documents\Bitcoin.pdf — you might end up with his name. If you believed Sergio Lerner, then you would think you’d need a lucky guess of all of these things simultaneously, and — if you were so lucky — you’d have Satoshi:

The file destination,
Whether Satoshi used Windows XP or Windows Vista (after all, Windows XP and Windows Vista have different user-directory paths),
Whether Satoshi saved in a user directory at all, and
Which of 1,000 milliseconds Satoshi saved the file in

Fortunately, this isn’t actually true. You can guess Satoshi’s username with the whitepaper with a lot more ease than that.

But first, let’s clear up some worrying possibilities and try to avoid assumptions:

1. We are assuming Satoshi used OpenOffice’s PDF exporter.

We believe this to be true because the OpenOffice PDF exporter was bespoke. You can actually just look at the PostScript and see that it could only have come from OpenOffice.

2. We are assuming Satoshi used Windows XP.

We believe this to be true because Satoshi extensively referred to Windows XP service pack 2; used XP in his screenshots; pointed out lack of testing on Vista; etc. There is an enormous amount of evidence for this. But perhaps more importantly, see also, point #3.

3. We are assuming that Satoshi’s Windows installation (and therefore, most likely, user account) predates his ideas around Bitcoin.

Satoshi repeatedly emphasized that his interest in doing something like Bitcoin came about around 2007-2008, which matches up with the death of E-Gold. Satoshi had installed Windows
However, I found that it was extremely unlikely. When you go to install software like WinRAR, it is unlikely that you deliberately seek out a random point release for no reason. It is also unlikely that you go out of your way to update WinRAR.

Crucially, Satoshi Nakamoto used WinRAR for his earlier Bitcoin releases. So if you download 100 copies of WinRAR, extract his old releases, and then re-compress them with WinRAR, there’ll come a point where WinRAR changes lead to different archives from the archive Satoshi produced, either because you’ve gone too far back, or because you’re too far forward.
I found that Satoshi’s bitcoin .rar files could only have been produced by WinRAR versions older than WinRAR 3.62, which came out December 04, 2006. That’s before Satoshi had started on Bitcoin, and even before the E-Gold indictment.

4. We are assuming that Satoshi did not deliberately tamper with the PDFs.

This is harder to prove; however, the creation date does not appear to have been tampered with, or if it has, Satoshi manually recomputed the proper document checksum, which is a little excessive. Further, Satoshi would have had to have modified both the pre-release draft that he sent to Stealthmonger (someone who I am surprised doesn’t have any Satoshi-related conspiracies to his name, not that I think he’s Satoshi) who later went on to share it, as well as modify the public final release.
If he failed to do so with either of these PDFs, then it’s still possible to find Satoshi.

As discussed, finding Satoshi by cracking the hash in the PDF appears intractable if you’re trying to exploit forensically-useful OpenOffice quirks. But it’s not, for a number of reasons.

While the millisecond in which Satoshi saved his file is part of the hash, there aren’t 1,000 milliseconds that you have to check, there are only 65. How is this possible? If you’ve looked at enough malware¹, you probably already know. But if you don’t:

Satoshi’s rar files offer a hint: Satoshi stripped/changed all his files’ timestamps, except at one point he forgot to strip the timestamp on a few folders. Using unrar5, you can view the millisecond field:

The src/object folder in Bitcoin’s first release’s RAR file was modified at 15:3:27.4687500; src/rc, at 15:37:33.6718750; and src, at 23:15:06.7031250². These are obviously from a 64-tick timer with no detectable phase shift. It turns out, if you’re running on normal Windows XP hardware, this is exactly what you’d expect to see from anything using SYSTEMTIME, like WinRAR³ does.

This gives us a 15x speedup.
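Here's a rough sketch of that check, if you want to poke at it yourself. It confirms that the three fractional seconds quoted above (reading the last one as .7031250) all sit on a 1/64-second grid, and lists the millisecond values such a clock could report. The function names are mine, and how a tick turns into the millisecond value that ends up in the hash (truncation versus rounding) is an assumption here, so the exact candidate count may differ by one from the figure above.

```python
from fractions import Fraction

TICKS_PER_SECOND = 64  # 15.625 ms per tick, as the folder timestamps suggest

# Fractional seconds of the three folder timestamps quoted in the text.
OBSERVED_FRACTIONS = ["0.4687500", "0.6718750", "0.7031250"]


def tick_index(fraction_text: str) -> int:
    """Return k such that the fraction equals k/64, or raise if it is off-grid."""
    ticks = Fraction(fraction_text) * TICKS_PER_SECOND
    if ticks.denominator != 1:
        raise ValueError(f"{fraction_text} is not a multiple of 1/{TICKS_PER_SECOND}")
    return int(ticks)


def candidate_milliseconds() -> list[int]:
    """Millisecond values a 64 Hz clock can report (ASSUMES truncation)."""
    return sorted({(k * 1000) // TICKS_PER_SECOND for k in range(TICKS_PER_SECOND)})


if __name__ == "__main__":
    print([tick_index(f) for f in OBSERVED_FRACTIONS])
    print(candidate_milliseconds()[:5])
    print(1000 / TICKS_PER_SECOND)  # rough size of the speedup
```

Using exact fractions instead of floats means an off-grid timestamp fails loudly rather than getting rounded onto the grid.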

I’ve validated this on real documents from a real Windows XP device as well: I want to thank Ben on Twitter for getting a Windows XP netbook set up for me between 2 AM and 3 AM and, without much of an explanation, getting me 16 separate PDFs from OpenOffice 2.4.0 through it.

It turns out, you don’t need to guess where Satoshi saved his writeups. We’re actually absurdly lucky regarding the path names. Sergio Lerner made a mistake: while OpenOffice hashes where the PDF file was saved, the PDF file is actually not first saved in the location where you tell OpenOffice to put the PDF.

The PDF is first written to a temporary file, in a subfolder of the folder that the TEMP environment variable names on Windows. TEMP points to a folder inside your Windows user’s profile.

I spend a lot of time reverse engineering Windows and Windows malware for my day job.

So, knowing it was in %TMP%, and knowing that %TMP% always has a funny name on Windows, I wanted to replicate the exact function that makes it so funny.

See, on Windows (XP in this case), the temp folder of the user “Fox Chapel Research”⁴ would be C:\Documents and Settings\Fox Chapel Research\Local Settings\Temp. But when you access it via the TMP or TEMP variables, as OpenOffice does, it’s actually presented as C:\DOCUME~1\FOXCHA~1\LOCALS~1\Temp.

In this case, everything you really need to know is in userenv.dll, but it behaves almost identically to ReactOS’ code, so you can just rip that if you want.

Point is, though, usernames are simplified and truncated, particularly if they’re longer than 8 characters. Periods in a username trigger weird behavior, but the fact that names are truncated, the number of characters gets reduced, etc., is extremely helpful in narrowing the keyspace.
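To make the truncation concrete, here's a deliberately simplified sketch of the short-name mangling. It is not the userenv.dll or ReactOS logic: it only strips spaces, uppercases, keeps the first six characters, and appends ~1. Periods, replaced characters, and collisions beyond ~1 are not modelled, and `short_component` and `xp_temp_path` are names I made up for illustration.

```python
def short_component(name: str, index: int = 1) -> str:
    """Simplified 8.3-style alias for one path component.

    ASSUMPTIONS: no periods, no characters that need replacing, and the
    alias slot is ~index. The real Windows logic handles more cases.
    """
    if "." in name:
        raise NotImplementedError("periods are not modelled in this sketch")
    compact = name.replace(" ", "")
    if compact == name and len(name) <= 8:
        return name  # already fits, so it is kept as typed
    return f"{compact[:6].upper()}~{index}"


def xp_temp_path(username: str) -> str:
    """Short-form temp path for a Windows XP user, under the assumptions above."""
    parts = ["Documents and Settings", username, "Local Settings", "Temp"]
    return "C:\\" + "\\".join(short_component(p) for p in parts)


if __name__ == "__main__":
    print(xp_temp_path("Fox Chapel Research"))
    # Different long names can collapse to the same alias, which is
    # what shrinks the keyspace.
    print(short_component("Fox Chapel Research") == short_component("Fox Chapman"))
```

The collapse in the last line is the useful part: under this simplified model, every long username that shares its first six non-space characters costs only one guess.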

For some inexplicable reason, OpenOffice saves its temporary files in paths of the form “%TMP%\svXXX.tmp\svYYY.tmp”, where XXX and YYY are two different strings.

Only, we’re extremely lucky: YYY depends on XXX and will be rather small when a file hasn’t been changed a lot. You can diff the two known Bitcoin PDFs, see that about 15 changes have been made, and assume that the distance from “YYY” to “XXX” is likely no further than maybe 256 modulo 26^3.

Where does 26^3 come from, you ask? Well, XXX and YYY are, inexplicably, base 26. This may seem sensible: it covers all lower-case letters, right? Well, for whatever reason, it’s actually 0-p. There’s a comment in the OpenOffice codebase along the lines of “0-p ???” that I thought was pretty funny. This is great because it could’ve been base-36, base-62, or something similarly horrible, and that would have blown up the keyspace. 26^3 times 256 or (for safety) 384 is not too bad a number of keys to check. But yes, three characters, each base-26, gives us 26^3.
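Here's a small sketch of that keyspace arithmetic. I'm assuming three fixed-width digits with the most significant one first, that YYY comes after XXX, and that the counter wraps at 26^3. None of that is lifted from the OpenOffice source, and all the names are mine.

```python
import string

ALPHABET = string.digits + "abcdefghijklmnop"  # the "0-p" digits: 10 + 16 = 26
WIDTH = 3
KEYSPACE = len(ALPHABET) ** WIDTH  # 26^3 = 17576


def encode(value: int) -> str:
    """Fixed-width base-26 string using digits 0-9 followed by a-p."""
    if not 0 <= value < KEYSPACE:
        raise ValueError("value out of range")
    digits = []
    for _ in range(WIDTH):
        value, rem = divmod(value, len(ALPHABET))
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def decode(text: str) -> int:
    # Python's int() in base 26 happens to use exactly the digits 0-9 and a-p.
    return int(text, 26)


def temp_dir_pairs(max_delta: int = 256):
    """Yield (XXX, YYY) with YYY between 1 and max_delta steps after XXX."""
    for x in range(KEYSPACE):
        for delta in range(1, max_delta + 1):
            yield encode(x), encode((x + delta) % KEYSPACE)


def pair_count(max_delta: int) -> int:
    """Number of (XXX, YYY) pairs to try for a given maximum distance."""
    return KEYSPACE * max_delta


if __name__ == "__main__":
    print(KEYSPACE, pair_count(256), pair_count(384))
```

So that's a few million (XXX, YYY) pairs per username and timestamp guess, which is small next to what a GPU can hash.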

It turns out you can modify Hashcat’s MD5 kernel, ask ChatGPT to write every combination of template instantiations for all realistic username, length, string, etc., in a CUDA port, then schedule usernames to optimize for this, and get pretty insane throughput. (There are some other nice tricks, too!)
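Stripped of all the GPU work, the search itself has a simple shape. Here's a slow, CPU-only sketch of it. The exact bytes that OpenOffice feeds into the hash are not reproduced here: `build_preimage` is a hypothetical callable you would have to supply, and `toy_preimage` is a made-up layout that exists only so the example runs.

```python
import hashlib
from itertools import product
from typing import Callable, Iterable, Iterator, Tuple

Candidate = Tuple[str, str, int]  # (short username, temp file name, millisecond)


def search(
    target_md5_hex: str,
    usernames: Iterable[str],
    temp_names: Iterable[str],
    milliseconds: Iterable[int],
    build_preimage: Callable[[str, str, int], bytes],
) -> Iterator[Candidate]:
    """Yield every candidate whose MD5 digest matches the target."""
    target = bytes.fromhex(target_md5_hex)
    # product() materialises its inputs, fine for a sketch, not for real runs.
    for candidate in product(usernames, temp_names, milliseconds):
        if hashlib.md5(build_preimage(*candidate)).digest() == target:
            yield candidate


def toy_preimage(user: str, temp_name: str, ms: int) -> bytes:
    # HYPOTHETICAL layout for demonstration only; NOT OpenOffice's format.
    return f"{user}|{temp_name}|{ms}".encode()


if __name__ == "__main__":
    target = hashlib.md5(toy_preimage("FOXCHA~1", "sv001.tmp", 468)).hexdigest()
    hits = search(
        target,
        usernames=["ALICE", "FOXCHA~1"],
        temp_names=["sv000.tmp", "sv001.tmp"],
        milliseconds=[453, 468],
        build_preimage=toy_preimage,
    )
    print(list(hits))
```

Keeping the preimage builder as a parameter means the loop doesn't bake in any guess about the layout, which is the part you'd have to get right from the OpenOffice source.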

You can then write a bunch of tests, throw a battery of them at the program, make sure it works, and then hook it up to a distributed task queue. You can then rent a ton of vast.ai instances (shoutout again to tmctmt on Twitter for financing this!), particularly interruptible instances, and make a ton of progress through a lot of conceivable names. If you’ve ever picked a Windows account username, there’s a high probability that I’ve guessed it; and in the space of “distinct account names created by English-speaking Windows users”, I’ve probably guessed about 60%⁵?

This did not find Satoshi, but I would like to throw more GPUs at this with a considerably larger budget, more eyes on the solver code, etc. So if you’re interested in helping out, please do reach out :)

I asked an LLM to build a wasm blob w/hashcat’s maskgenerator + some code I wrote to generate name variations + an old (non-templated, for obvious reasons if you know anything about WebGPU >:() version of the cracker. It’s not really that fast, but it’s probably fine. I have barely tested it, ChatGPT did some silly things, etc. But it’ll run on anything that supports WebGPU (assuming it works lol) and that’s pretty cool.

Sorry about the complete lack of documentation! ChatGPT did its best to port over the weird flags I have in my CUDA version.

You can find it here: https://foxchapelresearch.github.io/satoshi-solver/

Satoshi did not use Linux. I bruteforced literally every conceivable temporary file path. 😎
His Windows username is not <=4 characters long, nor is it any common name. I have exhaustively checked every legal 4-character name. I have also checked nearly all lowercase-alpha-only and titlecase-alpha-only names, plus all of 2009 Wiktionary, and countless more names based on somewhat optimized name rules.

Please message me if you think you can help fund a more exhaustive search. Vast.ai accepts cryptocurrency, if you would like to contribute GPUs that way. Alternatively, if you have a GPU farm that you would be willing to throw at the problem, that’d be awesome!

You can reach me via Substack’s messages, Twitter DMs, or email.