The Setup
It’s been raining in Utrecht for three days straight. The kind of grey that makes you question every life choice. My partner was out of town, the cat was judging me, and I had a full Saturday with nothing scheduled.
So naturally, I decided to cross-compile llama.cpp to run on Windows XP 64-bit.
Full technical write-up here for the tinkerers.
The Spark
This started, as these things do, with a stupid question in a Discord server: “What’s the oldest OS you could theoretically run inference on?” Someone said Windows 7. Someone else said Vista. I said nothing, but I opened a terminal.
I’ve been poking at llama.cpp for months. It’s become my go-to for local inference—fast, portable, actively maintained. But Windows XP? That’s a 24-year-old operating system. The iPhone didn’t exist when XP shipped. Google was still a search engine. The idea of running a transformer model on it felt like teaching a horse to send emails.
I wanted to see if it was possible. That’s it. No practical justification. Pure nerd curiosity.
What We Built
A complete llama.cpp deployment package for Windows XP 64-bit, cross-compiled from macOS: https://github.com/dandinu/llama.cpp
- 70+ Windows executables (llama-cli, llama-bench, llama-quantize, etc.)
- Full GGUF model support
- SSE4.2 optimization for period-appropriate CPUs
- ~120 MB deployment package
- Tested inference at 2-8 tokens/second on era-appropriate hardware
The Journey
Stage 1: Naive Optimism (30 minutes)
Standard MinGW cross-compile. How hard could it be?
brew install mingw-w64 cmake
cmake -DCMAKE_TOOLCHAIN_FILE=... -B build
Built fine. Transferred to XP VM. Immediate crash. No error message. Just gone.
Stage 2: The Windows Version Dance (45 minutes)
The problem was obvious once I thought about it. llama.cpp targets modern Windows by default. Vista+ APIs everywhere. I needed to tell the compiler we’re targeting XP.
Created a toolchain file with `_WIN32_WINNT=0x0502`. Rebuilt. New error:
This application has failed to start because api-ms-win-crt-heap-l1-1-0.dll
was not found.
Progress. At least it was talking to me now.
Stage 3: The UCRT Rabbit Hole (2 hours)
Modern MinGW links against the Universal C Runtime. UCRT didn’t ship with XP. You need to install the VC++ 2019 Redistributable (version 16.7 specifically—later versions dropped XP support).
Finding that redistributable was its own adventure. Microsoft doesn’t exactly advertise XP compatibility anymore. Eventually found it via a GitHub thread for LegacyUpdate.
Installed. Rebuilt. New error. Threading this time.
Stage 4: Threading Primitives (1.5 hours)
llama.cpp uses SRWLOCK for synchronization. Slim Reader/Writer Locks. Fast, efficient, and introduced in Vista.
XP doesn’t have them. XP has CRITICAL_SECTION and manual Event objects. Old school.
This is where Claude did the heavy lifting. My Windows internals knowledge is rusty—I haven’t written serious Win32 code since college. I described the constraint, Claude walked through the ggml threading code, and we replaced the Vista primitives with XP-compatible equivalents.
Stage 5: The HTTP Wall (1 hour)
Rebuilt. Ran. Crashed on anything network-related.
The bundled cpp-httplib (v0.28.0) explicitly checks the Windows version and refuses to run on anything below Windows 8.1. Not a bug—a deliberate choice by the maintainer.
The Wall
cpp-httplib was the real obstacle. The library handles HTTP for model downloads and the inference server. Version 0.28.0 has this check baked in. No workaround, no flag, no environment variable. It just won’t run.
I tried patching it. Commenting out the version check led to undefined symbol errors—the library uses APIs that don’t exist on XP.
I tried building without HTTP support entirely. Worked, but felt like cheating. What’s the point of running on XP if you can’t even download models?
Dead end. Two hours of my Saturday gone.
The Breakthrough
The answer was version pinning.
cpp-httplib v0.15.3 doesn’t have the Windows version check. It’s also header-only, which simplifies the build. I asked Claude to check the git history for when XP support was explicitly dropped—somewhere between 0.15 and 0.16.
curl -L https://raw.githubusercontent.com/yhirose/cpp-httplib/v0.15.3/httplib.h \
-o vendor/cpp-httplib/httplib.h
Rebuilt. Transferred. Ran.
llama_model_load: loaded meta data with 24 key-value pairs
llama_model_load: model type = 0.5B
That output. On Windows XP. In a VM on my Mac. I actually said “holy shit” out loud. The cat left the room.
Technical Highlights
The toolchain file is the core of the whole approach. This tells CMake we’re targeting XP:
set(CMAKE_SYSTEM_NAME Windows)
set(CMAKE_SYSTEM_PROCESSOR x86_64)
set(CMAKE_C_COMPILER x86_64-w64-mingw32-gcc)
set(CMAKE_CXX_COMPILER x86_64-w64-mingw32-g++)
add_compile_definitions(_WIN32_WINNT=0x0502)
add_compile_definitions(WINVER=0x0502)
set(GGML_WIN_VER "0x0502" CACHE STRING "ggml: Windows version" FORCE)
The 0x0502 is Windows XP 64-bit. For 32-bit XP, you’d use 0x0501.
CPU feature flags matter enormously. XP-era hardware doesn't have AVX, let alone AVX2 or FMA. But SSE4.2 is a reasonable floor: Intel has shipped it since Nehalem in late 2008, though AMD didn't add it until Bulldozer in 2011:
cmake -B build-xp \
-DGGML_NATIVE=OFF \
-DGGML_AVX=OFF \
-DGGML_AVX2=OFF \
-DGGML_FMA=OFF \
-DGGML_SSE42=ON
The cpp-httplib CMakeLists needed modification for the older header-only version:
add_library(${TARGET} INTERFACE)
target_include_directories(${TARGET} INTERFACE ${CMAKE_CURRENT_SOURCE_DIR})
target_link_libraries(${TARGET} INTERFACE ws2_32)
Why Windows XP?
Someone will ask: why bother?
Three reasons.
First, proof of concept. If llama.cpp can run on XP, it can run anywhere. The codebase is genuinely portable. That matters for embedded systems, for legacy environments, for the weird edge cases that real deployments encounter.
Second, retro computing is a legitimate hobby. People restore old cars. People collect vintage synthesizers. Some of us like running modern software on obsolete hardware. It’s satisfying in a way that’s hard to explain.
Third—and this is the honest answer—I needed something new to nerd out on. Sometimes you build things because building things is fun. Not everything needs a business case.
Current Status
Working:
- ✓ llama-cli (interactive and batch inference)
- ✓ llama-bench (benchmarking)
- ✓ llama-quantize (model conversion)
- ✓ All 70+ tools except the server
- ✓ GGUF model loading
- ✓ 2-8 tokens/second on Qwen2.5-0.5B
Not working:
- ✗ llama-server (requires newer httplib features)
- ✗ GPU acceleration (no CUDA/OpenCL on XP)
- ✗ Models larger than ~3B (memory constraints)
Next:
- Test on actual vintage hardware (not just VMs)
- Try 32-bit XP build
- See how far back we can go (Windows 2000? 98?)
For Other Developers
This is reproducible. Everything you need:
- macOS with Homebrew (or Linux with MinGW)
brew install mingw-w64 cmake git
- Clone llama.cpp
- Create the toolchain file
- Downgrade cpp-httplib to v0.15.3
- Build with the flags documented above
- Bundle the MinGW runtime DLLs
- Install VC++ 2019 Redistributable (16.7) on XP
The full guide is in the repo. Nothing hidden. No cherry-picking.
If you try this and hit issues, open an issue. If you get it running on something weirder than XP, I want to hear about it.
Transparency
The toolchain file structure, the threading primitive replacements, debugging the httplib version compatibility, even parts of this blog post—Claude was involved in all of it. My contribution was the vision, the testing, and the willingness to spend a Saturday on something objectively pointless.
This is what actual development with AI assistance looks like. Not magic. Not “AI wrote everything.” A collaboration where I knew what I wanted to build, Claude helped me navigate the parts I didn’t know well, and we iterated until it worked.
The tools are available to everyone. The process is documented. Go build something weird.
Many thanks to the author of this Reddit thread for getting me started.