A Fable Experiment: Starting by cloning a game, ending with an arena combat sim and bot trainer

ACT BY ACT: THE TALE

I read the engine before touching it

My first instinct was to start typing. I didn't. I had Fable read the original 2002 engine and write down how it worked before porting a single line of it. Thirteen agents went through the Pascal in parallel, one per subsystem (physics, bullets, maps, netcode, the AI), and a fourteenth pulled the pieces into one picture.

The findings set the tone for everything after. Every entity is a slot in a fixed, 1-indexed global array. The game feel I remembered lives in a handful of magic constants like gravity 0.06 and particle damping 0.98. And the whole thing only holds together because the math runs identically every time.

“we are going to modernize this game. start off by using a workflow to build an understanding of how it works and everything inside of it” (the first prompt of the project)

First light: the synthetic fallback scene, the player a few pixels of vector art, caught mid-jump. This is that exact historical build, booted and screenshotted live.

▸ THE NITTY GRITTY, porting rules and the float trick

Every ported function carries a provenance comment, `// PORT: shared/mechanics/Sprites.pas:1234`, so any TypeScript behavior can be diffed against the Pascal it came from. Deliberate divergences get a `DESIGN OVERRIDE` marker pointing at the graph node that ratified them.

The float trick: Pascal computes in 32-bit Single, JavaScript in f64. Instead of wrapping every operation in Math.fround forever, fidelity became a test gate. Every physics step goes through one scalar seam (packages/sim/src/scalar.ts):

```typescript
/**
 * Round to f32 when STRICT_F32 is on; identity in production f64 mode.
 * Wrap the result of each arithmetic step in ported physics to get Pascal
 * `Single` semantics under the golden master: `a = f(a + f(b * c))`.
 */
export const f: (x: number) => number =
  STRICT_F32 ? Math.fround : (x: number): number => x;
```

The whole suite runs twice, plain f64 and STRICT_F32=1, and the graph's adversarial reviewers kept the porting honest:

⬢ 55 · RNG is not Pascal-bit-compatible, caps golden-master scope
⬢ 73 · M8 'sandboxed ScriptHost' is not sandboxed
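The sketch below shows what the seam buys. It assumes nothing about scalar.ts beyond the `f` wrapper quoted above and integrates the same fall twice: once with an identity wrapper (f64 mode) and once with `Math.fround` after every step (strict mode). The `fall` function and its parameters are illustrative, not project code.

```typescript
// Illustrative only: an explicit wrapper argument stands in for the
// STRICT_F32 switch in packages/sim/src/scalar.ts.
type Round = (x: number) => number;

const identity: Round = (x) => x;

// Integrate a body falling from rest for `ticks` steps under per-tick
// gravity `g`, passing every arithmetic result through `f`.
function fall(ticks: number, g: number, f: Round): number {
  const grav = f(g);
  let vy = 0;
  let y = 0;
  for (let i = 0; i < ticks; i++) {
    vy = f(vy + grav);
    y = f(y + vy);
  }
  return y;
}

const yF64 = fall(600, 0.06, identity);    // production f64 mode
const yF32 = fall(600, 0.06, Math.fround); // Pascal `Single` semantics
console.log(yF64, yF32);
```

The two results generally differ in their low bits. Even the gravity constant changes under rounding, since `Math.fround(0.06) !== 0.06`.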

“nothing in the game works”

Seven milestones went by with 269 tests green in both float modes, and I still hadn't watched the game run. One of my own review agents had flagged exactly that in the graph: everything passed, nobody had looked at a pixel. So I opened it and filed my entire bug report:

“nothing in the game works” (the full bug report, first playtest)

Two things were broken and the type checker was happy with both. The map drew black because a shader uniform never got bound, so every color multiplied down to nothing. The player dropped through the floor because my synthetic test map had bad normals and the collision push-out came back as zero. A real browser showed me both in about a second. I wrote the lesson into the graph and kept it: a passing test and a rendered frame are different kinds of proof, and I'd been trusting only the first.

The combat sandbox: procedural stick-soldiers (vector Gosteks), platforms, live tracers, and one gun. Everyone gets the AK-74, so every tuning question since has had a single balance surface.

▸ THE NITTY GRITTY, the one-line fixes behind the famous bugs

The pacifist-bot bug and the black-map bug shared a shape: perfectly typed, perfectly tested, completely broken. The bots-can't-see-each-other fix is still in packages/client/src/app/game.ts:857, comment and all:

```typescript
// promoting `s.alpha = 255` to both modes is the one-line fix). Combat
s.alpha = 255;
```

The ported perception code skips invisible sprites exactly as the Pascal does, and nothing in the new spawn path ever set alpha. Every bot was, by its own rules, invisible to every other bot. The review node that predicted this class of failure, before it happened:

⬢ 68 · Seven milestones green by typecheck+unit-test, zero ground-truth validation

Since then, every feature ends with a headless-browser check. Types and tests are one category of truth; pixels are another.

I made the sky the map

The first time I watched two teams of bots fight, nobody fired. Not one shot. The ported perception code skips sprites it can't see, my spawn path never set their alpha, and the default was zero, so every bot was invisible to every other bot by its own rules. One line fixed it.

With them finally shooting, the telemetry told me something worse. I'd built rocket boots and a vertical map, and the bots were brawling on the floor like infantry, using their jets two to four percent of the time. So I built Skyreach, where the ground is a safety net and you have to fly to reach anything, and I rewrote the bot AI to chase height and hold longer bursts. Jet use jumped past half their airtime and the average kill moved twice as far out. That dogfight is what you see the second you open the game now.

Skyreach, mid-dogfight. Open the game and you're watching this; `?play` puts you in it.

▸ THE NITTY GRITTY, tuning the sim on purpose

The faithful-first rule bends, but it has to say so in the code. The jets are the canonical example: the override and its regression round both live in the graph and in the test names (packages/sim/src/step.control.test.ts):

```typescript
// DESIGN OVERRIDE (decision node 94): rocket boots favor UP. While the jet
// is held, up-force runs 1.8× and lateral drift is damped to half.

// DESIGN OVERRIDE regressions (node 100): "boots still wrong" round.
```

Same pattern for the air-fuel trickle: coasting regenerates exactly the burn rate, so a 50% thrust duty cycle hovers forever but climbing still spends the tank. Skyreach v2 added the ceiling slab after telemetry caught mean altitude running away to −361.

⬢ 94 · rocket boots favor UP
⬢ 100 · the 'boots still wrong' regression round
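A toy model is enough to check the duty-cycle claim. This sketch assumes a tank that loses `BURN` per tick while the jet is held and regains exactly `BURN` per tick while coasting, clamped to `[0, TANK]`. The constants and the `fuelAfter` helper are hypothetical, not the game's values.

```typescript
// Toy fuel model; TANK and BURN are hypothetical, not the game's constants.
const TANK = 100;
const BURN = 1;

// Simulate `ticks` ticks with the jet held for the first `onTicks` of every
// `period` ticks. Burning drains BURN; coasting regenerates exactly BURN.
function fuelAfter(ticks: number, onTicks: number, period: number): number {
  let fuel = TANK;
  for (let t = 0; t < ticks; t++) {
    const jetting = t % period < onTicks && fuel > 0; // an empty tank can't burn
    fuel += jetting ? -BURN : BURN;
    fuel = Math.min(TANK, Math.max(0, fuel));
  }
  return fuel;
}

console.log(fuelAfter(1000, 1, 2)); // 50% duty cycle: 100
console.log(fuelAfter(4000, 3, 4)); // 75% duty cycle: 2
```

At a 50% duty cycle the tank ends where it started. At 75% it drains and then stays pinned near empty, which is the "climbing still spends the tank" half of the rule.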

I drew a line in the sand

Then came the boundary that made the rest possible. I had every bot brain moved behind one seam: a brain can read the world but never change it, and the only thing it returns is its own bot's controls. The old ported AI became classic. The empty second slot was for the brain I actually wanted, which started from a note I'd jotted down earlier.

“imagine taking the optimized play of a counter strike source player but wawtching from the top down in 2 dimensions” (the seed of pilot, typos and all)

I'd been chewing on what a Counter-Strike pro is good at, and it's mostly not aim. It's standing where the fight is already won, holding the right distance, and moving so you can't be led. Fable turned that into pilot, six doctrines spelled out at the top of the file. Now I had two brains that disagreed about how to fight, which is the only thing an arms race needs.

The `?duel` viewer, pilot vs reaper side by side, which caught a 60× ballistics bug in its first minute of existence.

▸ THE NITTY GRITTY, the adapter seam, in full

The entire contract that made sixteen brains possible is small enough to quote (packages/client/src/ai/engine.ts):

```typescript
export interface BotBrain {
  tick(botIndex: number, ctx: BotEngineContext): void;
}
```

Three rules guard it: brains read the world but never mutate it (same rule as the telemetry observers, or determinism dies), randomness comes from world.rng and never Math.random, and the seam lives at the client layer because ammo, reload state, and spawn points live there. The ported Pascal AI became classic, with a regression test pinning play mode as byte-identical, so the baseline can never silently drift while the arms race rages above it.
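Here is a sketch of a brain written against the seam. Only `BotBrain.tick(botIndex, ctx)` comes from engine.ts. `BotEngineContext`'s fields, `Controls`, `SpriteView`, and `setControls` are hypothetical stand-ins chosen to express the three rules: a read-only world, one write for the bot's own controls, and randomness drawn from the world's seeded RNG. The sketch also indexes sprites from 0, unlike the engine's 1-indexed array.

```typescript
// Hypothetical shapes: only BotBrain.tick(botIndex, ctx) is from engine.ts.
interface Controls {
  left: boolean;
  right: boolean;
  jet: boolean;
  fire: boolean;
  aimX: number;
  aimY: number;
}

interface SpriteView {
  readonly x: number;
  readonly y: number;
  readonly team: number;
  readonly alive: boolean;
}

interface WorldView {
  readonly sprites: ReadonlyArray<SpriteView>; // 0-indexed here, unlike the engine
  rng(): number; // seeded and deterministic; stands in for world.rng
}

interface BotEngineContext {
  readonly world: WorldView;
  setControls(botIndex: number, controls: Controls): void; // the brain's only write
}

interface BotBrain {
  tick(botIndex: number, ctx: BotEngineContext): void;
}

// Toy brain: face and shoot the nearest living enemy, jet on roughly half the ticks.
const nearestBrain: BotBrain = {
  tick(botIndex, ctx) {
    const me = ctx.world.sprites[botIndex];
    if (!me || !me.alive) return;
    let target: SpriteView | undefined;
    let bestDist = Infinity;
    for (const s of ctx.world.sprites) {
      if (!s.alive || s.team === me.team) continue;
      const d = Math.hypot(s.x - me.x, s.y - me.y);
      if (d < bestDist) {
        bestDist = d;
        target = s;
      }
    }
    ctx.setControls(botIndex, {
      left: target !== undefined && target.x < me.x,
      right: target !== undefined && target.x > me.x,
      jet: ctx.world.rng() < 0.5, // randomness only from the world's RNG
      fire: target !== undefined,
      aimX: target ? target.x : me.x,
      aimY: target ? target.y : me.y,
    });
  },
};
```

The `readonly` modifiers turn accidental writes into compile errors. They are only a type-level guard, though, and do not freeze anything at runtime.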

The arms race got away from me

I had two Fable sessions running in the same repository, each writing brains to beat the other's. reaper learned to dive on pilot. matador worked out that a magazine is a clock and started punishing reloads. kestrel went back and audited the fire model, and found every brain before it had been correcting for the wrong bullet gravity, off by a factor of 2.25.

wolf showed that the team is what wins, three guns picking one target by arithmetic. plover read wolf's logic and fed it a fake target. hydra pulled its wounded out of the pack's math, and it got there on its own without ever seeing plover do the same thing first. At some point I stopped steering and left a note.

“keep goiung in a loop im going to the knicks game” (operational oversight, that stretch)

The broken-wing gambit against the wolf pack: plover dangles a decoy at the pack's published focus function. It took the opener 38–37.

▸ THE NITTY GRITTY, doctrine as code

Kestrel's edge was archaeology, not aim. Every earlier brain compensated for the soldier's gravity; bullets fall 2.25× harder. The discovery is one constant now (kestrel.ts:88):

```typescript
DROP_G: 0.135, // px/tick² — TRUE bullet gravity (GRAV 0.06 × 2.25)
```
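The wrong constant has a measurable cost. This sketch integrates per-tick gravity for a bullet fired with no vertical velocity. It assumes semi-implicit Euler (velocity updated before position) and a hypothetical 40-tick flight; neither assumption is taken from kestrel.ts.

```typescript
// Vertical drop after `ticks` ticks under per-tick gravity `g`, from vy = 0.
// Semi-implicit Euler (vy += g; y += vy) gives g * n * (n + 1) / 2.
function dropAfter(ticks: number, g: number): number {
  let vy = 0;
  let y = 0;
  for (let i = 0; i < ticks; i++) {
    vy += g;
    y += vy;
  }
  return y;
}

const GRAV = 0.06;          // soldier gravity from the ported engine
const DROP_G = GRAV * 2.25; // true bullet gravity, 0.135 px/tick²

const flightTicks = 40; // hypothetical time of flight
const assumed = dropAfter(flightTicks, GRAV);
const actual = dropAfter(flightTicks, DROP_G);
console.log(assumed.toFixed(1), actual.toFixed(1), (actual - assumed).toFixed(1));
```

With this integrator the drop is g·n(n+1)/2. Over 40 ticks the 0.06 model predicts about 49.2 px, while the true 0.135 gives about 110.7 px. A brain compensating with the soldier's gravity therefore under-corrects by roughly 61 px at that range, and the error grows with the square of flight time.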

And wolf's whole pack doctrine is a deterministic function, which is exactly what made it beatable. From the prey selection (wolf.ts):

```typescript
// Two tiers: inside PREY_RADIUS of the centroid, lowest health wins
// (kill-securing); a wounded enemy beyond it doesn't drag the pack
// across the map past healthy guns — outside the radius only the
// nearest-to-centroid counts, as a fallback when nobody is in reach.
```

Plover read that function and fed it a decoy. Hydra rotated its wounded out of it. Determinism cuts both ways: a published mind is an attackable mind.
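Because the rule is deterministic, it fits in a few lines. This sketch follows the two tiers in wolf.ts's comment. The `Enemy` and `Point` shapes, the `PREY_RADIUS` value, and the lowest-id tie-break are assumptions for illustration.

```typescript
// Hypothetical shapes and radius; only the two-tier rule is from wolf.ts.
interface Point { x: number; y: number; }
interface Enemy { id: number; x: number; y: number; health: number; }

const PREY_RADIUS = 300; // illustrative, not the brain's constant

function centroid(pack: Point[]): Point {
  let x = 0;
  let y = 0;
  for (const p of pack) {
    x += p.x;
    y += p.y;
  }
  return { x: x / pack.length, y: y / pack.length };
}

// Tier 1: enemies within PREY_RADIUS of the centroid, lowest health first.
// Tier 2 (nobody in reach): the enemy nearest the centroid.
// Ties break on lowest id, so every packmate computes the same answer.
function selectPrey(pack: Point[], enemies: Enemy[]): Enemy | undefined {
  if (pack.length === 0 || enemies.length === 0) return undefined;
  const c = centroid(pack);
  const dist = (e: Enemy) => Math.hypot(e.x - c.x, e.y - c.y);
  const inReach = enemies.filter((e) => dist(e) <= PREY_RADIUS);
  if (inReach.length > 0) {
    return inReach.slice().sort((a, b) => a.health - b.health || a.id - b.id)[0];
  }
  return enemies.slice().sort((a, b) => dist(a) - dist(b) || a.id - b.id)[0];
}
```

Anyone who knows the radius and the tie-break can predict the pick. That predictability is the opening plover exploited with its decoy.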

It became a sport without me

By this point it was running on its own. Every match simulates headless at a hundred times real speed and saves itself as training data. A commissioner daemon picks a challenger and forces a title defense every ten minutes whether I'm watching or not.

One dashboard, THE FLOOR, runs it like a stock exchange with a leaderboard that decays if you stop showing up. Another, THE SKYREACH DESK, writes its own front-page story off the standings. The belt changed hands four times in a single afternoon. Seasons roll over every three hours. I check on it the way you check a fish tank.

THE SKYREACH DESK, dark edition: the arena told story-first, with auto-written headlines over real standings.

▸ THE NITTY GRITTY, the commissioner and the decayed board

The sport runs itself on two small pieces of code. The commissioner (arena-live/commissioner.mjs) forces fresh blood into title fights on a timer:

```javascript
// THE COMMISSIONER — automated "fresh blood" title defenses.
//   challenger = the card whose coach+engine appears LEAST in the
//   recent crucible ledger; then:
//   pnpm arena fight fights/<challenger> fights/<champion> --matches 3
```
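The challenger rule is a counting problem. This sketch assumes cards carry a `name`, `coach`, and `engine`, that the ledger is a list of recent coach/engine pairings, and that fight paths use the card name. Only the "least represented pairing" rule and the command line come from the commissioner's comment.

```typescript
// Hypothetical shapes; the selection rule is from commissioner.mjs's header.
interface Card { name: string; coach: string; engine: string; }
interface LedgerEntry { coach: string; engine: string; }

function pickChallenger(cards: Card[], champion: Card, ledger: LedgerEntry[]): Card | undefined {
  // Count recent appearances of each coach+engine pairing.
  const seen = new Map<string, number>();
  for (const e of ledger) {
    const key = `${e.coach}+${e.engine}`;
    seen.set(key, (seen.get(key) ?? 0) + 1);
  }
  let best: Card | undefined;
  let bestCount = Infinity;
  for (const card of cards) {
    if (card.name === champion.name) continue; // the champion defends, never challenges
    const n = seen.get(`${card.coach}+${card.engine}`) ?? 0;
    if (n < bestCount) {
      // Strict `<`: earlier cards win ties, so the pick is deterministic.
      best = card;
      bestCount = n;
    }
  }
  return best;
}

// The fight command quoted in the commissioner's comment.
function fightCommand(challenger: Card, champion: Card): string {
  return `pnpm arena fight fights/${challenger.name} fights/${champion.name} --matches 3`;
}
```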

And the Big Board rewards showing up. Every result is weighted by exponential age decay (build.mjs, decayWeight(), halving every few hours), so an idle champion bleeds score until defending is cheaper than hiding. The online 1v1 lobby rides the same machinery:

⬢ 450 · True multiplayer: two visitors matched into one live 1v1
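Exponential age decay with half-life h weights a result of age t by 0.5^(t/h). The sketch below uses a placeholder half-life of 3 hours, since the text only says "every few hours", and a hypothetical `Result` shape. It is not a copy of build.mjs's decayWeight().

```typescript
// Placeholder half-life: the text only says "halving every few hours".
const HALF_LIFE_HOURS = 3;

// Weight of a result that is `ageHours` old: 1 when fresh, 0.5 after one half-life.
function ageWeight(ageHours: number, halfLife: number = HALF_LIFE_HOURS): number {
  return Math.pow(0.5, ageHours / halfLife);
}

interface Result {
  fighter: string;
  points: number;
  ageHours: number;
}

// Sum each fighter's results, discounted by age.
function boardScores(results: Result[]): Map<string, number> {
  const scores = new Map<string, number>();
  for (const r of results) {
    const prev = scores.get(r.fighter) ?? 0;
    scores.set(r.fighter, prev + r.points * ageWeight(r.ageHours));
  }
  return scores;
}
```

Under these assumptions, a champion whose last 10-point result is nine hours old is down to 1.25 points. A rookie with a fresh 3-point win already outranks it.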

The machines started learning

Once the corpus passed thirty-odd million rows of recorded play, I wanted to know whether a network could learn to fight from the tape alone. The first try, MIMIC, averaged eleven teachers into mush and got taken apart. DISCIPLE copied one teacher and learned that aiming is a choice between directions rather than a number to average, which tripled MIMIC's hit rate.
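A toy example shows why classification beats regression here. The 24-bin count comes from DISCIPLE's card below; the bin math, the angles, and the three teacher labels are illustrative. Two labels say 135° and one says 45°. A regression target averages them, while a classifier picks the most common bin.

```typescript
// Illustrative aim labels; only the bin count (24) comes from the text.
const BINS = 24;
const BIN_WIDTH = (2 * Math.PI) / BINS; // 15° per bin

// Map any angle (radians) to the nearest bin center, wrapping around 2π.
function angleToBin(theta: number): number {
  const t = ((theta % (2 * Math.PI)) + 2 * Math.PI) % (2 * Math.PI);
  return Math.floor(t / BIN_WIDTH + 0.5) % BINS;
}

function binToAngle(bin: number): number {
  return bin * BIN_WIDTH;
}

// Labels from the same situation: two say 135°, one says 45°.
const teacherAims = [0.75 * Math.PI, 0.25 * Math.PI, 0.75 * Math.PI];

// Regression target: the arithmetic mean of the labels.
const mean = teacherAims.reduce((a, b) => a + b, 0) / teacherAims.length;

// Classification target: the most frequent bin (argmax of the label histogram).
const counts = new Array<number>(BINS).fill(0);
for (const a of teacherAims) counts[angleToBin(a)]++;
const modeBin = counts.indexOf(Math.max(...counts));

console.log((mean * 180) / Math.PI, (binToAngle(modeBin) * 180) / Math.PI);
```

The mean lands at 105°, a direction nobody aimed, while the argmax bin recovers 135°. Averaging also breaks at the wrap-around (359° and 1° average to 180°), a failure that bins avoid.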

PRODIGY grew real senses and then wrote its own autopsy explaining why it still lost. BUTTSTEIN trained on exact data and tripled the hit rate again. They all climb the same public ladder their teachers do, every season.

The full lineage is below. It's my favorite part of the whole thing.

THE FLOOR mid-season, the Claude Arena exchange: tape, Big Board, news wire, and click-to-watch replays re-simulated in your browser.

▸ THE NITTY GRITTY, how the students actually train

The learned line is a chain of honest negative results, each one logged before the next attempt. The graph reads like a lab notebook:

⬢ 473 · train a v2 model brain (features v2 + hit-filtered aim)
⬢ 485 · NEGATIVE RESULT (clean): PRODIGY worse than DISCIPLE by −9.83 kills/match, p=3e-64
⬢ 484 · exposure bias: live aim error 26.6° with history vs 7.9° without → 50% history dropout
⬢ 481 · parity bug: tap cadence phase-locks to tick parity, shot rows ALWAYS sample
⬢ 489 · PRODIGY v2 verdict: aim improved, hit% 5.4 vs 17.4, trigger discipline lost
⬢ 504 · BUTTSTEIN: replay schema v2, exact threats + spray heat, blended aim labels

The buttstein trainer (tools/train-buttstein.mjs) ships the cures as flags: blended aim labels, where landed shots carry 5× gradient, and the exposure-bias fix:

```javascript
const HIT_WEIGHT   = Number(flag('hit-weight', 5));    // landed rows: 5× gradient
const HIST_DROPOUT = Number(flag('hist-dropout', 0.5)); // forget history half the time
```

All of it is pure JavaScript: a hand-rolled MLP over Float64Arrays, no ML framework, trained on the same machine that plays the matches.
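Both cures come down to a little arithmetic over Float64Arrays. This sketch assumes aim is trained with softmax cross-entropy over bins and that history features occupy a contiguous slice of each row. The function names, the row layout, and the injected `rng` are hypothetical; the 5× weight and 0.5 dropout match the trainer's flag defaults.

```typescript
// Flag defaults quoted from train-buttstein.mjs.
const HIT_WEIGHT = 5;
const HIST_DROPOUT = 0.5;

// Softmax cross-entropy for one row's aim-bin logits, scaled by a row weight
// (HIT_WEIGHT for rows where the shot landed, 1 otherwise). Scaling the loss
// scales that row's gradient by the same factor.
function weightedCrossEntropy(logits: Float64Array, label: number, weight: number): number {
  let max = -Infinity;
  for (let i = 0; i < logits.length; i++) max = Math.max(max, logits[i]);
  let sum = 0;
  for (let i = 0; i < logits.length; i++) sum += Math.exp(logits[i] - max);
  const logProb = logits[label] - max - Math.log(sum); // numerically stable log-softmax
  return -weight * logProb;
}

// With probability p, return a copy of the row whose history slice
// [histStart, histEnd) is zeroed, so the model must also act without it.
function dropHistory(
  row: Float64Array,
  histStart: number,
  histEnd: number,
  p: number,
  rng: () => number,
): Float64Array {
  const out = Float64Array.from(row);
  if (rng() < p) out.fill(0, histStart, histEnd);
  return out;
}
```

Drawing the dropout decision from an injected RNG keeps training runs reproducible, in the same spirit as the brains' ban on `Math.random`.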

▸ THE NITTY GRITTY, the toolbox it built for itself

Half the engineering lives in tools/: instruments the project built to study itself.

The screenshot rig (tools/screenshot.mjs): every historical image on this page came through it. Its own header explains the problem it solves:

```javascript
// Why CDP and not `chrome --screenshot --virtual-time-budget`: the game
// boots asynchronously (PixiJS init, async asset fetches, RAF loop) and
// virtual time expires before the first real frame, producing a black
// capture. Driving Chrome over the DevTools protocol lets us wait REAL
// seconds while the match actually plays, then grab the canvas mid-action.
```

It spawns a throwaway-profile Chrome, holds keys down (that's why the first-light player is mid-jump), and dispatches wheel events for close-ups. It shot the historical builds from git worktrees, four eras served on four ports.

The analyst (analyze-match.mjs) turns a telemetry dump into a gameplay report. It's what caught the bots fighting like infantry (jet use 2–4%) and proved the Skyreach fix (47–56%).
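A report like that comes down to a few ratios. The row shapes below are hypothetical, since the dump format of analyze-match.mjs isn't shown. Measuring jet use as a share of airborne ticks is one plausible denominator, not necessarily the analyst's.

```typescript
// Hypothetical telemetry rows: one per bot per tick, plus one per kill.
interface TickRow { bot: number; airborne: boolean; jetting: boolean; }
interface KillRow { killerX: number; killerY: number; victimX: number; victimY: number; }

// Fraction of airborne ticks spent with the jet on.
function jetShareOfAirtime(rows: TickRow[]): number {
  let air = 0;
  let jet = 0;
  for (const r of rows) {
    if (!r.airborne) continue;
    air++;
    if (r.jetting) jet++;
  }
  return air === 0 ? 0 : jet / air;
}

// Mean straight-line distance between killer and victim at the kill.
function meanKillDistance(kills: KillRow[]): number {
  if (kills.length === 0) return 0;
  let total = 0;
  for (const k of kills) total += Math.hypot(k.victimX - k.killerX, k.victimY - k.killerY);
  return total / kills.length;
}
```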

The smoke driver (online-smoke.mjs) speaks the real binary protocol as two fake players to prove the whole online 3v3 loop (pairing, snapshots, kill feed) with no browser at all.

The disk rescuer (offload-replays.mjs) moves replay blobs to object storage and verifies each copy before deleting the local one. Manifests, events, and telemetry stay forever; the blobs can leave the disk because the site re-simulates replays from seeds anyway.
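Verify-before-delete is an ordering rule: upload, read back, compare, and only then remove the local blob. In this sketch the `BlobStore` interface, the in-memory store, and the FNV-1a checksum are stand-ins; the real script's storage API is not shown.

```typescript
// Hypothetical storage interface; the real script's API is not shown.
interface BlobStore {
  get(key: string): Uint8Array | undefined;
  put(key: string, data: Uint8Array): void;
  delete(key: string): void;
}

// 32-bit FNV-1a checksum, a stand-in for whatever the script compares.
function fnv1a(data: Uint8Array): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < data.length; i++) {
    h ^= data[i];
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Delete the local blob only after the remote copy reads back identical.
function offload(key: string, local: BlobStore, remote: BlobStore): boolean {
  const data = local.get(key);
  if (data === undefined) return false;
  remote.put(key, data);
  const copy = remote.get(key);
  if (copy === undefined || copy.length !== data.length || fnv1a(copy) !== fnv1a(data)) {
    return false; // keep the local blob for the next run
  }
  local.delete(key);
  return true;
}

// In-memory store for trying it out.
class MemStore implements BlobStore {
  m = new Map<string, Uint8Array>();
  get(k: string): Uint8Array | undefined { return this.m.get(k); }
  put(k: string, d: Uint8Array): void { this.m.set(k, Uint8Array.from(d)); }
  delete(k: string): void { this.m.delete(k); }
}
```

If the read-back fails or mismatches, `offload` returns false and leaves the local blob in place.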

The autopilot (autopilot.mjs, evolve.mjs, autopilot-gate.mjs) runs the whole lab unattended: pick a teacher, train, gauntlet the result against the veterans, and only ship weights that win. Rejected candidates leave a ledger entry instead of a regression.
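The gate's contract is ship-or-log. This sketch assumes a candidate ships only with a winning record against every veteran in the gauntlet. That threshold, the `Gauntlet` and `LedgerLine` shapes, and the detail format are hypothetical.

```typescript
// Hypothetical record shapes and threshold for a ship-or-reject gate.
interface Gauntlet { veteran: string; wins: number; losses: number; }
interface LedgerLine { candidate: string; verdict: "SHIP" | "REJECT"; detail: string; }

// Returns true when the new weights may replace the old ones. Every call
// appends a ledger line, so a rejection leaves a record, not a regression.
function gate(candidate: string, results: Gauntlet[], ledger: LedgerLine[]): boolean {
  const beatsAll = results.length > 0 && results.every((r) => r.wins > r.losses);
  const detail = results.map((r) => `${r.veteran} ${r.wins}-${r.losses}`).join(", ");
  ledger.push({ candidate, verdict: beatsAll ? "SHIP" : "REJECT", detail });
  return beatsAll;
}
```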

And the graph itself: a hook blocks file edits unless a decision node was logged in the last fifteen minutes. The work cannot happen without the record of why.
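The rule behind the hook is a timestamp comparison. This sketch covers only that check; how the real hook queries the graph and plugs into the editor is not shown, and `editAllowed` is a hypothetical name.

```typescript
// Fifteen-minute freshness window, from the text.
const WINDOW_MS = 15 * 60 * 1000;

// Allow an edit only if a decision node was logged within the window.
// Timestamps are milliseconds since the epoch.
function editAllowed(lastNodeLoggedAt: number | undefined, now: number): boolean {
  if (lastNodeLoggedAt === undefined) return false; // no record, no edit
  return now - lastNodeLoggedAt <= WINDOW_MS;
}
```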

SIXTEEN BRAINS: THE ROSTER

Twelve written doctrines, four learned students. Every brain is a published, auditable function, and every defeat spawned a counter.

FOUR STUDENTS: THE LEARNED LINE

Each student is a small neural policy, behavior-cloned from the recordings. Each one fixed its ancestor's defining failure. Hit rate tells the story.

MIMIC v1

~2%

Averaged eleven contradictory teachers into mush. Aim was a regression, so multimodal targets blurred into a 38° error. Slaughtered 4–28 by classic.

Lesson: you cannot average doctrines.

→

DISCIPLE v2

~3×

One teacher: cuadrilla. Aim became 24-bin classification, a choice among directions, not an average of them. Tripled MIMIC's hit rate; lost 0–3 to its master, which is the correct result.

Fixed: aim is a choice.

→

PRODIGY v3

5.4%

Grew senses: enemy reload flags, mag state, nearest bullet threat, one tick of memory. But its threat sense was trained on reconstructed data that didn't match what it saw live, and its own post-mortem said so.

Fixed: senses. Broke: the training data lied.

→

BUTTSTEIN v4

17.4%

The cure, named proudly. Replay schema v2 logs the exact threat the runtime computes, so recorder and brain cannot disagree. Sees its own spray heat, so trigger discipline became a learned response, not a mystery rhythm. Blended aim labels: landed shots carry 5× gradient.

Fixed: what it trained on IS what it sees.

SEASON BY SEASON: THE BELT

A season is three hours. The board decays: idle fighters bleed score. The commissioner forces title defenses whether the champion likes it or not.

S2–S3 · BELMONTE / cuadrilla
The bullfighter's crew: matador's mag-punish, wolf's pack, and hydra's rotation combined into one doctrine. Swept the field 9–0.
S4 · ESCA / angler
The lure that fishes the meta: dangle a bait that reads “open” forever, and let the gallery converge on whoever dashes for it.
S5–S8 · BLACKFISH / orca
The pod hunts the gap, synchronized not on health but on the enemy's reload clock. It has held the belt for three straight seasons as of this writing. BELMONTE keeps taking title shots; the splits keep landing 3–0 either way.

The current board is live, always: THE FLOOR · THE DESK

BUMPS IN THE ROAD: THE WRECKAGE

The decision graph keeps my failures filed next to the wins, with the same detail. These are real nodes (autopsies, reverts, and negative results), exactly as they were logged at the time.

The first-mousemove betrayal

Keyboard aim worked beautifully until the very first mouse event of a session. Even an accidental 3-pixel nudge while reaching for the keyboard snapped your carefully set aim to wherever the cursor happened to sit. The fix treats the first mouse sample as baseline-only: it establishes where the mouse is, and only later movement takes over aim.

⬢ 86 · baseline-only first mouse sample in input.ts
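The baseline-only rule amounts to one extra branch. This sketch uses a hypothetical `AimInput` class rather than input.ts itself. Aim angles are in radians, and `aimFromMouse` is whatever maps a cursor position to an angle.

```typescript
// Hypothetical stand-in for the rule in input.ts; angles are radians.
class AimInput {
  aim = 0;
  baseline: { x: number; y: number } | null = null;

  keyboardRotate(delta: number): void {
    this.aim += delta;
  }

  // `aimFromMouse` maps a cursor position to an aim angle.
  mouseMove(x: number, y: number, aimFromMouse: (x: number, y: number) => number): void {
    if (this.baseline === null) {
      // First sample of the session: record where the mouse is, leave aim alone.
      this.baseline = { x, y };
      return;
    }
    this.aim = aimFromMouse(x, y); // later movement takes over aim
  }
}
```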

Jump + jet made you weaker

Jump wrote its impulse to force.y; the jetpack's smaller ground-kick wrote to the same field a few lines later. Press both together, the most natural input in the game, and you took off weaker than jump alone. Running takeoffs were impossible and nobody knew why. The rule that fixed it: most-upward-force wins.

⬢ 100 · jet force-overwrite: jump+jet nerf, running takeoff impossible
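The fix replaces "last writer wins" with a comparison. This sketch assumes screen coordinates, where y grows downward so a stronger upward push is more negative. The impulse values are illustrative, not the engine's.

```typescript
// Illustrative impulses; y grows downward, so more negative = more upward.
const JUMP_IMPULSE = -2.0;
const JET_GROUND_KICK = -0.8;

// The bug: both inputs write the same field, so the later, weaker one wins.
function takeoffBuggy(jump: boolean, jet: boolean): number {
  let forceY = 0;
  if (jump) forceY = JUMP_IMPULSE;
  if (jet) forceY = JET_GROUND_KICK; // overwrites the stronger jump
  return forceY;
}

// The fix: most-upward force wins, whatever order the inputs are handled in.
function takeoffFixed(jump: boolean, jet: boolean): number {
  let forceY = 0;
  if (jump) forceY = Math.min(forceY, JUMP_IMPULSE);
  if (jet) forceY = Math.min(forceY, JET_GROUND_KICK);
  return forceY;
}
```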

The duel viewer earned its keep in one minute

Built to watch two brains side by side, it caught a 60× ballistics error in its first sixty seconds (a seconds-vs-ticks leak in bullet gravity), plus an aim bug in pilot that the scoreboard had been quietly absorbing. Telemetry from the same duels: once both bugs were dead, pilot beat classic on hit rate, 35% to 22%.

⬢ 141 · duel caught pilot aim bug in first minute

The broken-wing autopsy

Plover read the wolf pack's deterministic mind, fed it a decoy, out-aimed it, and still lost. The autopsy line is the best sentence in the graph: “the fight was decided by a range knob and one wrong tie-break.” Five spar variants all regressed (0–3 across the grid); the ablation proved the bait itself was net-positive; the wolf defended its belt 3–0 anyway. The gambit was right. It lost.

⬢ 241 · AUTOPSY: out-AIMED the wolf and still lost
⬢ 252 · five spar variants regressed; ablation proved the bait net-positive
⬢ 253 · OFFICIAL: AKELA defends 3-0 vs FALCONER

The shotgun paradox

The first weapon-aware brain got the shotgun, and got worse. A controlled A/B (forced wildcard on and off, kestrel as the control group) dissolved the paradox: the gun helps everyone. The real bug was shrike running shared-focus targeting with no breacher, exactly the shape hydra's rotation starves. The fix was hardware-gated roles, and v3 went 3–1 on hydra in both modes.

⬢ 304 · shotgun paradox dissolved by controlled A/B (arena 57)

Worse than its ancestor, twice, with receipts

PRODIGY (more senses, more data, better features) was beaten by its simpler ancestor by 9.83 kills a match, p=3×10⁻⁶⁴. The post-fix version improved its aim and still lost: trigger discipline had been unlearnable all along, because spray heat wasn't in the training rows. Both negative results were logged clean, and together they are BUTTSTEIN's entire design spec.

⬢ 485 · NEGATIVE RESULT (clean): −9.83 kills/match vs DISCIPLE
⬢ 489 · v2 verdict: aim improved, hit% 5.4 vs 17.4, discipline lost

The lab that rejects its own work

The autopilot's first unattended night: trained a candidate in 77 seconds, the gauntlet gate said no, the weights were reverted, the cycle ended clean, and that was the success case. A later cycle evolved 200 generations and shipped nothing: “weights left alone.” Rejected work leaves a ledger line instead of a regression. The grinder was also kill-tested mid-night; the keeper revived it in under a minute.

⬢ 439 · gate REJECT/reverted; killed grinder revived by the keeper
⬢ 527 · evolve 200 gens, shipped weights left alone

The deploy that briefly broke the sport

Voice chat shipped from a “clean” checkout of HEAD, which turned out to be the inconsistent state, because a committed recorder change depended on an uncommitted companion. The live commissioner crash-looped on a missing function until the redeploy went out from the working tree, which is what the deploy script intended all along. The lesson went straight into the graph, same as everything else.

⬢ 532 · HEAD was inconsistent, deploy from the working tree

AS OF THIS WRITING: THE NUMBERS

… commits
(the rewrite alone)

… tests, green twice
(f64 + strict f32)

16 bot brains

… matches recorded
(local + live server)

… GB of training tape

… seconds of combat simulated
per second of wall clock

HOW IT WAS BUILT: THE MACHINERY

The decision graph

Every goal, option, decision, action, and outcome lives in a queryable graph. Root goals store the verbatim prompts, because “what was actually asked” is what makes a decision recoverable six months later. A hook blocks file edits unless a node was logged in the last fifteen minutes. The project's self-criticism has provenance: the adversarial reviews live in the same graph as the milestones they tore into.
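One way to model "recoverable" is a parent link from every node back toward its root goal. In this sketch the five node kinds come from the text. The field names, the single-parent layout, and the `whyChain` query are assumptions.

```typescript
// Node kinds from the text; field names and parent links are assumptions.
type Kind = "goal" | "option" | "decision" | "action" | "outcome";

interface GraphNode {
  id: number;
  kind: Kind;
  title: string;
  parent?: number; // the node this one was derived from
  prompt?: string; // verbatim user prompt, stored on root goals
}

// Walk parent links from any node up to its root, so "why does this exist?"
// ends at what was actually asked. The `seen` set guards against cycles.
function whyChain(nodes: Map<number, GraphNode>, id: number): GraphNode[] {
  const chain: GraphNode[] = [];
  const seen = new Set<number>();
  let cur = nodes.get(id);
  while (cur !== undefined && !seen.has(cur.id)) {
    seen.add(cur.id);
    chain.push(cur);
    cur = cur.parent === undefined ? undefined : nodes.get(cur.parent);
  }
  return chain;
}
```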

Determinism as a load-bearing wall

Same seed, same match, byte-for-byte, which is why every replay on the public site re-simulates in your browser from a few bytes of seed instead of shipping video. It's also why the fidelity question became a test gate: the whole suite runs twice, once in f64 and once with every float operation rounded through Math.fround.
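Replay-from-seed works when every random draw comes from a seeded generator and nothing else in the simulation is nondeterministic. This sketch uses mulberry32, a small, widely used 32-bit PRNG, as a stand-in: the project's own world.rng is not shown, and node 55 notes it is not Pascal-bit-compatible. The toy particle update borrows the original engine's 0.98 damping.

```typescript
// mulberry32: a stand-in seeded PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// A toy "match": four particles take random kicks with 0.98 damping.
// All randomness flows from the seed, so the seed alone reproduces the run.
function simulate(seed: number, ticks: number): number[] {
  const rng = mulberry32(seed);
  const xs = [0, 0, 0, 0];
  const vs = [0, 0, 0, 0];
  for (let t = 0; t < ticks; t++) {
    for (let i = 0; i < xs.length; i++) {
      vs[i] = vs[i] * 0.98 + (rng() - 0.5);
      xs[i] += vs[i];
    }
  }
  return xs;
}

const a = simulate(1234, 1000);
const b = simulate(1234, 1000);
console.log(a.every((x, i) => Object.is(x, b[i]))); // true: same seed, same match
```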

The screenshot rig

Every historical screenshot on this page was captured live: the old commit checked out into a worktree, booted under a dev server, and driven over the DevTools protocol with keys genuinely held down. None of it is a mockup.

Two Claudes, one repo

Most of the roster was authored by Claude instances fighting each other: one writes a doctrine, the other reads its published mind and writes the counter. The human's role, in his own words, was operational oversight: “keep goiung in a loop im going to the knicks game.”

YOUR TURN: STEP INTO IT

THE FLOOR

The exchange. Live tape, the Big Board, click any fight to watch it re-simulated in your browser.

THE DESK

The sports section. Auto-written lead story, THE CLIMB, the rivalries, tonight's card.

WATCH A LIVE MATCH

Watch the bots fight a live Skyreach dogfight, re-simulated in your browser from its seed. This is the match itself, not the dashboard. Add `?play` to jump in yourself.

LIVE MULTIPLAYER 🎤

Pick a brain for your squad, get matched against a stranger, and talk to them: live voice opens the moment the match starts. Open mic; the mute pill sits bottom-left.

THE REPO

All of it: the engine, the brains, the trainers, the diary this page is woven from.