GitHub - anupamchugh/shadowbook: Beads - A memory upgrade for your coding agent

Shadowbook

bd — keep the story straight, even when the work isn't

$ bd pacman

╭──────────────────────────────────────────────────────────╮
│  ᗧ····○ bd-abc····○ bd-xyz····○ bd-123 ····◐            │
╰──────────────────────────────────────────────────────────╯

YOU: claude | SCORE: 3 dots | #1 codex (5 pts)

$ bd recent --all

test-f2y [P1] Implement OAuth login  ● volatile  ○ open  just now
└─ ● specs/auth.md  ✓ active  ● volatile  just now
test-sgo [P3] Update README  ○ stable  ○ open  just now
└─ ● specs/docs.md  ✓ active  ○ stable  1m ago

Summary: 2 beads, 2 specs | Active: 2 pending | Momentum: 4 items today

One command. Beads, specs, skills—nested by relationship. Drift called out. No guesswork.

Built on beads.


The Formula‑1 Story

Shadowbook is race control for agentic engineering.

Specs are the track. Beads are the cars. Skills are the pit crew. Wobble is tire degradation. Volatility is track instability. Drift is when the car runs a different line than the one you designed.

Shadowbook keeps the race safe:

  • It flags when the track is changing while cars are already at speed.
  • It shows which cars are on worn tires (unstable skills) and which are safe to push.
  • It pauses risky runs when the track is breaking apart.
  • It gives you a clean lap chart of what’s actually happening, not what you hoped happened.

In Formula‑1 terms: Shadowbook is the difference between “full send” and a DNF you didn’t see coming.


Five Drifts, One Tool

| Drift | Problem | Solution |
|---|---|---|
| Spec Drift | Spec changes, code builds old version | `bd spec scan` |
| Skill Drift | Skills diverge or collide across environments | `bd preflight --check`, `bd skills collisions` |
| Visibility Drift | Can't see what's active | `bd recent --all` |
| Stability Drift | Specs churning while work in flight | `bd spec volatility` |
| Behavioral Drift | Claude "helpfully" deviates from instructions | `bd wobble scan` |

Quick Start

curl -fsSL https://raw.githubusercontent.com/anupamchugh/shadowbook/main/scripts/install.sh | bash
cd your-project && bd init && mkdir -p specs
bd recent --all

Snap Streaks

Track spec stability over time. Like Snapchat streaks, but for specs.

$ bd spec volatility --trend specs/auth.md

  Week 1: ████████░░  8 changes
  Week 2: █████░░░░░  5 changes
  Week 3: ██░░░░░░░░  2 changes
  Week 4: ░░░░░░░░░░  0 changes

Status: DECREASING
Prediction: Safe to resume work in ~5 days

Declining = stabilizing. Flat at zero = locked down. Increasing = chaos growing.
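
The trend labels can be sketched in a few lines of Python. This is a hypothetical heuristic, not Shadowbook's actual algorithm: it compares the average of the older weeks with the newer weeks. The function name `classify_trend` and the `LOCKED`, `INCREASING`, and `FLAT` labels are assumptions; only `DECREASING` appears in the sample output above.

```python
def classify_trend(weekly_changes):
    """Label a series of weekly change counts (oldest first).

    Hypothetical heuristic: compare the average of the older half
    of the series with the average of the newer half.
    """
    if not weekly_changes or all(c == 0 for c in weekly_changes):
        return "LOCKED"  # flat at zero = locked down
    mid = len(weekly_changes) // 2
    older = weekly_changes[:mid] or weekly_changes
    newer = weekly_changes[mid:]
    older_avg = sum(older) / len(older)
    newer_avg = sum(newer) / len(newer)
    if newer_avg < older_avg:
        return "DECREASING"  # declining = stabilizing
    if newer_avg > older_avg:
        return "INCREASING"  # increasing = chaos growing
    return "FLAT"

# The four weeks from the sample output above.
print(classify_trend([8, 5, 2, 0]))
```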

Badges everywhere:

$ bd list --show-volatility
  bd-42  [● volatile] Implement login    in_progress
  bd-44  [○ stable]    Update README     pending

$ bd ready
○ Ready (stable): 1. Update README
● Caution (volatile): 1. Implement login (5 changes/30d, 3 open)

Cascade impact:

$ bd spec volatility --with-dependents specs/auth.md

specs/auth.md (● HIGH: 5 changes, 3 open)
├── bd-42: Implement login ← DRIFTED
│   └── bd-43: Add 2FA (blocked)
└── bd-44: RBAC redesign

Impact: 3 issues at risk
Recommendation: STABILIZE
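
The "issues at risk" count is a walk over the dependency tree. A minimal sketch, assuming the dependents are available as a plain mapping; the `DEPENDENTS` structure and the function name are hypothetical and only mirror the sample output above:

```python
# Hypothetical dependency map mirroring the sample output above:
# each key lists the issues that depend on it.
DEPENDENTS = {
    "specs/auth.md": ["bd-42", "bd-44"],
    "bd-42": ["bd-43"],
}

def issues_at_risk(root, dependents):
    """Collect every issue reachable from root, depth-first."""
    seen = []
    stack = list(reversed(dependents.get(root, [])))
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # guard against cycles and shared dependents
        seen.append(node)
        stack.extend(reversed(dependents.get(node, [])))
    return seen

at_risk = issues_at_risk("specs/auth.md", DEPENDENTS)
print(f"Impact: {len(at_risk)} issues at risk")
```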

CI gate:

bd spec volatility --fail-on-high  # Exit 1 if HIGH volatility
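
If your CI runs Python steps, the gate can be wrapped in a small helper. This is a sketch that relies only on the documented behavior (non-zero exit on HIGH volatility); the function names and the failure message are assumptions.

```python
import subprocess
import sys

# Command taken from this README; only its exit code is relied on.
GATE_CMD = ["bd", "spec", "volatility", "--fail-on-high"]

def volatility_gate(cmd=GATE_CMD):
    """Return True when the gate command exits 0, False otherwise."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        print(result.stdout, end="")
        print("Volatility gate failed; stabilize specs before merging.")
        return False
    return True

def main():
    """Entry point for a CI step: propagate the gate result as exit code."""
    sys.exit(0 if volatility_gate() else 1)
```

Call `main()` from the CI step; passing a different `cmd` to `volatility_gate` makes the helper easy to exercise without `bd` installed.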

Auto-pause:

bd config set volatility.auto_pause true
bd resume --spec specs/auth.md  # Unblock after stabilization

Spec Drift Detection

bd create "Implement login" --spec-id specs/login.md
# ... spec changes ...
bd spec scan
● SPEC CHANGED: specs/login.md → bd-a1b2 unaware

bd list --spec-changed    # Find drifted issues
bd update bd-a1b2 --ack-spec  # Acknowledge
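
Conceptually, drift detection compares what an issue last saw with what the spec says now. The sketch below assumes content hashing with SHA-256 and an in-memory record per issue; both are illustrative assumptions, not a description of how `bd spec scan` is implemented.

```python
import hashlib

def spec_hash(text):
    """Content hash of a spec; SHA-256 is an assumption."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def find_drifted(issues, current_specs):
    """Return IDs of issues whose recorded spec hash no longer matches.

    issues: {issue_id: (spec_path, hash_recorded_at_last_ack)}
    current_specs: {spec_path: current spec text}
    """
    drifted = []
    for issue_id, (spec_path, seen_hash) in issues.items():
        text = current_specs.get(spec_path)
        if text is not None and spec_hash(text) != seen_hash:
            drifted.append(issue_id)
    return drifted

# Hypothetical data: the issue was created against the original spec text.
original = "# Login\nUse passwords.\n"
issues = {"bd-a1b2": ("specs/login.md", spec_hash(original))}
changed = {"specs/login.md": "# Login\nUse OAuth.\n"}

for issue_id in find_drifted(issues, changed):
    print(f"SPEC CHANGED: {issues[issue_id][0]} -> {issue_id} unaware")
```

In this model, acknowledging a change amounts to storing the new hash on the issue.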

Spec Radar Flow

Treat it like a daily weather report for specs.

# Morning: see what moved
bd spec delta

# Midday: clean up ideas
bd spec triage --sort status

# Weekly: generate a briefing
bd spec report --out .beads/reports

# Cleanup day: align lifecycle with reality (confirm before apply)
bd spec sync --apply

Quick reads:

  • bd spec stale shows age buckets.
  • bd spec duplicates surfaces overlap.
  • bd spec report combines summary, triage, staleness, duplicates, delta, and volatility.

Skill Sync

bd preflight --check
✓ Skills: 47/47 synced
✓ Specs: 12 tracked
● Volatility: 2 specs have high churn

bd preflight --check --auto-sync  # Fix drift

Wobble: Measure the Drift

     You write the recipe. Claude edits it.

     Expected:  bd list --created-after=$(date -v-1d) --sort=created
     Actual:    bd list --status=in_progress  ← "I thought this would help"

                    ᗧ····~····~····~····
                         wobble →

Based on Anthropic's "Hot Mess of AI" paper: extended reasoning amplifies incoherence. Wobble catches it.

$ bd wobble scan --from-sessions --days 7

┌─ WOBBLE SCAN: REAL SESSION DATA ───────────────────────┐
│ Analyzed 18 skills with REAL session data             │
└────────────────────────────────────────────────────────┘

┌─ WOBBLE REPORT: my-skill (REAL DATA) ──────────────────┐
│ Invocations: 6                                         │
│ Exact Match Rate: 33%                                  │
│ Variants Found: 5                                      │
│ Wobble Score: 0.85                                     │
│                                                        │
│ VERDICT: ● UNSTABLE                                    │
└────────────────────────────────────────────────────────┘
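
The first three report fields are simple counts over observed commands. A sketch with illustrative data (6 runs, 2 exact matches, 5 distinct commands); whether the real scanner counts the expected command as a variant is an assumption, and `match_stats` is a hypothetical name.

```python
from collections import Counter

def match_stats(expected, invocations):
    """Summarize how often observed commands equal the expected one."""
    counts = Counter(invocations)
    exact = counts[expected]
    return {
        "invocations": len(invocations),
        "exact_match_rate": exact / len(invocations) if invocations else 0.0,
        # Assumption: every distinct command, including the expected one,
        # counts as a variant.
        "variants": len(counts),
    }

# Illustrative data only, not real session logs.
expected = "bd list --sort=created"
runs = [
    expected,
    "bd list --status=in_progress",
    expected,
    "bd list --show-volatility",
    "bd list --spec-changed",
    "bd ready",
]
stats = match_stats(expected, runs)
print(f"Invocations: {stats['invocations']}")
print(f"Exact Match Rate: {stats['exact_match_rate']:.0%}")
print(f"Variants Found: {stats['variants']}")
```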

The formula (from the paper):

Wobble = Variance / (Bias² + Variance)

High wobble = Claude does something different every time
High bias   = Claude consistently does the wrong thing
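
To make the two extremes concrete, here is a sketch of the formula over numeric per-run errors. How Shadowbook turns a command into a number is not documented here, so the error values are hypothetical (0.0 means the agent did exactly what the skill asked).

```python
from statistics import fmean, pvariance

def wobble(errors):
    """Variance / (Bias^2 + Variance) for a list of per-run errors.

    Each error is a hypothetical numeric distance between what the
    skill asked for and what the agent did (0.0 = exact match).
    """
    bias = fmean(errors)
    variance = pvariance(errors)
    denom = bias ** 2 + variance
    return variance / denom if denom else 0.0

scattered = [-1.0, 1.0, -1.0, 1.0]   # off in a different direction each time
consistent = [1.0, 1.0, 1.0, 1.0]    # off the same way every time
print(wobble(scattered))    # 1.0: all variance, no bias
print(wobble(consistent))   # 0.0: all bias, no variance
```

A score near 1 means the errors are mostly noise; a score near 0 with non-zero errors means the skill is reliably wrong, which calls for rewriting the instruction rather than constraining it.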

Structural risk factors that predict high wobble:

  • No EXECUTE NOW section with explicit command
  • Multiple options without (default) marker
  • Content > 4000 chars (Claude overthinks)
  • Missing "DO NOT IMPROVISE" constraint
  • Numbered steps without clear default
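
Most of these factors can be checked with plain string tests. The sketch below is a hypothetical linter modeled on the list above, not the scanner's real rules; it skips the numbered-steps check, and its "option" line pattern is an assumption about how skills list alternatives.

```python
import re

def wobble_risks(skill_text):
    """Return the structural risk factors found in a skill document."""
    risks = []
    upper = skill_text.upper()
    if "EXECUTE NOW" not in upper:
        risks.append("no EXECUTE NOW section")
    # Assumption: alternatives are written as lines starting with "Option".
    has_options = re.search(r"(?im)^\s*option\b", skill_text) is not None
    if has_options and "(default)" not in skill_text.lower():
        risks.append("options without (default) marker")
    if len(skill_text) > 4000:
        risks.append("content over 4000 chars")
    if "DO NOT IMPROVISE" not in upper:
        risks.append("missing DO NOT IMPROVISE constraint")
    return risks

good = (
    "## EXECUTE NOW\n\n"
    "Run this immediately:\n"
    "your-exact-command --with-flags\n\n"
    "Do NOT improvise. Run the command above first.\n"
)
bad = "Option A: do X\nOption B: do Y\n" + "x" * 4100

print(wobble_risks(good))
print(wobble_risks(bad))
```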

Two analysis modes, plus ranking and a project audit:

# Simulated analysis (fast, no history needed)
bd wobble scan my-skill

# Real session analysis (parses actual Claude behavior)
bd wobble scan --from-sessions --days 14

# Rank all skills by risk
bd wobble scan --all --top 10

# Project health audit
bd wobble inspect . --fix

Drift dashboard (bd drift):

Shows last wobble scan, stable/wobbly/unstable counts, skills fixed since last scan, and spec/bead drift summary.

Cascade impact (bd cascade <skill>):

Lists known dependents from the wobble store (.beads/wobble/skills.json).

Fixing wobbly skills:

## EXECUTE NOW

**Run this immediately:**
```bash
your-exact-command --with-flags
```

Do NOT improvise. Run the command above first.

Auto-Compaction

bd spec candidates        # Score specs for archival
bd spec compact specs/old.md --summary "Done. 3 endpoints."
bd close bd-xyz --compact-spec --compact-skills

Commands

| Command | Action |
|---|---|
| `bd recent --all` | Activity dashboard with volatility |
| `bd ready` | Work queue, partitioned by volatility |
| `bd ready --mine` | Work queue filtered to your assignments |
| `bd list --show-volatility` | Badges: ● volatile / ○ stable |
| `bd spec scan` | Detect spec changes |
| `bd spec stale` | Show specs by staleness bucket |
| `bd spec triage` | Triage specs/ideas by age and git status |
| `bd spec duplicates` | Find duplicate or overlapping specs |
| `bd spec delta` | Show spec changes since last scan |
| `bd spec report` | Generate full spec radar report |
| `bd spec align` | Spec ↔ bead ↔ code alignment report |
| `bd spec sync` | Sync spec lifecycle from linked beads |
| `bd spec volatility` | List specs by stability |
| `bd spec volatility --trend <spec>` | 4-week visual trend |
| `bd spec volatility --with-dependents <spec>` | Cascade impact |
| `bd spec volatility --recommendations` | Action items |
| `bd spec volatility --fail-on-high` | CI gate |
| `bd preflight --check` | Skills + specs + volatility |
| `bd resume --spec <path>` | Unblock paused issues |
| `bd assign <id> --to <agent>` | Assign a bead to someone |
| `bd wobble scan <skill>` | Analyze skill for drift risk |
| `bd wobble scan --all` | Rank all skills by wobble risk |
| `bd wobble scan --from-sessions` | Use REAL session data |
| `bd wobble inspect .` | Project skill health audit |
| `bd drift` | Wobble + spec/bead drift summary |
| `bd cascade <skill>` | Wobble cascade impact from stored dependents |
| `bd pacman` | Pacman mode: dots (ready work), blockers, leaderboard |
| `bd pacman --pause "reason"` | Pause signal for other agents (file-based) |
| `bd pacman --resume` | Clear pause signal |
| `bd pacman --join` | Register agent in .beads/agents.json |
| `bd pacman --eat <id>` | Close task + increment score (hidden flag) |
| `bd pacman --global` | Workspace-wide view across all projects |
| `bd pacman --badge` | Generate GitHub profile badge |

Pacman Mode (Multi-Agent)

Gamified task management for coordinating multiple agents. No server required.

$ bd pacman

╭──────────────────────────────────────────────────────────╮
│  ᗧ····○ bd-abc····○ bd-xyz····○ bd-123 ····◐            │
╰──────────────────────────────────────────────────────────╯

YOU: claude
SCORE: 3 dots

DOTS NEARBY:
  ○ bd-abc ● P1 "Implement login flow"
  ○ bd-xyz ● P2 "Add retry logic"

ACHIEVEMENTS:
  ✓ First Blood
  ✓ Streak 5
  ✓ Ghost Buster

Tip: `bd pacman --global` aggregates dots and scores across your workspace.

BLOCKERS:
  ● bd-456 blocked by bd-789

LEADERBOARD:
  #1 codex   5 pts
  #2 claude  3 pts

All tasks done? Pacman clears the maze:

╭──────────────────────────────────────────────────────────╮
│  ᗧ····················✓ CLEAR!                            │
╰──────────────────────────────────────────────────────────╯

Multi-Agent Scenarios

Two agents, same project:

# Codex joins and works
AGENT_NAME=codex bd pacman --join
bd pacman --eat bd-123              # Close + score

# You check progress
bd pacman                           # See leaderboard

Session handoff (day → night):

# End of day
git push

# Codex overnight
git pull && AGENT_NAME=codex bd pacman --join
bd pacman --eat bd-456
git push

# Next morning
git pull && bd pacman               # See overnight work

Emergency stop all agents:

bd pacman --pause "PRODUCTION DOWN"
# Every agent's next bd command shows a warning

bd pacman --resume                  # After incident

Workspace-Wide View

$ bd pacman --global

╭──────────────────────────────────────────────────────────╮
│  GLOBAL PACMAN · 5 projects · 42 dots · 8 ghosts        │
╰──────────────────────────────────────────────────────────╯

YOU: claude
TOTAL SCORE: 15 dots across all projects

PROJECTS:
  18○ project-alpha              (5 pts) ◐3
  12○ project-beta               (3 pts) ◐5
  8○  api-backend                (2 pts)
  4○  mobile-app                 (5 pts)
  ✓   my-tool                    (10 pts)

Files (All Git-Tracked)

.beads/
├── agents.json       # Who's playing
├── scoreboard.json   # Points per agent
└── pause.json        # Pause signal (when active)
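
Because coordination state is just files, any script can read it. A minimal sketch, assuming `scoreboard.json` is a flat `{agent: points}` object and `pause.json` carries a `reason` key; both schemas are assumptions, only the file names come from the listing above.

```python
import json
import tempfile
from pathlib import Path

def pause_reason(root):
    """Return the pause reason if .beads/pause.json exists, else None."""
    pause_file = Path(root) / ".beads" / "pause.json"
    if not pause_file.exists():
        return None
    data = json.loads(pause_file.read_text(encoding="utf-8"))
    return data.get("reason", "paused")  # "reason" key is an assumed schema

def leaderboard(root):
    """Return (agent, points) pairs sorted by points, highest first."""
    board = Path(root) / ".beads" / "scoreboard.json"
    if not board.exists():
        return []
    scores = json.loads(board.read_text(encoding="utf-8"))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Demo against a throwaway directory so no real project is touched.
with tempfile.TemporaryDirectory() as tmp:
    beads = Path(tmp) / ".beads"
    beads.mkdir()
    (beads / "scoreboard.json").write_text(json.dumps({"claude": 3, "codex": 5}))
    (beads / "pause.json").write_text(json.dumps({"reason": "PRODUCTION DOWN"}))
    demo_board = leaderboard(tmp)
    demo_pause = pause_reason(tmp)

for rank, (agent, pts) in enumerate(demo_board, start=1):
    print(f"#{rank} {agent} {pts} pts")
print("Paused:", demo_pause)
```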

Why Files, Not Server?

| Aspect | Server | Files |
|---|---|---|
| Agent dies | Inbox stuck | Files persist |
| 10 projects | 10 registrations | 0 registrations |
| Sync | MCP calls | Git pull/push |


Why "Shadowbook"?

Every spec casts a shadow over code. When the spec moves, the shadow should move too.


MIT License · Built on beads

Wobble Drift

bd drift
bd cascade <skill>

Drift shows the last wobble scan summary plus spec/bead drift counts. Cascade prints the dependents recorded in .beads/wobble/skills.json.