Galton Lab
What if probability didn't have to be calculated—what if it could flow?
Galton Lab is a research playground that reimagines how neural networks make predictions. Instead of computing probability distributions the traditional way (softmax over thousands of options), we let probability flow through learned geometric landscapes—like water finding its way downhill.
🎮 Try the Interactive Demos — Learn the concepts from physics to transformers and beyond in your browser!
The Big Idea (in plain English)
Think about how a Galton board works: you drop a ball, it bounces off pegs, and eventually lands in a bucket. The pattern of where balls land creates a probability distribution through physics, not arithmetic.
We're applying this idea to machine learning:
Traditional approach:
Neural network → Calculate probabilities for ALL 50,000 words → Pick one
(Expensive, rigid, happens the same way every time)
Our approach:
Neural network → Create a probability landscape → Drop "probes" that flow toward likely tokens
(Adaptive, interpretable, uses less compute when confident)
Why This Matters
1. Adaptive Compute
When a model is very confident ("The capital of France is ___"), probes converge quickly → fast prediction. When uncertain, probes spread out → the model automatically takes more time. No manual tuning required—it emerges from the physics.
2. Built-in Uncertainty Quantification
You can literally see confidence by watching how probes move. Tight convergence = confident. Spread out = uncertain.
3. Interpretability
Instead of opaque probability numbers, you get trajectories you can visualize. You can watch probability mass flow toward the winning token and understand why it won.
4. Efficiency at Scale
No need to compute probabilities for every token in your vocabulary. The geometry guides probes to likely regions automatically.
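For the curious, here is a minimal sketch of what adaptive compute could look like in practice: drop probes in small batches and stop as soon as one bucket holds a clear majority. The `adaptive_probe_count` helper, the thresholds, and the toy `drop_batch` simulators are illustrative assumptions, not this library's API.

```python
import numpy as np

def adaptive_probe_count(drop_batch, n_buckets, threshold=0.6,
                         batch_size=64, max_batches=32):
    """Drop probes in batches; stop early once one bucket is clearly winning.

    drop_batch(n) is a hypothetical callable that simulates n probes and
    returns the bucket index each one landed in.
    """
    counts = np.zeros(n_buckets)
    for _ in range(max_batches):
        counts += np.bincount(drop_batch(batch_size), minlength=n_buckets)
        probs = counts / counts.sum()
        if probs.max() >= threshold:        # confident: stop early
            break
    return probs, int(counts.sum())         # distribution + probes spent

# Toy stand-ins for a real board: a confident context needs fewer probes.
rng = np.random.default_rng(0)
confident = lambda n: rng.choice(4, size=n, p=[0.9, 0.05, 0.03, 0.02])
uncertain = lambda n: rng.choice(4, size=n, p=[0.3, 0.3, 0.2, 0.2])

print(adaptive_probe_count(confident, 4)[1])   # small probe budget
print(adaptive_probe_count(uncertain, 4)[1])   # larger probe budget
```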
Beyond Language Models
The core insight—probability as geometric flow—applies far beyond token prediction. Anywhere you use softmax to make categorical choices, you can replace it with learned flow fields.
Working Examples:
- 🖼️ Image Classification — CNN → 2D flow field → digits
- 🔍 Attention Mechanism — Replace softmax(QK^T) with probe routing
- 🎮 RL Policies — Action selection through geometric flow
See examples/ for runnable code with visualizations, and docs/use-cases.md for 8+ application domains.
The pattern is universal:

    # Anywhere you see this...
    logits = network(input)
    probs = softmax(logits)

    # You can do this instead...
    context = network(input)
    field = sdf(context)
    probes = integrate(field)
    probs = density(probes)
And get uncertainty quantification, interpretability, and adaptive compute for free.
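To make the pattern concrete, here is a deliberately tiny, self-contained PyTorch sketch of a flow-based head that returns a probability vector the same way a softmax head would. The `ToyFlowHead` module, the forward-Euler integrator, and the Gaussian bucket width are illustrative assumptions, not the repository's actual implementation.

```python
import math
import torch
import torch.nn as nn

class ToyFlowHead(nn.Module):
    """Hypothetical drop-in for a softmax head: context -> velocity field -> probe density."""
    def __init__(self, d_model, n_tokens, n_probes=256, n_steps=8, sigma=0.5):
        super().__init__()
        self.field = nn.Sequential(nn.Linear(d_model + 1, 64), nn.Tanh(), nn.Linear(64, 1))
        self.n_tokens, self.n_probes, self.n_steps, self.sigma = n_tokens, n_probes, n_steps, sigma

    def forward(self, context):                       # context: (batch, d_model)
        B = context.shape[0]
        two_pi = 2 * math.pi
        # Probes start spread uniformly around a ring parameterised by angle in [0, 2π).
        x = torch.rand(B, self.n_probes, device=context.device) * two_pi
        ctx = context.unsqueeze(1).expand(B, self.n_probes, -1)
        dt = 1.0 / self.n_steps
        for _ in range(self.n_steps):                 # forward-Euler integration of dx/dt = v(x, context)
            v = self.field(torch.cat([ctx, x.unsqueeze(-1)], dim=-1)).squeeze(-1)
            x = (x + dt * v) % two_pi
        # Soft-assign probes to token buckets via Gaussian windows around bucket centres.
        centres = torch.linspace(0, two_pi, self.n_tokens + 1, device=context.device)[:-1]
        d = torch.remainder(x.unsqueeze(-1) - centres + math.pi, two_pi) - math.pi
        mass = torch.exp(-0.5 * (d / self.sigma) ** 2).sum(dim=1)
        return mass / mass.sum(dim=-1, keepdim=True)  # (batch, n_tokens), sums to 1

probs = ToyFlowHead(d_model=16, n_tokens=8)(torch.randn(4, 16))
```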
What's in This Repository
1. Discrete Galton Boards
The intuitive starting point
Digital versions of physical Galton boards with learnable "pegs" that guide probes left or right. Simple to understand, easy to visualize, and surprisingly effective for small vocabularies.
- Probes drop through a grid of learned biases
- Each row nudges probes toward likely tokens
- Adaptive: stops when one bucket gets enough mass
- Hierarchical variants for scaling to larger vocabularies
Files: src/galton_lab/board.py, experiments/hierarchical_compare.py
2. Continuous ODE Sampler
The scalable evolution
When discrete boards hit their limits, we move to continuous flow. Probes now follow smooth trajectories on a ring (torus topology), guided by a learned velocity field.
- Represents probability as a flow through continuous space
- Uses ODEs (Ordinary Differential Equations) integrated with RK2
- Learned using neural SDFs (Signed Distance Fields)
- Scales to real vocabularies while staying differentiable
Files: src/galton_lab/ode/, galton/train.py
Supporting Infrastructure
- Context composers (src/galton_lab/composers.py): Map input context → probability landscapes
- Training tools (galton/train.py, tests/): GPU-ready training loops with warm-start presets
- Visualizations (src/galton_lab/visualize.py): Watch probability flow in real time
- Documentation (Galton.md, docs/char32_ode_warmstart.md): Deep dives into the theory and practice
Quick Start
Installation

    git clone https://github.com/Foundation42/galton.git
    cd galton
    python -m pip install -e ".[dev]"
    pytest  # Run tests to verify everything works
Try It Yourself
See it in action (discrete boards):

    # Visual demo comparing different board architectures
    python experiments/hierarchical_compare.py

    # Adaptive compute experiment with visualizations
    python experiments/adaptive_eval.py --save-plots --write-csv
Train a model (continuous ODE sampler):

    # Simple toy task (ABCD pattern prediction)
    python galton/train.py --task abcd --device auto --amp \
        --per-example-fields --batch 8192

    # Character-level language model
    python galton/train.py --task char32 --device auto --amp \
        --sampler ode --batch 4096 --warm-start-preset char32
Key Training Flags
| Flag | What it does |
|---|---|
| --device auto | Use GPU if available, else CPU |
| --amp | Mixed precision training (faster on GPU) |
| --sampler ode | Use continuous flow instead of the discrete board |
| --warm-start-preset char32 | Use a proven initialization for character models |
| --auto-handoff | Automatically transition from warm start to the sharpening phase |
| --compile | JIT compile with PyTorch 2.0+ (even faster) |
How It Works (For the Curious)
From Discrete to Continuous
Discrete Galton Board:
Input: "The cat sat on the"
↓
[Configure peg biases based on context]
↓
[Drop N probe particles]
↓
Each probe bounces through rows:
- Read peg bias at current position
- Add noise
- Move left or right
- Repeat for each row
↓
[Count probes in each token bucket]
↓
Output: "mat" (bucket with most probes wins)
Continuous ODE Sampler:
Input: "The cat sat on the"
↓
[Neural network creates a velocity field on a ring]
↓
[Integrate probe trajectories using ODEs]
- Probes follow smooth curves
- Velocity field guides them toward likely tokens
- Integration uses RK2 (Runge-Kutta 2nd order)
↓
[Soft bucket assignment using Gaussian windows]
↓
Output: "mat" (highest probability mass)
Training Strategy
1. Warm Start: Begin with soft, spread-out probability landscapes
   - High sigma (σ=0.9) for wide Gaussian windows
   - Directional bias to break symmetry
   - Knowledge distillation from a simple "teacher" model
2. Auto Handoff: The system detects when the model is confident (see the sketch after this list)
   - Monitors the margin (gap between the top choices)
   - Checks whether the target token's probability is sufficient
3. Sharpening: Tighten the focus
   - Reduce sigma (σ=0.5) for narrower peaks
   - Remove the training wheels (bias, distillation)
   - Pure cross-entropy optimization
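As a rough illustration of the handoff check, the snippet below monitors the margin and the target-token probability over a batch of predictions; the `ready_to_sharpen` helper and its thresholds are hypothetical placeholders, not what --auto-handoff actually implements.

```python
import torch

def ready_to_sharpen(probs, targets, margin_thresh=0.15, target_prob_thresh=0.5):
    """Decide whether to hand off from warm start to the sharpening phase.

    probs:   (batch, n_tokens) tensor of predicted probabilities
    targets: (batch,) tensor of correct token indices
    """
    top2 = probs.topk(2, dim=-1).values
    margin = (top2[:, 0] - top2[:, 1]).mean()                    # gap between the top two choices
    target_prob = probs.gather(1, targets.unsqueeze(1)).mean()   # mass on the correct token
    return bool(margin > margin_thresh and target_prob > target_prob_thresh)

probs = torch.tensor([[0.7, 0.2, 0.1], [0.8, 0.1, 0.1]])
targets = torch.tensor([0, 0])
print(ready_to_sharpen(probs, targets))   # True: confident enough to sharpen
```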
Where This Could Go
Near-Term Research Directions
- Scale to production vocabularies (10k-50k tokens) using hierarchical routing
- Integrate with real transformers as a drop-in softmax replacement
- Stochastic variants (SDEs) for better exploration during training
- Comparative benchmarks against standard sampling methods
Potential Applications
- Language models with built-in uncertainty quantification
- Reinforcement learning with interpretable policy flows
- Structured generation where grammar rules shape the probability landscape
- Any domain with periodic structure (audio, time series, molecular conformations)
Fundamental Questions
- Can geometric flow matching replace all categorical distributions?
- Does this connect to diffusion models, optimal transport, or energy-based learning?
- Can we prove convergence guarantees for the adaptive compute property?
Learn More
- Interactive Demos — Two interactive journeys:
- Foundation Demo — Physics to transformers (6 stages)
- Beyond LLMs Demo — Image classification, attention, RL (4 stages)
- examples/ — Runnable Python examples: image classification, attention, RL policies
- docs/use-cases.md — 8+ application domains beyond language models
- Galton.md — The complete origin story: from 4am idea to working prototype
- docs/char32_ode_warmstart.md — Deep technical dive on the continuous ODE sampler
- experiments/ — Additional experiments with visualizations
- tests/ — Unit tests that double as usage examples
Repository Structure
galton/
├── src/galton_lab/ # Core library
│ ├── board.py # Discrete Galton board logic
│ ├── ode/ # Continuous flow sampler (SDFs, integration)
│ ├── composers.py # Context → probability landscape mapping
│ ├── torch_modules.py # PyTorch-wrapped samplers
│ └── visualize.py # Plotting and diagnostics
├── galton/train.py # Main training script with presets
├── experiments/ # Standalone demos and analyses
├── tests/ # Geometry invariants and regression tests
├── docs/ # Technical documentation
└── sketches/ # Early prototypes and explorations
Contributing
This is an active research project. We welcome:
- Experiments — Try it on new tasks and share results
- Visualizations — Make the flow more intuitive
- Theory — Connect to related mathematical frameworks
- Critique — Tell us where this breaks or why it won't scale
Open an issue to discuss ideas or submit a PR with improvements.
Citation
If you build on this work, please cite:

    @software{galton_lab_2025,
      author = {Christian Beaumont and Anthropic Claude (Chat and Code) and DeepSeek and OpenAI (GPT-4o and Codex)},
      title  = {Galton Lab: Probability Sampling Through Learned Flow Fields},
      year   = {2025},
      url    = {https://github.com/Foundation42/galton}
    }
License
MIT — See LICENSE file for details.
"In a world full of edges, be a torus."