Galton Lab
What if probability didn't have to be calculated—what if it could flow?
Galton Lab is a research playground that reimagines how neural networks make predictions. Instead of computing probability distributions the traditional way (softmax over thousands of options), we let probability flow through learned geometric landscapes—like water finding its way downhill.
🎮 Try the Interactive Demos — Learn the concepts from physics to transformers and beyond in your browser!
The Big Idea (in plain English)
Think about how a Galton board works: you drop a ball, it bounces off pegs, and eventually lands in a bucket. The pattern of where balls land creates a probability distribution through physics, not arithmetic.
We're applying this idea to machine learning:
Traditional approach:
Neural network → Calculate probabilities for ALL 50,000 words → Pick one
(Expensive, rigid, happens the same way every time)
Our approach:
Neural network → Create a probability landscape → Drop "probes" that flow toward likely tokens
(Adaptive, interpretable, uses less compute when confident)
Why This Matters
1. Adaptive Compute
When a model is very confident ("The capital of France is ___"), probes converge quickly → fast prediction. When uncertain, probes spread out → the model automatically takes more time. No manual tuning required—it emerges from the physics.
2. Built-in Uncertainty Quantification
You can literally see confidence by watching how probes move. Tight convergence = confident. Spread out = uncertain.
3. Interpretability
Instead of opaque probability numbers, you get trajectories you can visualize. You can watch probability mass flow toward the winning token and understand why it won.
4. Efficiency at Scale
No need to compute probabilities for every token in your vocabulary. The geometry guides probes to likely regions automatically.
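For the curious, here is a minimal sketch of what adaptive compute could look like in practice: drop probes in small batches and stop as soon as one bucket holds a clear majority. The `adaptive_probe_count` helper, the thresholds, and the toy `drop_batch` simulators are illustrative assumptions, not this library's API.

```python
import numpy as np

def adaptive_probe_count(drop_batch, n_buckets, threshold=0.6,
                         batch_size=64, max_batches=32):
    """Drop probes in batches; stop early once one bucket is clearly winning.

    drop_batch(n) is a hypothetical callable that simulates n probes and
    returns the bucket index each one landed in.
    """
    counts = np.zeros(n_buckets)
    for _ in range(max_batches):
        counts += np.bincount(drop_batch(batch_size), minlength=n_buckets)
        probs = counts / counts.sum()
        if probs.max() >= threshold:        # confident: stop early
            break
    return probs, int(counts.sum())         # distribution + probes spent

# Toy stand-ins for a real board: a confident context needs fewer probes.
rng = np.random.default_rng(0)
confident = lambda n: rng.choice(4, size=n, p=[0.9, 0.05, 0.03, 0.02])
uncertain = lambda n: rng.choice(4, size=n, p=[0.3, 0.3, 0.2, 0.2])

print(adaptive_probe_count(confident, 4)[1])   # small probe budget
print(adaptive_probe_count(uncertain, 4)[1])   # larger probe budget
```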
Beyond Language Models
The core insight—probability as geometric flow—applies far beyond token prediction. Anywhere you use softmax to make categorical choices, you can replace it with learned flow fields.
Working Examples:
- 🖼️ Image Classification — CNN → 2D flow field → digits
- 🔍 Attention Mechanism — Replace softmax(QK^T) with probe routing
- 🎮 RL Policies — Action selection through geometric flow
See examples/ for runnable code with visualizations, and docs/use-cases.md for 8+ application domains.
The pattern is universal:

    # Anywhere you see this...
    logits = network(input)
    probs = softmax(logits)

    # You can do this instead...
    context = network(input)
    field = sdf(context)
    probes = integrate(field)
    probs = density(probes)
And get uncertainty quantification, interpretability, and adaptive compute for free.
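To make the pattern concrete, here is a deliberately tiny, self-contained PyTorch sketch of a flow-based head that returns a probability vector the same way a softmax head would. The `ToyFlowHead` module, the forward-Euler integrator, and the Gaussian bucket width are illustrative assumptions, not the repository's actual implementation.

```python
import math
import torch
import torch.nn as nn

class ToyFlowHead(nn.Module):
    """Hypothetical drop-in for a softmax head: context -> velocity field -> probe density."""
    def __init__(self, d_model, n_tokens, n_probes=256, n_steps=8, sigma=0.5):
        super().__init__()
        self.field = nn.Sequential(nn.Linear(d_model + 1, 64), nn.Tanh(), nn.Linear(64, 1))
        self.n_tokens, self.n_probes, self.n_steps, self.sigma = n_tokens, n_probes, n_steps, sigma

    def forward(self, context):                       # context: (batch, d_model)
        B = context.shape[0]
        two_pi = 2 * math.pi
        # Probes start spread uniformly around a ring parameterised by angle in [0, 2π).
        x = torch.rand(B, self.n_probes, device=context.device) * two_pi
        ctx = context.unsqueeze(1).expand(B, self.n_probes, -1)
        dt = 1.0 / self.n_steps
        for _ in range(self.n_steps):                 # forward-Euler integration of dx/dt = v(x, context)
            v = self.field(torch.cat([ctx, x.unsqueeze(-1)], dim=-1)).squeeze(-1)
            x = (x + dt * v) % two_pi
        # Soft-assign probes to token buckets via Gaussian windows around bucket centres.
        centres = torch.linspace(0, two_pi, self.n_tokens + 1, device=context.device)[:-1]
        d = torch.remainder(x.unsqueeze(-1) - centres + math.pi, two_pi) - math.pi
        mass = torch.exp(-0.5 * (d / self.sigma) ** 2).sum(dim=1)
        return mass / mass.sum(dim=-1, keepdim=True)  # (batch, n_tokens), sums to 1

probs = ToyFlowHead(d_model=16, n_tokens=8)(torch.randn(4, 16))
```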
What's in This Repository
1. Discrete Galton Boards
The intuitive starting point
Digital versions of physical Galton boards with learnable "pegs" that guide probes left or right. Simple to understand, easy to visualize, and surprisingly effective for small vocabularies.
- Probes drop through a grid of learned biases
- Each row nudges probes toward likely tokens
- Adaptive: stops when one bucket gets enough mass
- Hierarchical variants for scaling to larger vocabularies
Files: src/galton_lab/board.py, experiments/hierarchical_compare.py
2. Continuous ODE Sampler
The scalable evolution
When discrete boards hit their limits, we move to continuous flow. Probes now follow smooth trajectories on a ring (torus topology), guided by a learned velocity field.
- Represents probability as a flow through continuous space
- Uses ODEs (Ordinary Differential Equations) integrated with RK2
- Learned using neural SDFs (Signed Distance Fields)
- Scales to real vocabularies while staying differentiable
Files: src/galton_lab/ode/, galton/train.py
Supporting Infrastructure
- Context composers (src/galton_lab/composers.py): Map input context → probability landscapes
- Training tools (galton/train.py, tests/): GPU-ready training loops with warm-start presets
- Visualizations (src/galton_lab/visualize.py): Watch probability flow in real time
- Documentation (Galton.md, docs/char32_ode_warmstart.md): Deep dives into the theory and practice
Quick Start
Installation

    git clone https://github.com/Foundation42/galton.git
    cd galton
    python -m pip install -e ".[dev]"
    pytest  # Run tests to verify everything works
Try It Yourself
See it in action (discrete boards):

    # Visual demo comparing different board architectures
    python experiments/hierarchical_compare.py

    # Adaptive compute experiment with visualizations
    python experiments/adaptive_eval.py --save-plots --write-csv
Train a model (continuous ODE sampler):

    # Simple toy task (ABCD pattern prediction)
    python galton/train.py --task abcd --device auto --amp \
        --per-example-fields --batch 8192

    # Character-level language model
    python galton/train.py --task char32 --device auto --amp \
        --sampler ode --batch 4096 --warm-start-preset char32
Key Training Flags
| Flag | What it does |
|---|---|
| --device auto | Use GPU if available, else CPU |
| --amp | Mixed precision training (faster on GPU) |
| --sampler ode | Use continuous flow instead of the discrete board |
| --warm-start-preset char32 | Use a proven initialization for character models |
| --auto-handoff | Automatically transition from warm start to the sharpening phase |
| --compile | JIT compile with PyTorch 2.0+ (even faster) |
How It Works (For the Curious)
From Discrete to Continuous
Discrete Galton Board:
Input: "The cat sat on the"
↓
[Configure peg biases based on context]
↓
[Drop N probe particles]
↓
Each probe bounces through rows:
- Read peg bias at current position
- Add noise
- Move left or right
- Repeat for each row
↓
[Count probes in each token bucket]
↓
Output: "mat" (bucket with most probes wins)
Continuous ODE Sampler:
Input: "The cat sat on the"
↓
[Neural network creates a velocity field on a ring]
↓
[Integrate probe trajectories using ODEs]
- Probes follow smooth curves
- Velocity field guides them toward likely tokens
- Integration uses RK2 (Runge-Kutta 2nd order)
↓
[Soft bucket assignment using Gaussian windows]
↓
Output: "mat" (highest probability mass)
Training Strategy
1. Warm Start: Begin with soft, spread-out probability landscapes
   - High sigma (σ=0.9) for wide Gaussian windows
   - Directional bias to break symmetry
   - Knowledge distillation from a simple "teacher" model
2. Auto Handoff: The system detects when the model is confident (see the sketch after this list)
   - Monitors the margin (gap between the top choices)
   - Checks whether the target token's probability is sufficient
3. Sharpening: Tighten the focus
   - Reduce sigma (σ=0.5) for narrower peaks
   - Remove the training wheels (bias, distillation)
   - Pure cross-entropy optimization
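As a rough illustration of the handoff check, the snippet below monitors the margin and the target-token probability over a batch of predictions; the `ready_to_sharpen` helper and its thresholds are hypothetical placeholders, not what --auto-handoff actually implements.

```python
import torch

def ready_to_sharpen(probs, targets, margin_thresh=0.15, target_prob_thresh=0.5):
    """Decide whether to hand off from warm start to the sharpening phase.

    probs:   (batch, n_tokens) tensor of predicted probabilities
    targets: (batch,) tensor of correct token indices
    """
    top2 = probs.topk(2, dim=-1).values
    margin = (top2[:, 0] - top2[:, 1]).mean()                    # gap between the top two choices
    target_prob = probs.gather(1, targets.unsqueeze(1)).mean()   # mass on the correct token
    return bool(margin > margin_thresh and target_prob > target_prob_thresh)

probs = torch.tensor([[0.7, 0.2, 0.1], [0.8, 0.1, 0.1]])
targets = torch.tensor([0, 0])
print(ready_to_sharpen(probs, targets))   # True: confident enough to sharpen
```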
Where This Could Go
Near-Term Research Directions
- Scale to production vocabularies (10k-50k tokens) using hierarchical routing
- Integrate with real transformers as a drop-in softmax replacement
- Stochastic variants (SDEs) for better exploration during training
- Comparative benchmarks against standard sampling methods
Potential Applications
- Language models with built-in uncertainty quantification
- Reinforcement learning with interpretable policy flows
- Structured generation where grammar rules shape the probability landscape
- Any domain with periodic structure (audio, time series, molecular conformations)
Fundamental Questions
- Can geometric flow matching replace all categorical distributions?
- Does this connect to diffusion models, optimal transport, or energy-based learning?
- Can we prove convergence guarantees for the adaptive compute property?
Learn More
- Interactive Demos — Two interactive journeys:
- Foundation Demo — Physics to transformers (6 stages)
- Beyond LLMs Demo — Image classification, attention, RL (4 stages)
- examples/ — Runnable Python examples: image classification, attention, RL policies
- docs/use-cases.md — 8+ application domains beyond language models
- Galton.md — The complete origin story: from 4am idea to working prototype
- docs/char32_ode_warmstart.md — Deep technical dive on the continuous ODE sampler
- experiments/ — Additional experiments with visualizations
- tests/ — Unit tests that double as usage examples
Repository Structure
galton/
├── src/galton_lab/ # Core library
│ ├── board.py # Discrete Galton board logic
│ ├── ode/ # Continuous flow sampler (SDFs, integration)
│ ├── composers.py # Context → probability landscape mapping
│ ├── torch_modules.py # PyTorch-wrapped samplers
│ └── visualize.py # Plotting and diagnostics
├── galton/train.py # Main training script with presets
├── experiments/ # Standalone demos and analyses
├── tests/ # Geometry invariants and regression tests
├── docs/ # Technical documentation
└── sketches/ # Early prototypes and explorations
Contributing
This is an active research project. We welcome:
- Experiments — Try it on new tasks and share results
- Visualizations — Make the flow more intuitive
- Theory — Connect to related mathematical frameworks
- Critique — Tell us where this breaks or why it won't scale
Open an issue to discuss ideas or submit a PR with improvements.
Citation
If you build on this work, please cite:

    @software{galton_lab_2025,
      author = {Christian Beaumont and Anthropic Claude (Chat and Code) and DeepSeek and OpenAI (GPT-4o and Codex)},
      title  = {Galton Lab: Probability Sampling Through Learned Flow Fields},
      year   = {2025},
      url    = {https://github.com/Foundation42/galton}
    }
License
MIT — See LICENSE file for details.
"In a world full of edges, be a torus."