Skip to content
Back
Theme
Verified Deep Learning with Lean 4
3 min read
Original article ↗
1
Introduction
▶
What this book is about
The thesis
Who this book is for
Why now
Why image recognition
Why Lean
Let’s go
2
You Are Here
▶
2.1
Example: MNIST linear classifier
2.2
MLIR: Linear
2.3
What’s inside
.train
?
2.4
MLIR: Training Step
2.5
How this book is organized
3
MNIST: 1D MLP
▶
The pdiv we built last chapter, now applied
The Jacobian as multidimensional “how does it jiggle?”
The dense layer’s Jacobian is the weight matrix
ReLU: the piecewise case
Softmax cross-entropy: where vectors collapse to a scalar
From Jacobians to VJPs
The theorems
3.1
Example: MNIST MLP
3.2
MLIR: Dense
4
MNIST: 2D CNN
▶
The MLP wastes information; we can do better
Three properties baked into a convolution
The Jacobian is sparse and structured
The backward of a convolution is another convolution
Max pooling: the winner gets the gradient
The rest of the network is Chapter 3
The theorems
4.1
Example: MNIST 2D CNN
4.2
MLIR: Convolution
5
CIFAR with BatchNorm
▶
What BN actually is
The centering term has an indirect path
The inverse-stddev term
The three-term backward
The affine step is just dense-with-broadcasting
Putting it together
The theorems
5.1
Example: the BN lift on CIFAR
5.2
MLIR: BatchNorm
6
ResNet-34
▶
Deeper networks used to stop training
The residual block: ask for the difference, not the function
Why the proof is a one-liner
ResNet-34 is sixteen of these blocks, stacked
The theorem
6.1
Example: ResNet-34 on Imagenette
6.2
MLIR: Residual
6.3
What’s in the production recipe?
6.4
Ablation: what each ingredient contributes
6.5
ImageNet recipe
A reasonable stopping point
7
MobileNetV2
▶
A standard conv does two jobs at once
Depthwise: one kernel per channel, no cross-channel sum
The depthwise-separable sandwich
Inverted residual: depthwise at the wide point
Stacking the blocks
The theorems
7.1
Example: MobileNet V2 on Imagenette
7.2
MLIR: Depthwise Convolution
7.3
ImageNet recipe
8
EfficientNet
▶
A convolution treats every channel equally
Three steps, one tensor product
From the framework: it’s an elementwise product
Attention in Disguise
EfficientNet-B0: MobileNet-V2 with SE everywhere
The theorem
8.1
Example: EfficientNet-B0 on Imagenette
8.2
MLIR: Squeeze-and-Excitation
8.3
ImageNet recipe
8.4
Side quests
9
ConvNeXt
▶
ConvNets strike back
Two new primitives
The ConvNeXt block
ConvNeXt-T is four stages, ResNet-style
The theorems
9.1
Example: ConvNeXt-T on Imagenette
9.2
MLIR: Layer Scale
9.3
Data Augmentation
9.4
ImageNet recipe
10
Vision Transformer
▶
Where Part 1 has been heading
What attention is, in one screen
Why this chapter has thirty theorems
What’s actually proved
10.1
Matrix-level machinery
10.2
Attention proofs
10.3
Example: ViT-Tiny on Imagenette
10.4
MLIR: Attention
10.5
Data Augmentation
10.6
ImageNet recipe
11
Bestiary of Architectures
▶
What “\(N\) new primitives” means.
11.1
Bestiary-only
Layer
primitives
11.2
Bestiary entries
A
Getting started
▶
Track 1: No-GPU Docker demo
Track 2: Native install (CUDA or ROCm)
Track 3: Proofs only (no IREE needed)
Common troubleshooting
B
On Verification
▶
B.1
Trust kernel
B.2
Finite-difference gradient checks
B.3
The JAX parallel pipeline
B.4
Independent kernel re-check (comparator)
B.5
Verified code generation
B.6
Inside a bridge theorem
B.7
Float32, and whether it still trains
B.8
The five layers, as three kinds of certainty
Acknowledgments
Colophon
Dependency graph