We are building a deep learning library from scratch, one that evolved from a simple autograd engine. It is designed to demystify the inner workings of deep learning models by exposing every mathematical detail and stripping away the abstractions that polished ML libraries (e.g. PyTorch, TensorFlow) rely on. The project is an opportunity to learn deep learning from first principles, and to use the hand-built library to create and train state-of-the-art models such as GPT-2.
“What I cannot create, I do not understand.” — Richard Feynman
Key Principles
- Learn By Doing: All formulas and calculations are derived in code, so you see exactly how gradients (or derivatives) are computed—no hidden black boxes!
- Learning Over Optimization: Focus on understanding the underlying mathematics and algorithms, rather than optimizing for speed or memory usage (though we can still train GPT models on a single CPU)
- PyTorch-Like API: The API closely mirrors PyTorch to keep adoption overhead low
- Minimal Dependencies: User code and examples should go through autograd.backend.xp; that alias binds to numpy by default, mlx on macOS when available, or cupy on CUDA Linux hosts. pytorch is used for gradient correctness checks in unit tests.
Why build a deep learning library from scratch?
This project initially took inspiration from Micrograd, which builds an Autograd (Wikipedia) engine from scratch for educational purposes. An autograd engine computes exact derivatives by tracking computations and applying the chain rule systematically; it is what lets a neural network learn from its errors and adjust its parameters automatically, and it is the core of deep learning. Once the initial building blocks (i.e. Tensor-level operations) were in place, everything else felt straightforward, so I kept adding features.
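To make "tracking computations and applying the chain rule" concrete, here is a tiny, self-contained scalar sketch in the spirit of Micrograd. It is not this library's Tensor implementation (a real engine also sorts the graph topologically so reused nodes accumulate gradients correctly); it only illustrates the core idea:

```python
# Minimal scalar autograd sketch (illustrative only, not this library's Tensor class).
# Each value remembers which values produced it and the local derivative with respect
# to each of them; backward() walks that record and applies the chain rule.
class Value:
    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # values this one was computed from
        self._local_grads = local_grads  # d(self)/d(parent) for each parent

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self, upstream=1.0):
        # Chain rule: dL/d(parent) += dL/d(self) * d(self)/d(parent)
        self.grad += upstream
        for parent, local in zip(self._parents, self._local_grads):
            parent.backward(upstream * local)


x, w, b = Value(2.0), Value(-3.0), Value(1.0)
y = x * w + b   # y = x*w + b = -5
y.backward()    # dy/dx = w = -3, dy/dw = x = 2, dy/db = 1
print(x.grad, w.grad, b.grad)
```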
The primary motivation is to learn about neural networks from scratch and from first principles. There are many good ML libraries out there (e.g. TensorFlow, PyTorch, Scikit-learn) that are well optimized and feature rich, but they introduce many abstractions that hide the underlying concepts and make it hard to understand how things actually work. I believe that to make good use of those abstractions, we must first understand how everything works from the ground up. That is the guiding principle for this project: all mathematical and calculus operations are explicitly derived in the code, without abstraction. Debugging a neural network, especially the backward() implementations of various functions (e.g. losses and activations), also offers a rewarding learning experience.
The goal is to keep the API as close to PyTorch as possible, both to reduce onboarding overhead and to let PyTorch serve as a reference for validating correctness.
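As a small example of what "explicitly derived in the code" means, the sketch below (plain NumPy, independent of this library's API) derives the gradient of mean squared error by hand and validates it with a finite-difference check, the same idea behind validating gradients against a reference implementation:

```python
# Hand-derived MSE gradient, verified numerically (illustrative, not library code).
import numpy as np

def mse_forward(pred, target):
    # L = mean((pred - target)^2)
    return np.mean((pred - target) ** 2)

def mse_backward(pred, target):
    # Derived by hand from the forward formula: dL/dpred = 2 * (pred - target) / N
    return 2.0 * (pred - target) / pred.size

pred = np.array([0.2, -1.3, 0.7])
target = np.array([0.0, -1.0, 1.0])
analytic = mse_backward(pred, target)

# Finite-difference check: perturb each element and measure the change in the loss.
eps = 1e-6
numeric = np.zeros_like(pred)
for i in range(pred.size):
    bumped = pred.copy()
    bumped[i] += eps
    numeric[i] = (mse_forward(bumped, target) - mse_forward(pred, target)) / eps

assert np.allclose(analytic, numeric, atol=1e-4)
```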
Demo/Examples
Explore the examples/ directory for real-world demonstrations of how this engine can power neural network training on various tasks:
📌 Transformers & GPT (newly added): see the examples/ directory for these and all other demos.
Toy Example
```python
from autograd.tensor import Tensor
from autograd.nn import Linear, Module
from autograd.optim import SGD
from autograd.backend import xp


class SimpleNN(Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        # A single linear layer (input_dim -> output_dim).
        # Mathematically: fc(x) = xW^T + b
        # where W is weight and b is bias.
        self.fc = Linear(input_dim, output_dim)

    def forward(self, x):
        # Simply compute xW^T + b without any additional activation.
        return self.fc(x)


# Create a sample input tensor x with shape (1, 3).
# 'requires_grad=True' means we want to track gradients for x.
x = Tensor([[-1.0, 0.0, 2.0]], requires_grad=True)

# We want the output to get close to 1.0 over time.
y_true = 1.0

# Initialize the simple neural network.
# This layer has a weight matrix W of shape (3, 1) and a bias of shape (1,).
model = SimpleNN(input_dim=3, output_dim=1)

# Use SGD with a learning rate of 0.03
optimizer = SGD(model.parameters, lr=0.03)

for epoch in range(20):
    # Reset (zero out) all accumulated gradients before each update.
    optimizer.zero_grad()

    # --- Forward pass ---
    # prediction = xW^T + b
    y_pred = model(x)
    print(f"Epoch {epoch}: {y_pred}")

    # Define a simple mean squared error function
    loss = ((y_pred - y_true) ** 2).mean()

    # --- Backward pass ---
    # Ultimately we need to compute the gradient of the loss with respect to the weights.
    # Specifically, if Loss = (pred - 1)^2, then:
    #   dL/d(pred) = 2 * (pred - 1)
    #   d(pred)/dW = d(xW^T + b) / dW = x^T
    # By chain rule, dL/dW = dL/d(pred) * d(pred)/dW = [2 * (pred - 1)] * x^T
    loss.backward()

    # --- Update weights ---
    optimizer.step()

# See the computed gradients for the linear layer's weight matrix:
weights = model.fc.parameters["weight"].data
bias = model.fc.parameters["bias"].data
gradient = model.fc.parameters["weight"].grad
print("[After Training] Gradients for fc weights:", gradient)
print("[After Training] layer weights:", weights)
print("[After Training] layer bias:", bias)

assert xp.to_scalar(xp.allclose(x.data @ weights + bias, y_true))
```
Documentation
Check out the documentation for this project's modules on the docs website, built from the docs/ directory.
Environment Setup
This repo uses uv.lock as the source of truth for dependency installation.
Use the bootstrap script for the intended setup flow:
./bootstrap.sh
source .venv/bin/activate

Backend selection happens automatically. In user code, import and use autograd.backend.xp; the alias is then bound to one of these backends (see the usage sketch after this list):
- mlx is preferred on macOS when available
- cupy is preferred on Linux when a CUDA device is detected and CuPy is installed
- otherwise the repo falls back to numpy
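A minimal backend-agnostic usage sketch, assuming xp exposes the usual NumPy-style array functions on whichever backend was selected:

```python
# Backend-agnostic user code: xp is numpy, mlx, or cupy depending on the host.
from autograd.backend import xp

a = xp.array([[1.0, 2.0], [3.0, 4.0]])
b = xp.ones((2, 2))
print(xp.matmul(a, b))
```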
You can also force a backend explicitly:
AUTOGRAD_BACKEND=numpy uv run pytest
AUTOGRAD_BACKEND=mlx uv run pytest
AUTOGRAD_BACKEND=cupy uv run pytest
CuPy Setup
CuPy is optional and only used when a CUDA device is detected.
- bootstrap.sh auto-detects CUDA on Linux and syncs one of the pinned extras: cuda11, cuda12, or cuda13.
- Manual installs are also available through pyproject.toml extras:
uv sync --extra dev --extra cuda12
- Pick exactly one CUDA extra that matches your installed CUDA major version.
Tests
Comprehensive unit tests and integration tests are available in test/autograd.
CI exercises both backend paths:
- MLX on macos-latest
- NumPy on ubuntu-latest
- CuPy auto-detection is available on CUDA Linux hosts, but is not covered by the current GitHub Actions matrix.
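For illustration, a gradient-correctness check against PyTorch might look like the sketch below. This is a hypothetical example, assuming the NumPy backend and that Tensor.grad is array-like; the actual tests in test/autograd may be structured differently:

```python
# Hypothetical cross-check: run the same computation in this engine and in PyTorch,
# then compare the resulting gradients (assumes the NumPy backend).
import numpy as np
import torch

from autograd.tensor import Tensor

data = [[-1.0, 0.0, 2.0]]

# This library
x = Tensor(data, requires_grad=True)
((x - 1.0) ** 2).mean().backward()

# PyTorch reference
xt = torch.tensor(data, requires_grad=True)
((xt - 1.0) ** 2).mean().backward()

assert np.allclose(np.asarray(x.grad), xt.grad.numpy(), atol=1e-6)
```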
Future Work
- Expanding the autograd engine to power cutting-edge neural architectures
- Further performance tuning while maintaining clarity and educational value
- Interactive tutorials for newcomers to ML and advanced topics alike
Contributing
Contributions are welcome! If you find bugs, want to request features, or add examples, feel free to open an issue or submit a pull request.
License
MIT
