tinygym reimplements flashrl, using tinygrad instead of torch
🛠️ pip install tinygym or clone the repo & pip install -r requirements.txt
- If cloned (or if envs changed), compile:
python setup.py build_ext --inplace
The README of flashrl is mostly valid for tinygym, with the biggest difference being:
tinygym is not fast (yet) -> Learns Pong in ~5 minutes instead of 5 seconds (on an RTX 3090)
Just like in flashrl, python train.py should look like this (with the progress bar moving ~60x slower):
Check out the onefile branch if you want to make it fast (i.e. try to make TinyJit work)!
Implementation differences to flashrl
The most important difference (enabled RL after 2 hours of debugging):
- Use .abs().clip(min_=1e-8) in ppo to avoid close-to-zero values in (value - ret)
Without this, the optimizer step can result in NaNs and "RL doesn't work" 😜
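The idea behind that floor can be sketched in plain Python, without tinygrad. The helper names below (floored_abs_error, value_loss) are hypothetical, and the squared-error form of the value loss is an assumption. All that is stated above is that the magnitude of (value - ret) is floored at 1e-8 in ppo.

```python
def floored_abs_error(value, ret, eps=1e-8):
    """Return |value - ret|, but never smaller than eps.

    Plain-Python analogue of (value - ret).abs().clip(min_=1e-8).
    """
    return max(abs(value - ret), eps)


def value_loss(values, returns, eps=1e-8):
    """Assumed squared-error value loss built on the floored error (illustrative only)."""
    errs = [floored_abs_error(v, r, eps) for v, r in zip(values, returns)]
    return 0.5 * sum(e * e for e in errs) / len(errs)
```

With this floor, an exact match between value and ret contributes a tiny positive error instead of an exact zero, while larger errors pass through unchanged.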
To potentially enable tinygrad.TinyJit (does not work yet, hence the slowness)
Learner does not .setup_data, and rollout is a function (instead of a Learner method) that fills a list with Tensors and .stacks them at the end
Since it somehow performs better
.uniform (tinygrad default) instead of .kaiming_uniform (torch default) weight initialization for nn.Linear
Custom tinygrad rewrites of torch.nn.init.orthogonal_ & torch.nn.utils.clip_grad_norm_ are used
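For readers unfamiliar with what clip_grad_norm_ does, here is a minimal sketch of global-norm gradient clipping in plain Python. It is not the tinygym rewrite itself: the function name clip_grad_norm and the list-of-lists gradient layout are assumptions made for illustration. The scaling rule (max_norm divided by the total norm plus a small epsilon, capped at 1) mirrors the torch behavior.

```python
import math


def clip_grad_norm(grads, max_norm, eps=1e-6):
    """Scale gradients in place so their joint L2 norm is at most max_norm.

    grads: list of lists of floats, one inner list per parameter (illustrative layout).
    Returns the total norm measured before clipping.
    """
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    # Only ever scale down: the coefficient is capped at 1.0.
    coef = min(max_norm / (total_norm + eps), 1.0)
    for grad in grads:
        for i in range(len(grad)):
            grad[i] *= coef
    return total_norm
```

Gradients whose joint norm is already below max_norm are left unchanged. Larger ones are rescaled together, so their relative direction is preserved.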
You'll find a .detach() here and a .contiguous() there, but other than that tinygym=flashrl 🤝
Acknowledgements 🙌
I want to thank
- George Hotz and the tinygrad team for commoditizing the petaflop! Star tinygrad ⭐
- Andrej Karpathy for commoditizing RL knowledge! Star pg-pong ⭐
and last but not least...

