GitHub - codingfisch/tinygym: Reinforcement learning in tinygrad

tinygym reimplements flashrl using tinygrad instead of torch

🛠️ pip install tinygym or clone the repo & pip install -r requirements.txt

The README of flashrl is mostly valid for tinygym, with the biggest difference being:

tinygym is not fast (yet) -> Learns Pong in ~5 minutes instead of 5 seconds (on an RTX 3090)

Just like in flashrl, python train.py should look like this (with the progress bar moving ~60x slower):

Check out the onefile branch if you want to make it fast (i.e. try to make TinyJit work)!

Implementation differences from `flashrl`

The most important difference (enabled RL after 2 hours of debugging):

Use .abs().clip(min_=1e-8) in ppo to keep (value - ret) from getting close to zero

Without this, the optimizer step can result in NaNs and "RL doesn't work" 😜
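
The arithmetic of that fix can be sketched in plain Python (no tinygrad needed to run it). Only the abs-then-clamp at 1e-8 comes from the note above; the function names and the squaring step that follows the clamp are assumptions for illustration, not tinygym's actual ppo code.

```python
def clamped_abs_diff(values, returns, floor=1e-8):
    # Plain-Python analogue of (value - ret).abs().clip(min_=1e-8):
    # every element ends up >= floor, so nothing downstream sees an exact zero
    return [max(abs(v - r), floor) for v, r in zip(values, returns)]


def value_loss(values, returns, floor=1e-8):
    # Assumption: the clamped difference is squared and averaged (an MSE-style value loss)
    diffs = clamped_abs_diff(values, returns, floor)
    return sum(d * d for d in diffs) / len(diffs)


# The first pair is a perfect prediction, so its difference is clamped to the floor
diffs = clamped_abs_diff([1.0, 2.0], [1.0, 0.5])
loss = value_loss([1.0, 2.0], [1.0, 0.5])
```

The floor is small enough that it barely changes the loss, it only removes exact zeros.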

To potentially enable tinygrad.TinyJit (does not work yet, hence the slowness)

Learner does not use .setup_data, and
rollout is a function (instead of a Learner method) that fills a list with Tensors and stacks them (.stack) at the end
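
As a rough illustration of that "append per step, stack once" pattern, here is a plain-Python sketch. toy_policy and toy_env_step are hypothetical stand-ins, not tinygym's model or env, and nested lists play the role of Tensors; in tinygrad each buffer would be stacked into a single Tensor after the loop.

```python
import random


def toy_policy(obs, rng):
    # Hypothetical stand-in for the model: one random action (0 or 1) per env
    return [rng.choice([0, 1]) for _ in obs]


def toy_env_step(obs, actions):
    # Hypothetical stand-in for the env: reward 1.0 whenever action 1 is taken
    next_obs = [o + a for o, a in zip(obs, actions)]
    rewards = [float(a) for a in actions]
    return next_obs, rewards


def rollout(obs, steps, rng):
    """Free function (no Learner state): append one entry per step, combine at the end."""
    obs_buf, act_buf, rew_buf = [], [], []
    for _ in range(steps):
        actions = toy_policy(obs, rng)
        next_obs, rewards = toy_env_step(obs, actions)
        obs_buf.append(obs)
        act_buf.append(actions)
        rew_buf.append(rewards)
        obs = next_obs
    # With Tensors, each list would be stacked into one (steps, n_envs) Tensor here
    return {"obs": obs_buf, "act": act_buf, "rew": rew_buf}


data = rollout(obs=[0, 0, 0], steps=4, rng=random.Random(0))
```

Nothing is preallocated or written in place, which is the difference from a setup_data-style buffer.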

Since it somehow performs better

.uniform (tinygrad default) instead of .kaiming_uniform (torch default) weight initialization for nn.Linear

Custom tinygrad rewrites of torch.nn.init.orthogonal_ & torch.nn.utils.clip_grad_norm_ are used
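
For reference, the scaling rule behind torch.nn.utils.clip_grad_norm_ (L2 norm case) fits in a few lines. This is a plain-Python sketch over lists of floats, not tinygym's tinygrad rewrite; the name clip_grad_norm and the list-of-lists gradient layout are assumptions for illustration.

```python
import math


def clip_grad_norm(grads, max_norm, eps=1e-6):
    """Scale all gradients in place so their joint L2 norm is at most max_norm.

    grads: list of lists of floats, one inner list per parameter (illustrative layout).
    Returns the total norm measured before clipping.
    """
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    # Same rule as torch: coefficient max_norm / (total_norm + eps), capped at 1.0
    # so gradients are only ever scaled down, never up
    scale = min(1.0, max_norm / (total_norm + eps))
    for grad in grads:
        for i in range(len(grad)):
            grad[i] *= scale
    return total_norm


grads = [[3.0], [4.0]]  # joint norm is 5.0
norm_before = clip_grad_norm(grads, max_norm=1.0)
norm_after = math.sqrt(sum(g * g for grad in grads for g in grad))
```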

You'll find a .detach() here and a .contiguous() there, but other than that tinygym=flashrl 🤝

I want to thank

George Hotz and the tinygrad team for commoditizing the petaflop! Star tinygrad ⭐
Andrej Karpathy for commoditizing RL knowledge! Star pg-pong ⭐

and last but not least...