Local GRPO Training
This is a refactored local version of the Unsloth Colab notebook, based on the excellent work by Daniel Han and the Unsloth team.
Now you can run GRPO policy training locally and experience the "aha moment" on your own machine!
Sources
- Original Colab notebook by Daniel Han: LinkedIn Post
- Reasoning model guidance from Unsloth's blog post
- Reward model from Will's Gist
Prerequisites
- GPU (NVIDIA)
- make (optional; see Advanced Instructions if not using make)
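A quick way to confirm the prerequisites before building anything is a small preflight check. The sketch below is not part of this repository: the helper names `have_cmd` and `check_prereqs` are hypothetical, and it assumes that Docker is needed (the Advanced Instructions use it) and that the NVIDIA driver provides `nvidia-smi` on the host.

```shell
#!/usr/bin/env sh
# Preflight sketch (hypothetical helper, not shipped with this project):
# report which of the tools this README relies on are available on PATH.

have_cmd() {
    # Succeeds if the named command can be found on PATH.
    command -v "$1" >/dev/null 2>&1
}

check_prereqs() {
    missing=0
    # Required tools (assumption: Docker plus a host NVIDIA driver).
    for tool in docker nvidia-smi; do
        if have_cmd "$tool"; then
            echo "ok: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    # make is optional; the Docker commands can be run directly instead.
    if have_cmd make; then
        echo "ok: make"
    else
        echo "optional: make not found"
    fi
    return "$missing"
}

check_prereqs || echo "Install the missing tools before continuing."
```

This only checks that the commands exist. It does not verify that Docker can actually reach the GPU through `--gpus=all`.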
Quick Start
Configuration
Modify config.yaml to customize settings and parameters. Then simply run:
Clean up
Limitations
- Currently supports single-GPU training only
- For multi-GPU or H100 access, please visit runpod.io
Advanced Instructions
If you prefer not to use make, you can run the Docker commands directly:
    # Build the image
    docker build -t grpo_unsloth .

    # Create container
    docker create -it \
      --gpus=all \
      --name grpo_unsloth_container \
      -v $(pwd)/models:/models \
      -v $(pwd):/workspace \
      -e HF_HOME=/models/cache \
      grpo_unsloth

    # Start container
    docker start grpo_unsloth_container

    # Run a quick test (dry run)
    docker exec -it grpo_unsloth_container bash -c "uv run python main.py 'saving=null' 'training.max_steps=10'"

    # Run full training
    docker exec -it grpo_unsloth_container bash -c "uv run python main.py 'saving=null'"

    # Stop container
    docker stop grpo_unsloth_container

    # Remove container
    docker rm grpo_unsloth_container
Contributing
Feel free to open issues and pull requests!
License
This project is open-source and available under the MIT License.