GitHub - apple/ml-ssd

Simple Self-Distillation

Embarrassingly Simple Self-Distillation Improves Code Generation

Ruixiang Zhang*, Richard He Bai*, Huangjie Zheng*, Navdeep Jaitly, Ronan Collobert, Yizhe Zhang*

_{*Equal contribution}

✨ Overview

This repository reproduces the method from the paper:

Embarrassingly Simple Self-Distillation Improves Code Generation

The approach consists of three simple steps:

Sample solutions from a frozen model at non-unit temperature
Fine-tune on raw, unverified outputs using standard cross-entropy
Decode with a separately tuned temperature

No rewards · No verifier · No teacher · No RL

For full details, see the paper.

📰 News

[2026-04-03] 🚀 Initial release of repository
[2026-04-03] 🤗 Model checkpoints coming soon on Hugging Face
[2026-04-07] 🤗 Model checkpoints released
[2026-04-16] 🔧 Data generation pipeline released
(More updates will be added here)

🚀 Getting Started

git clone https://github.com/apple/ml-ssd.git
cd ml-ssd
uv sync --group evaluation          # for evaluation only
uv sync --group data-generation     # for data generation only
uv sync --group evaluation --group data-generation  # for both

Evaluation commands

source .venv/bin/activate
python evaluation/eval.py \
    --model <hf_model_name> \
    --tensor_parallel_size 4 \
    --max_tokens 65536 \
    --n_repeat 10 \
    --sampling_params "temperature=0.9,top_p=0.8,top_k=20" \
    --output_path ./results/

Note: The sampling parameters above are illustrative. Please refer to each model's HuggingFace model card for the recommended sampling parameters.

Data generation

source .venv/bin/activate
python data_generation/generate.py --config data_generation/config.yaml

This runs the full pipeline end-to-end: loads the dataset, generates solutions with vLLM, and post-processes into chat-template JSONL for SFT training. Edit data_generation/config.yaml to change the model, dataset, sampling temperature, etc.

🤗 Models

Model	HuggingFace
SSD-4B-Instruct	apple/SimpleSD-4B-instruct
SSD-4B-Thinking	apple/SimpleSD-4B-thinking
SSD-30B-A3B-Instruct	apple/SimpleSD-30b-a3b-instruct

📁 Repository Structure

├── data_generation/
│   ├── generate.py              # End-to-end data generation pipeline
│   ├── config.yaml              # Generation & post-processing config
│   └── templates/               # Prompt templates
├── evaluation/
│   ├── eval.py                  # CLI entry point
│   ├── benchmark.py             # LiveCodeBench v6 implementation
│   └── livecodebench_utils.py   # Code execution utilities
├── figures/
│   └── fig_teaser.png
├── pyproject.toml
└── README.md

📝 Citation

@misc{zhang2026embarrassinglysimpleselfdistillationimproves,
      title={Embarrassingly Simple Self-Distillation Improves Code Generation},
      author={Ruixiang Zhang and Richard He Bai and Huangjie Zheng and Navdeep Jaitly and Ronan Collobert and Yizhe Zhang},
      year={2026},
      eprint={2604.01193},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2604.01193},
}