Compressing LLMs with progressive pruning and multi-objective distillation

4 points by adam_patarino 14 days ago · 3 comments

Compressing a mixture-of-experts model to fit on smaller hardware using a reinforcement learning approach called Self-Distillation Policy Optimization, together with progressive expert pruning, multi-objective knowledge distillation, speculative decoding, and custom quantization.

MikeSynnott 14 days ago

Awesome!
