Mixed Precision Quantization on mlx comes with TurboQuant implementation

3 points by jsilence 3 months ago · 1 comment

User @thin_signal developed a tool for mixed-precision quantization on MLX. They performed a sensitivity analysis across the model's layers and applied milder quantization to the more sensitive layers and more aggressive quantization to the layers that are more robust.
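
For anyone unfamiliar with the idea, here is a minimal pure-Python sketch of sensitivity-guided bit allocation. It is not the tool's actual code: the function names, the 4/8-bit choice, the 25% "protected" fraction, and the use of weight reconstruction error as the sensitivity score are all assumptions made for illustration. A real sensitivity analysis would more likely measure the effect on model outputs.

```python
import random

def quantize_dequantize(weights, bits):
    """Uniform affine quantization to `bits` bits, then map back to floats."""
    lo, hi = min(weights), max(weights)
    levels = (1 << bits) - 1
    if hi == lo:
        return list(weights)  # constant tensor: nothing to quantize
    scale = (hi - lo) / levels
    return [lo + round((w - lo) / scale) * scale for w in weights]

def sensitivity(weights, bits):
    """Proxy score (assumption): mean squared error caused by quantizing to `bits`."""
    deq = quantize_dequantize(weights, bits)
    return sum((w - d) ** 2 for w, d in zip(weights, deq)) / len(weights)

def assign_bits(layers, low_bits=4, high_bits=8, high_fraction=0.25):
    """Give `high_bits` to the most sensitive fraction of layers, `low_bits` to the rest."""
    scores = {name: sensitivity(w, low_bits) for name, w in layers.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_high = max(1, round(len(ranked) * high_fraction))
    protected = set(ranked[:n_high])
    return {name: (high_bits if name in protected else low_bits) for name in layers}

# Toy "model": four layers of random weights; layer1 has a much wider spread,
# so low-bit quantization hurts it the most.
rng = random.Random(0)
layers = {
    "layer0": [rng.gauss(0, 0.02) for _ in range(256)],
    "layer1": [rng.gauss(0, 0.5) for _ in range(256)],
    "layer2": [rng.gauss(0, 0.05) for _ in range(256)],
    "layer3": [rng.gauss(0, 0.03) for _ in range(256)],
}
plan = assign_bits(layers)
print(plan)
```

In this toy setup the wide-spread layer ends up with 8 bits and the rest with 4, which is the general shape of the trade-off: spend the bit budget where quantization error is largest.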

The tool, documented at https://mlx-optiq.pages.dev/, also implements the recently announced TurboQuant KV-cache optimization, so taken together this should noticeably improve the quality of locally run LLMs.

Looking forward to an OptiQ release of the Gemma 4 family.
