Mixed-precision quantization on MLX comes with a TurboQuant implementation

twitter.com

3 points by jsilence a month ago · 1 comment

jsilence (OP) a month ago

User @thin_signal developed a tool for mixed-precision quantization on MLX. They performed a sensitivity analysis across the model's layers and applied less aggressive quantization to the more sensitive layers and more aggressive quantization to the more robust ones.
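To make the idea concrete, here is a minimal sketch of sensitivity-driven bit allocation. It is not the tool's actual code or API: the layer names, the weight-MSE proxy for sensitivity, and the 4-bit/8-bit split are all assumptions for illustration. A real sensitivity analysis would more likely measure the effect on model outputs than on raw weights.

```python
import random

def quantize(weights, bits):
    """Symmetric uniform round-to-nearest quantization, then dequantize."""
    levels = 2 ** (bits - 1) - 1                      # e.g. 7 for 4-bit
    scale = max(abs(w) for w in weights) / levels or 1.0
    return [round(w / scale) * scale for w in weights]

def sensitivity(weights, bits):
    """Proxy sensitivity (assumption): mean squared error introduced at `bits`."""
    deq = quantize(weights, bits)
    return sum((w - q) ** 2 for w, q in zip(weights, deq)) / len(weights)

def assign_bits(layers, low=4, high=8, high_fraction=0.25):
    """Give `high` bits to the most sensitive fraction of layers, `low` to the rest."""
    scores = {name: sensitivity(w, low) for name, w in layers.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_high = max(1, round(len(ranked) * high_fraction))
    keep_high = set(ranked[:n_high])
    return {name: (high if name in keep_high else low) for name in layers}

# Hypothetical toy "model": one layer with a much wider weight range than the others.
random.seed(0)
layers = {
    "attn.q_proj": [random.gauss(0, 1.0) for _ in range(256)],
    "mlp.up_proj": [random.gauss(0, 0.1) for _ in range(256)],
    "mlp.down_proj": [random.gauss(0, 0.1) for _ in range(256)],
    "lm_head": [random.gauss(0, 0.1) for _ in range(256)],
}
plan = assign_bits(layers)
for name, bits in plan.items():
    print(f"{name}: {bits}-bit")
```

In this toy setup the wide-range layer loses the most from 4-bit rounding, so it is the one kept at 8 bits while the rest drop to 4. The average bit width stays close to the low setting while the most fragile layer is protected.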

The tool, documented at https://mlx-optiq.pages.dev/, also implements the recently announced TurboQuant KV-cache optimization, so taken together this should greatly improve the quality of locally run LLMs.

Looking forward to an OptiQ release of the Gemma 4 family.
