https://t.co/M1kbqeiwcU

1 min read Original article ↗

Not All Layers Are Equal: Mixed-Precision Quantization for Weights and KV Cache on Apple Silicon

When you quantize an LLM to run locally on a Mac, the standard approach is uniform 4-bit. Every layer gets the same treatment. We wanted to know: what happens if you measure which layers actually need...