Asymmetric Quantization: Near-Lossless Retrieval with 97% Storage Reduction
mixedbread.comI can't wait until we get to 100% storage/cost/compute reduction for LLMs. Every thought you could have thought pre-conceived in high-fidelity super-resolution. Every action you could have taken predicted and simulated in advance courtesy of Openthropic and the USA Sovereign Wealth Fund.
97% is impressive, but I'm curious what the latency tradeoff looks like in production. Storage is only half the story for retrieval systems.