Show HN: Sparse Matrix-Vector Multiplication that works at 30–90% sparsity
To get benefits from sparsity, you usually need very sparse matrices, some structure imposed on the sparsity pattern, or specialized hardware. None of these hold if you want to run pruned LLMs on consumer devices.
I wanted to see how far you can push it on a GPU and ended up with this.
Blog: https://www.grizzlytech.dev/blog/macko-spmv
Paper: https://arxiv.org/abs/2511.13061
Code (example with torch): https://github.com/vlejd/macko_spmv

Cool method. Pre-deep-learning there was plenty of interesting research on sparse methods. What do you think we're missing to have more widely used neural+sparse approaches?

I think the lack of efficient GPU kernels was the main problem. It is much, much easier to get a real speedup and memory reduction from quantization from fp16 to fp8 than from 50% sparsity. For sparsity you needed structure (which makes your model worse) and special hardware support.

Interesting approach -- thanks
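
To make the fp8-vs-50%-sparsity point concrete, here is a rough back-of-the-envelope sketch (the 4096x4096 matrix size and int32 index width are assumptions, not numbers from the post): with a standard CSR layout, every surviving value drags along a column index, so 50% unstructured sparsity ends up larger than the dense fp16 matrix, while fp8 quantization halves it outright.

    # Back-of-the-envelope memory math (assumed 4096x4096 weight matrix, not a
    # figure from the post): dense fp16 vs dense fp8 vs 50%-sparse CSR with fp16
    # values, int32 column indices and int32 row pointers.
    rows = cols = 4096
    dense_fp16 = rows * cols * 2                   # 2 bytes per value
    dense_fp8  = rows * cols * 1                   # 1 byte per value -> 2x smaller
    nnz = rows * cols // 2                         # 50% of entries survive pruning
    csr_fp16 = nnz * 2 + nnz * 4 + (rows + 1) * 4  # values + col indices + row ptrs

    print(f"dense fp16: {dense_fp16 / 2**20:5.1f} MiB")  # 32.0 MiB
    print(f"dense fp8:  {dense_fp8  / 2**20:5.1f} MiB")  # 16.0 MiB
    print(f"CSR @ 50%:  {csr_fp16  / 2**20:5.1f} MiB")   # ~48.0 MiB, bigger than dense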