Understanding Multi-Head Latent Attention (From DeepSeek)

shreyansh26.github.io

2 points by shreyansh26 24 days ago · 1 comment

shreyansh26 (OP) 24 days ago

A short deep-dive on Multi-Head Latent Attention (MLA) from DeepSeek: intuition and math, then a walk from MHA → GQA → MQA → MLA, with PyTorch code and the fusion/absorption optimizations for KV-cache efficiency.
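
For readers skimming the thread, here is a minimal PyTorch sketch of the core MLA idea the post walks through: instead of caching full per-head K/V, the hidden state is down-projected to a small latent that is cached and up-projected back into keys and values at attention time. The sizes and module names below are illustrative assumptions, and the decoupled RoPE path, causal masking, and the fusion/absorption tricks covered in the post are omitted for brevity.

    # Minimal MLA sketch: low-rank latent KV compression (assumed sizes, no RoPE/mask).
    import torch
    import torch.nn as nn

    d_model, n_heads, d_head, d_latent = 512, 8, 64, 128  # hypothetical dimensions

    class MLASketch(nn.Module):
        def __init__(self):
            super().__init__()
            self.q_proj = nn.Linear(d_model, n_heads * d_head)
            # Down-project hidden states to a small latent; this latent is what
            # gets cached, not the full per-head K/V tensors.
            self.kv_down = nn.Linear(d_model, d_latent)
            # Up-project the cached latent back to per-head keys and values.
            self.k_up = nn.Linear(d_latent, n_heads * d_head)
            self.v_up = nn.Linear(d_latent, n_heads * d_head)
            self.out = nn.Linear(n_heads * d_head, d_model)

        def forward(self, x, latent_cache=None):
            b, t, _ = x.shape
            q = self.q_proj(x).view(b, t, n_heads, d_head).transpose(1, 2)
            c_kv = self.kv_down(x)  # (b, t, d_latent) -- the only thing cached
            if latent_cache is not None:
                c_kv = torch.cat([latent_cache, c_kv], dim=1)
            k = self.k_up(c_kv).view(b, -1, n_heads, d_head).transpose(1, 2)
            v = self.v_up(c_kv).view(b, -1, n_heads, d_head).transpose(1, 2)
            attn = torch.softmax(q @ k.transpose(-2, -1) / d_head**0.5, dim=-1)
            y = (attn @ v).transpose(1, 2).reshape(b, t, n_heads * d_head)
            return self.out(y), c_kv  # pass c_kv back in as latent_cache next step

The cache saving is that each token stores only d_latent numbers instead of 2 * n_heads * d_head for standard MHA; the absorption optimization mentioned in the post then folds the k_up projection into the query path so the latent never has to be expanded during decoding.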
