DeepSeek's Multi-Head Latent Attention

liorsinai.github.io

4 points by the_origami_fox 10 months ago · 1 comment

fspeech 10 months ago

Matrix absorption is unnecessary. What is needed is for the order of multiplication to associate in the direction of the absorption. That, together with the modified RoPE, is what makes the caching work.
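A minimal sketch of the associativity point, assuming a single head, no RoPE, and toy dimensions. The names `W_q`, `W_uk`, `cache`, and `attention_scores` are hypothetical and not taken from the linked article. It compares three ways of computing the score q^T (W_uk c) against cached latents c: decompressing every key, pre-multiplying a merged ("absorbed") matrix, and simply associating the product as (W_uk^T q)^T c without ever forming the merged matrix.

```python
import math
import random


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(A, B):
    """Multiply matrices A (n x k) and B (k x m), both lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(M):
    return [list(col) for col in zip(*M)]


def rand_matrix(rng, rows, cols):
    return [[rng.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


def attention_scores(seed=0, d_model=6, d_head=4, d_latent=3, n_tokens=5):
    """Return unnormalised scores computed three equivalent ways."""
    rng = random.Random(seed)
    W_q = rand_matrix(rng, d_head, d_model)       # q = W_q h
    W_uk = rand_matrix(rng, d_head, d_latent)     # k = W_uk c
    h = [rng.gauss(0, 1) for _ in range(d_model)]  # current token's hidden state
    cache = rand_matrix(rng, n_tokens, d_latent)  # one cached latent c per token

    q = matvec(W_q, h)

    # 1. Naive: decompress every cached latent into a full key, then dot with q.
    naive = [sum(a * b for a, b in zip(q, matvec(W_uk, c))) for c in cache]

    # 2. Absorbed: precompute the merged matrix W_uk^T W_q once, apply it to h.
    merged = matmul(transpose(W_uk), W_q)         # d_latent x d_model
    absorbed = matvec(cache, matvec(merged, h))

    # 3. Reassociated: keep the weights separate, but multiply q into W_uk^T
    #    first, so the cached latents are never decompressed.
    reassociated = matvec(cache, matvec(transpose(W_uk), q))

    return naive, absorbed, reassociated


naive, absorbed, reassociated = attention_scores()
for a, b, c in zip(naive, absorbed, reassociated):
    assert math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9)
    assert math.isclose(a, c, rel_tol=1e-9, abs_tol=1e-9)
```

In this toy setting, options 2 and 3 both work directly on the cached latents. The only difference is whether the two weight matrices are merged ahead of time. The sketch deliberately leaves out RoPE, which the comment notes has to be modified for the caching to work.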
