New DeepSeek paper: Natively Trainable Sparse Attention mechanism

5 points by redlock a year ago · 2 comments

eunos a year ago

Authored and uploaded by none other than Liang Wenfeng himself.