Attention Residuals: Rethinking depth-wise aggregation [pdf]

github.com

12 points by salkahfi 3 days ago · 1 comment

krackers 2 days ago

In [1], I think a commenter actually speculated about a design just like this, where later layers can directly access the outputs of previous layers instead of having to store them in the residual stream.

[1] https://news.ycombinator.com/item?id=46362579
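
To make the idea concrete, here is a minimal sketch of what "directly accessing outputs of previous layers" could look like for a single token: a softmax-weighted mix over earlier layer outputs rather than a plain running sum. This is only an illustration of the description above, not the method from the linked paper; `aggregate_over_depth`, the scaled dot-product scoring, and the toy vectors are all assumptions made for the example.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def aggregate_over_depth(query, layer_outputs):
    """Hypothetical helper: mix earlier layer outputs with softmax weights.

    query         -- vector for the current layer (assumed, e.g. a learned query)
    layer_outputs -- list of vectors, one per earlier layer, same dimension
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, h) / scale for h in layer_outputs])
    dim = len(query)
    mixed = [sum(w * h[i] for w, h in zip(weights, layer_outputs))
             for i in range(dim)]
    return mixed, weights

# Toy outputs of three earlier layers for one token (dimension 2).
layer_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# A zero query scores every layer equally, so the mix is a plain average.
query = [0.0, 0.0]
mixed, weights = aggregate_over_depth(query, layer_outputs)

# For comparison: a standard residual stream just sums everything.
residual_sum = [sum(h[i] for h in layer_outputs) for i in range(2)]
```

With a non-zero query the weights become input-dependent, so a later layer can pull mostly from one specific earlier layer instead of reading a sum in which every layer's contribution is already blended together.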
