Towards understanding multiple attention sinks in LLMs

github.com

1 point by thw20 3 months ago · 4 comments

thw20OP 3 months ago

This project reveals an interesting phenomenon: LLMs convert semantically uninformative tokens into attention sinks through middle-layer MLPs.

The converted sinks are termed secondary attention sinks because they are weaker than BOS attention sinks.

This might be related to layer specialisation in LLMs!
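
For anyone who wants to see what "attention sink" means in numbers, here is a minimal toy sketch. It is not the code from the repository: the attention matrix is synthetic, and the per-key biases, the 0.05 threshold, and the function names are all assumptions made for illustration. It only shows how one might measure the average attention each key position receives under causal masking, and how a weaker secondary sink would show up next to the BOS sink.

```python
import numpy as np

def toy_attention(bias):
    """Build a synthetic causal attention matrix (rows = queries, columns = keys).

    bias[j] is a hypothetical per-key logit; a larger value makes key j attract
    more attention. This is illustrative data, not real model output.
    """
    bias = np.asarray(bias, dtype=float)
    n = bias.shape[0]
    causal = np.tril(np.ones((n, n), dtype=bool))       # query i sees keys 0..i
    logits = np.where(causal, bias[None, :], -np.inf)   # mask out future keys
    logits = logits - logits.max(axis=1, keepdims=True) # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True) # row-wise softmax

def attention_received(attn):
    """Mean attention each key gets, averaged over the queries that can see it."""
    n = attn.shape[0]
    visible_queries = np.arange(n, 0, -1)  # key j is visible to n - j queries
    return attn.sum(axis=0) / visible_queries

def find_sinks(attn, threshold=0.05):
    """Key positions whose mean received attention exceeds an assumed threshold."""
    received = attention_received(attn)
    return [int(j) for j in np.where(received > threshold)[0]]

# Position 0 plays the role of BOS (strong sink); position 3 is a weaker,
# "secondary" sink; the remaining positions are ordinary tokens.
attn = toy_attention([4.0, 0.0, 0.0, 2.0, 0.0, 0.0])
received = attention_received(attn)
print(find_sinks(attn))
print(np.round(received, 3))
```

On real models the matrix would come from a specific layer and head rather than from `toy_attention`, and the threshold would need to be chosen per model.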

thw20OP 3 months ago

The up-to-date paper documenting and analysing the observation is now available on arXiv!
