Brumby-14B-Base: The Strongest Attention-Free Base Model

manifestai.com

7 points by cgel 2 months ago · 1 comment

cgel (OP) 2 months ago

We have trained a completely attention-free LLM whose performance is competitive with state-of-the-art models. This model, which we call Brumby-14B-Base, has a familiar Transformer-style architecture, except that it uses power retention layers in place of attention layers. It is available on Hugging Face.
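To make "attention-free" concrete: layers of this family replace the growing KV cache of attention with a fixed-size recurrent state that is updated once per token. The sketch below is a generic kernelized linear-attention recurrence, not Manifest AI's actual power retention formulation; the function name and feature map `phi` are hypothetical, chosen only to illustrate the constant-memory decoding property.

```python
import numpy as np

def recurrent_step(state, q, k, v, phi=lambda x: np.maximum(x, 0.0) + 1.0):
    """One decoding step of a generic linear-attention-style recurrence.

    Illustrative only -- NOT the power retention layer itself. The running
    state has fixed size (d_k x d_v), so the cost of each new token is
    constant in sequence length, which is what lets an attention-free
    model drop the ever-growing KV cache.
    """
    state = state + np.outer(phi(k), v)  # fold the new key/value pair into the state
    out = phi(q) @ state                 # read the state out for the current query
    return state, out

d_k, d_v, T = 4, 3, 8
rng = np.random.default_rng(0)
state = np.zeros((d_k, d_v))
for _ in range(T):
    q, k, v = rng.normal(size=d_k), rng.normal(size=d_k), rng.normal(size=d_v)
    state, out = recurrent_step(state, q, k, v)

print(state.shape)  # stays (4, 3) no matter how many tokens are processed
```

The point of the loop is that `state` never grows with `T`, whereas softmax attention would need all `T` past keys and values at every step.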
