Diffusion based alternative to self attention

github.com

3 points by deepGem 5 months ago · 1 comment

deepGem (OP) 5 months ago

I spent a few weeks trying to build an alternative to self-attention whose memory scales linearly, and I got surprisingly good results. While the approach makes a lot of sense in principle, I am struggling to push the test accuracy above 86%.

Some of the alternatives I am about to consider:

1. Diffusion with sparse attention layers.
2. Hierarchical diffusion: next-token diffusion combined with higher-order chunk diffusion.

I am still figuring out the code and would love any feedback on these approaches.