Guide to the Mamba architecture, a proposed replacement for Transformers
Been diving deep into the Mamba paper and put together my notes here:
https://blog.oxen.ai/mamba-linear-time-sequence-modeling-wit...
Took me a while to wrap my head around some of the terminology, so hopefully this helps anyone else trying to grok it.
You may also be interested in https://github.com/rbitr/llm.f90/tree/master/ssm — my inference-only implementation of Mamba, which ends up being much simpler than the training code in the original repo.
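To give a sense of why inference is so much simpler: at generation time a state-space model just runs a linear recurrence over the sequence, one step at a time, with constant memory. Below is a minimal sketch (not the linked repo's code, and the function names are my own) of that recurrence for a single channel with a diagonal A, using the zero-order-hold discretization described in the Mamba/S4 papers:

```python
# Sketch of the SSM recurrence an inference-only implementation runs:
#   h_t = A_bar * h_{t-1} + B_bar * x_t,   y_t = C . h_t
# A_bar and B_bar come from zero-order-hold discretization with step dt.
# This is an illustrative toy, not the repo's actual code.
import math

def ssm_step(h, x, A, B, C, dt):
    """One recurrence step for one channel with state size N.
    h: hidden state (list of N floats), x: scalar input,
    A: diagonal of the state matrix (N floats), B, C: N floats.
    Returns (new_h, y)."""
    new_h = []
    y = 0.0
    for n in range(len(h)):
        a_bar = math.exp(dt * A[n])          # discretized decay
        b_bar = (a_bar - 1.0) / A[n] * B[n]  # ZOH input coefficient
        hn = a_bar * h[n] + b_bar * x
        new_h.append(hn)
        y += C[n] * hn                       # output projection
    return new_h, y

def ssm_scan(xs, A, B, C, dt):
    """Run the recurrence over a whole sequence: O(L*N) time,
    O(N) memory -- no attention matrix, no KV cache growth."""
    h = [0.0] * len(A)
    ys = []
    for x in xs:
        h, y = ssm_step(h, x, A, B, C, dt)
        ys.append(y)
    return ys
```

In Mamba proper, B, C, and dt are input-dependent ("selective"), which is what makes the parallel training scan tricky; at inference you just plug in the current values and step the state forward, which is why the inference path stays this small.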