Guide to the Mamba architecture, a proposed replacement for Transformers
Been diving deep into the Mamba paper and put together my notes here:
https://blog.oxen.ai/mamba-linear-time-sequence-modeling-wit...
Took me a while to wrap my head around some of the terminology, so hopefully this helps anyone else trying to grok it.
You may also be interested in https://github.com/rbitr/llm.f90/tree/master/ssm — my inference-only implementation of Mamba, which ends up being much simpler than the training code in the original repo.
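To give a sense of why inference is so much simpler: at generation time a state-space model just runs a linear recurrence over the sequence, one step at a time, with constant memory. Below is a minimal sketch (not the linked repo's code, and the function names are my own) of that recurrence for a single channel with a diagonal A, using the zero-order-hold discretization described in the Mamba/S4 papers:

```python
# Sketch of the SSM recurrence an inference-only implementation runs:
#   h_t = A_bar * h_{t-1} + B_bar * x_t,   y_t = C . h_t
# A_bar and B_bar come from zero-order-hold discretization with step dt.
# This is an illustrative toy, not the repo's actual code.
import math

def ssm_step(h, x, A, B, C, dt):
    """One recurrence step for one channel with state size N.
    h: hidden state (list of N floats), x: scalar input,
    A: diagonal of the state matrix (N floats), B, C: N floats.
    Returns (new_h, y)."""
    new_h = []
    y = 0.0
    for n in range(len(h)):
        a_bar = math.exp(dt * A[n])          # discretized decay
        b_bar = (a_bar - 1.0) / A[n] * B[n]  # ZOH input coefficient
        hn = a_bar * h[n] + b_bar * x
        new_h.append(hn)
        y += C[n] * hn                       # output projection
    return new_h, y

def ssm_scan(xs, A, B, C, dt):
    """Run the recurrence over a whole sequence: O(L*N) time,
    O(N) memory -- no attention matrix, no KV cache growth."""
    h = [0.0] * len(A)
    ys = []
    for x in xs:
        h, y = ssm_step(h, x, A, B, C, dt)
        ys.append(y)
    return ys
```

In Mamba proper, B, C, and dt are input-dependent ("selective"), which is what makes the parallel training scan tricky; at inference you just plug in the current values and step the state forward, which is why the inference path stays this small.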