Jagged Flash Attention Optimization

24 points by tullie a year ago · 4 comments

platers a year ago

Flash attention natively supports packing multiple variable-length sequences into a single call. What is the advantage of jagged flash attention?
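For context on the packing the question refers to: the varlen interface in the flash-attn package describes a packed batch with a cumulative-lengths array (`cu_seqlens`). All sequences are concatenated along the token axis, and sequence i occupies rows `cu_seqlens[i]` to `cu_seqlens[i+1]`. The pure-Python sketch below shows only those semantics. It is a slow reference, not a fused kernel, and the helper names `cu_seqlens` and `packed_attention` are hypothetical. It assumes non-causal attention in which each query sees only keys from its own sequence.

```python
import math
from typing import List

Matrix = List[List[float]]

def cu_seqlens(lengths: List[int]) -> List[int]:
    """Hypothetical helper: cumulative offsets; sequence i spans rows [cu[i], cu[i+1])."""
    offsets = [0]
    for n in lengths:
        offsets.append(offsets[-1] + n)
    return offsets

def packed_attention(q: Matrix, k: Matrix, v: Matrix, offsets: List[int]) -> Matrix:
    """Hypothetical reference attention over a packed (concatenated) batch.

    Each query attends only to keys inside its own sequence, so no padding
    rows are stored or computed. Non-causal, single head.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    out: Matrix = []
    for start, end in zip(offsets[:-1], offsets[1:]):
        for i in range(start, end):
            # Scaled dot-product scores against keys of the same sequence only.
            scores = [scale * sum(a * b for a, b in zip(q[i], k[j])) for j in range(start, end)]
            # Numerically stable softmax.
            m = max(scores)
            weights = [math.exp(s - m) for s in scores]
            total = sum(weights)
            # Weighted sum of the value rows belonging to this sequence.
            row = [
                sum(w * v[j][c] for w, j in zip(weights, range(start, end))) / total
                for c in range(len(v[0]))
            ]
            out.append(row)
    return out
```

For example, lengths `[2, 3, 1]` pack into six rows with offsets `[0, 2, 5, 6]`. The last sequence has a single token, so its output row is simply its own value row, and changing the first sequence's values leaves the other sequences' outputs unchanged.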

bbstats a year ago

If only there was a link to a page somewhere that could answer this question for you.

CapsAdmin a year ago
