Jagged Flash Attention Optimization
shaped.ai
Flash attention natively supports packing multiple variable-length sequences into a single call, so what is the advantage of jagged flash attention?
If only there was a link to a page somewhere that could answer this question for you.
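For readers unfamiliar with the packing mentioned in the question, here is a minimal NumPy sketch of the idea: variable-length sequences are concatenated along the token axis, and a vector of cumulative offsets records where each one starts and ends. The helper names `pack_sequences` and `unpack_sequences` are made up for illustration, and the offsets only mimic the style of the `cu_seqlens` arguments that variable-length attention interfaces take. This is not the linked article's implementation and it does not compute attention.

```python
import numpy as np


def pack_sequences(seqs):
    """Concatenate variable-length sequences and return cumulative offsets.

    Hypothetical helper: each element of `seqs` has shape (length_i, dim).
    """
    lengths = np.array([len(s) for s in seqs], dtype=np.int32)
    # Offsets [0, l0, l0 + l1, ...]; sequence i occupies rows
    # cu_seqlens[i]:cu_seqlens[i + 1] of the packed array.
    cu_seqlens = np.concatenate([[0], np.cumsum(lengths)]).astype(np.int32)
    packed = np.concatenate(seqs, axis=0)
    return packed, cu_seqlens, int(lengths.max())


def unpack_sequences(packed, cu_seqlens):
    """Recover the original sequences from the packed array and offsets."""
    return [packed[s:e] for s, e in zip(cu_seqlens[:-1], cu_seqlens[1:])]


# Three sequences of lengths 3, 1 and 5, each with 4 features per token.
seqs = [np.ones((3, 4)), np.ones((1, 4)), np.ones((5, 4))]
packed, cu_seqlens, max_len = pack_sequences(seqs)

# Padding every sequence to the longest one would store more rows than packing.
padded_rows = len(seqs) * max_len
packed_rows = packed.shape[0]
```

In this toy case the packed layout holds 9 token rows, while a padded batch would hold 15. The question above is what a jagged kernel adds beyond this kind of packing.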