shreyansh26
- Karma: 11
- Created: 8 years ago
Recent Submissions
- 1. Understanding Multi-Head Latent Attention (From DeepSeek) (shreyansh26.github.io)
- 2. Deriving the gradient for the backward pass of Layer Normalization (shreyansh26.github.io)
- 3. GTC'25 Notes: CUDA Techniques to Maximize Memory Bandwidth – Part 1 (shreyansh26.github.io)
- 4. FlashAttention in PyTorch (github.com)
- 5. Understanding FlashAttention (shreyansh26.github.io)
- 6. Ask HN: What are some good resources on Recommender Systems?