🚀 Day 3 of #OpenSourceWeek: DeepGEMM Introducing DeepGEMM - an FP8 GEMM library that supports both dense and MoE GEMMs, powering V3/R1 training and inference. ⚡ Up to 1350+ FP8 TFLOPS on Hopper GPUs ✅ No heavy dependency, as clean as a tutorial ✅ Fully Just-In-Time compiled



🚀 Day 3 of #OpenSourceWeek: DeepGEMM

Introducing DeepGEMM - an FP8 GEMM library that supports both dense and MoE GEMMs, powering V3/R1 training and inference.

⚡ Up to 1350+ FP8 TFLOPS on Hopper GPUs
✅ No heavy dependency, as clean as a tutorial
✅ Fully Just-In-Time compiled
✅ Core logic at ~300 lines - yet outperforms expert-tuned kernels across most matrix sizes
✅ Supports dense layout and two MoE layouts

🔗 GitHub: github.com/deepseek-ai/De…
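
For context on the TFLOPS figure: GEMM throughput is conventionally reported as 2·M·N·K floating-point operations divided by the kernel's run time. The sketch below shows only that arithmetic. The function name `gemm_tflops`, the matrix shape, and the timing are illustrative assumptions, not DeepGEMM's API or measured results.

```python
def gemm_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Achieved TFLOPS for an (m x k) @ (k x n) GEMM that ran in `seconds`.

    Hypothetical helper for illustration; not part of DeepGEMM.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    # Each of the m*n outputs needs k multiplies and k adds.
    flops = 2 * m * n * k
    return flops / seconds / 1e12


if __name__ == "__main__":
    # Made-up shape and timing, chosen only to exercise the formula.
    m, n, k = 4096, 4096, 4096
    elapsed_seconds = 1.0e-3
    print(f"{gemm_tflops(m, n, k, elapsed_seconds):.1f} TFLOPS")
```

Under this convention, a higher figure means the same matrix product finished in less time.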
