Day 3 of #OpenSourceWeek: DeepGEMM

Introducing DeepGEMM - an FP8 GEMM library that supports both dense and MoE GEMMs, powering V3/R1 training and inference.

- Up to 1350+ FP8 TFLOPS on Hopper GPUs
- No heavy dependency, as clean as a tutorial
- Fully Just-In-Time compiled
- Core logic at ~300 lines - yet outperforms expert-tuned kernels across most matrix sizes
- Supports dense layout and two MoE layouts

GitHub: github.com/deepseek-ai/De…
