Massive Efficiency Gains in AI Training and Inference
Boosting Training Efficiency
NVIDIA Rubin trains mixture-of-experts (MoE) models with one-fourth the number of GPUs required by the NVIDIA Blackwell architecture.
Projected performance, subject to change. GPU count based on a 10T-parameter MoE model trained on 100T tokens within a fixed timeframe of one month.
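As a rough illustration of the fixed-timeframe framing in the footnote above, the sketch below shows why the required GPU count scales inversely with per-GPU effective throughput, so a 4x throughput gain translates into one-fourth the GPUs. Every numeric value in it (the FLOPs-per-token estimate, the per-GPU throughputs) is a hypothetical placeholder, not an NVIDIA-published figure.

```python
# Illustrative sketch only. All figures are hypothetical placeholders, not
# NVIDIA-published numbers. It shows that, for a fixed training window, the
# number of GPUs needed scales inversely with per-GPU effective throughput.

SECONDS_PER_MONTH = 30 * 24 * 3600

def gpus_needed(total_train_flops, per_gpu_flops_per_sec, window_sec=SECONDS_PER_MONTH):
    """GPUs required to finish a fixed amount of training work in a fixed window."""
    return total_train_flops / (per_gpu_flops_per_sec * window_sec)

# Hypothetical workload using the common ~6 * params * tokens FLOPs estimate,
# treating the 10T parameters from the footnote as compute-relevant for
# simplicity (a real MoE only activates a subset of parameters per token).
total_flops = 6 * 10e12 * 100e12

# Hypothetical per-GPU sustained throughput; only the assumed 4x ratio matters.
blackwell_flops_per_sec = 1.0e15
rubin_flops_per_sec = 4.0e15  # assumed 4x effective throughput per GPU

print(gpus_needed(total_flops, blackwell_flops_per_sec))  # baseline GPU count
print(gpus_needed(total_flops, rubin_flops_per_sec))      # one-fourth of the baseline
```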
Driving Down Inference Costs
NVIDIA Rubin delivers one-tenth the cost per million tokens compared to NVIDIA Blackwell for highly interactive, deep-reasoning agentic AI.
LLM inference performance subject to change. Cost per 1 million tokens based on the Kimi-K2-Thinking model with 32K/8K input/output sequence lengths (ISL/OSL), comparing Blackwell GB200 NVL72 and Rubin NVL72.
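As a hedged illustration only, the sketch below shows how cost per million tokens falls in proportion to tokens generated per GPU-hour. The dollar price and throughput values are hypothetical placeholders, not published benchmarks; only the assumed 10x throughput ratio drives the result.

```python
# Illustrative sketch only. Dollar figures and throughputs are hypothetical
# placeholders, not NVIDIA-published numbers. It shows how cost per 1 million
# tokens drops as tokens generated per GPU-hour rises.

def cost_per_million_tokens(gpu_hour_price_usd, tokens_per_gpu_hour):
    """Serving cost attributed to 1 million generated tokens."""
    return gpu_hour_price_usd / tokens_per_gpu_hour * 1_000_000

# Hypothetical values; only the relative throughput matters for the ratio.
price = 5.0                    # assumed $/GPU-hour, same for both systems
blackwell_tokens_per_hour = 200_000    # assumed tokens per GPU-hour on GB200 NVL72
rubin_tokens_per_hour = 2_000_000      # assumed 10x tokens per GPU-hour on Rubin NVL72

print(cost_per_million_tokens(price, blackwell_tokens_per_hour))  # baseline $/1M tokens
print(cost_per_million_tokens(price, rubin_tokens_per_hour))      # one-tenth the baseline
```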