Massive Efficiency Gains in AI Training and Inference
Boosting Training Efficiency
NVIDIA Rubin trains mixture-of-experts (MoE) models with one-fourth the number of GPUs required by the NVIDIA Blackwell architecture.
Projected performance, subject to change. GPU count based on a 10T-parameter MoE model trained on 100T tokens within a fixed timeframe of one month.
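As a rough illustration of the fixed-timeframe framing in the footnote above, the sketch below shows why the required GPU count scales inversely with per-GPU effective throughput, so a 4x throughput gain translates into one-fourth the GPUs. Every numeric value in it (the FLOPs-per-token estimate, the per-GPU throughputs) is a hypothetical placeholder, not an NVIDIA-published figure.

```python
# Illustrative sketch only. All figures are hypothetical placeholders, not
# NVIDIA-published numbers. It shows that, for a fixed training window, the
# number of GPUs needed scales inversely with per-GPU effective throughput.

SECONDS_PER_MONTH = 30 * 24 * 3600

def gpus_needed(total_train_flops, per_gpu_flops_per_sec, window_sec=SECONDS_PER_MONTH):
    """GPUs required to finish a fixed amount of training work in a fixed window."""
    return total_train_flops / (per_gpu_flops_per_sec * window_sec)

# Hypothetical workload using the common ~6 * params * tokens FLOPs estimate,
# treating the 10T parameters from the footnote as compute-relevant for
# simplicity (a real MoE only activates a subset of parameters per token).
total_flops = 6 * 10e12 * 100e12

# Hypothetical per-GPU sustained throughput; only the assumed 4x ratio matters.
blackwell_flops_per_sec = 1.0e15
rubin_flops_per_sec = 4.0e15  # assumed 4x effective throughput per GPU

print(gpus_needed(total_flops, blackwell_flops_per_sec))  # baseline GPU count
print(gpus_needed(total_flops, rubin_flops_per_sec))      # one-fourth of the baseline
```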
Driving Down Inference Costs
NVIDIA Rubin delivers one-tenth the cost per million tokens compared to NVIDIA Blackwell for highly interactive, deep-reasoning agentic AI.
LLM inference performance subject to change. Cost per 1 million tokens based on the Kimi-K2-Thinking model with 32K/8K input/output sequence lengths (ISL/OSL), comparing Blackwell GB200 NVL72 and Rubin NVL72.
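As a hedged illustration only, the sketch below shows how cost per million tokens falls in proportion to tokens generated per GPU-hour. The dollar price and throughput values are hypothetical placeholders, not published benchmarks; only the assumed 10x throughput ratio drives the result.

```python
# Illustrative sketch only. Dollar figures and throughputs are hypothetical
# placeholders, not NVIDIA-published numbers. It shows how cost per 1 million
# tokens drops as tokens generated per GPU-hour rises.

def cost_per_million_tokens(gpu_hour_price_usd, tokens_per_gpu_hour):
    """Serving cost attributed to 1 million generated tokens."""
    return gpu_hour_price_usd / tokens_per_gpu_hour * 1_000_000

# Hypothetical values; only the relative throughput matters for the ratio.
price = 5.0                    # assumed $/GPU-hour, same for both systems
blackwell_tokens_per_hour = 200_000    # assumed tokens per GPU-hour on GB200 NVL72
rubin_tokens_per_hour = 2_000_000      # assumed 10x tokens per GPU-hour on Rubin NVL72

print(cost_per_million_tokens(price, blackwell_tokens_per_hour))  # baseline $/1M tokens
print(cost_per_million_tokens(price, rubin_tokens_per_hour))      # one-tenth the baseline
```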