NVIDIA Unveils Vera CPU for Agentic AI
NVIDIA has launched the Vera CPU, the world's first processor purpose-built for agentic AI and reinforcement learning. The new CPU delivers twice the energy efficiency and 50% higher performance than traditional rack-scale CPUs. It addresses the growing demands for scale, performance, and cost efficiency in AI infrastructure, particularly for models that plan tasks, run tools, interact with data, execute code, and validate results.
Building on the NVIDIA Grace CPU, Vera enables organizations to construct AI factories that support agentic AI at scale. With NVIDIA's highest single-thread performance and per-core bandwidth to date, Vera is a new class of CPU designed to deliver superior AI throughput, responsiveness, and efficiency for large-scale AI services, including coding assistants and consumer and enterprise agents. Jensen Huang, NVIDIA's founder and CEO, emphasized its significance, stating, "The CPU is no longer simply supporting the model; it’s driving it."
Advanced Architecture and Integration for AI Factories
The Vera CPU is engineered for agentic scaling, combining high-performance, energy-efficient CPU cores with a high-bandwidth memory subsystem and the second-generation NVIDIA Scalable Coherency Fabric. This design enables faster agentic responses even under the extreme utilization conditions common in agentic AI and reinforcement learning. Vera features 88 custom NVIDIA-designed Olympus cores, optimized for compilers, runtime engines, analytics pipelines, agentic tooling, and orchestration services. Each core can execute two tasks concurrently using NVIDIA Spatial Multithreading, ensuring consistent and predictable performance, ideal for multi-tenant AI factories managing numerous jobs simultaneously.
To further enhance energy efficiency, Vera introduces the second generation of NVIDIA’s low-power memory subsystem, built on LPDDR5X memory. This delivers up to 1.2 TB/s of bandwidth—twice the bandwidth and at half the power compared with general-purpose CPUs. NVIDIA also announced a new Vera CPU rack, integrating 256 liquid-cooled Vera CPUs to sustain more than 22,500 concurrent CPU environments, each operating independently at full performance. These AI factories can rapidly deploy and scale to tens of thousands of simultaneous instances and agentic tools within a single rack, built using the NVIDIA MGX modular reference architecture.
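The rack-level figure follows from the per-CPU core count. A rough sanity check, under the assumption (not stated in the announcement) that each independent CPU environment maps to one Olympus core:

```python
# Sanity check of the Vera rack figures quoted above.
# Assumption: one independent CPU environment per Olympus core.
cores_per_cpu = 88    # custom NVIDIA Olympus cores per Vera CPU
cpus_per_rack = 256   # liquid-cooled Vera CPUs per rack

environments = cores_per_cpu * cpus_per_rack
print(environments)   # 22528, consistent with "more than 22,500"
```

With Spatial Multithreading running two tasks per core, the same rack could schedule roughly twice that many concurrent tasks, though the announcement counts environments, not threads.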
As part of the NVIDIA Vera Rubin NVL72 platform, Vera CPUs are paired with NVIDIA GPUs through NVIDIA NVLink-C2C interconnect technology. This provides 1.8 TB/s of coherent bandwidth—7x the bandwidth of PCIe Gen 6—for high-speed data sharing. New reference designs also utilize Vera as the host CPU for NVIDIA HGX Rubin NVL8 systems, coordinating data movement and system control for GPU-accelerated workloads.
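The quoted 7x figure lines up with a PCIe Gen 6 x16 link, which delivers roughly 256 GB/s of throughput per direction; that PCIe number is our assumption for the comparison, not a figure from the announcement:

```python
# NVLink-C2C coherent bandwidth vs. a PCIe Gen 6 x16 link.
# Assumption: PCIe Gen 6 x16 ~ 256 GB/s per direction
# (not stated in the article).
nvlink_c2c_gbs = 1800     # 1.8 TB/s quoted for NVLink-C2C
pcie_gen6_x16_gbs = 256   # approximate PCIe Gen 6 x16 throughput

ratio = nvlink_c2c_gbs / pcie_gen6_x16_gbs
print(round(ratio))       # 7, matching the quoted "7x"
```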
Vera systems partners are offering both dual-socket and single-socket CPU server configurations, suited to diverse workloads such as reinforcement learning, agentic inference, data processing, orchestration, storage management, cloud applications, and high-performance computing. Across all configurations, Vera systems integrate NVIDIA ConnectX SuperNIC cards and NVIDIA BlueField-4 DPUs for accelerated networking, storage, and security, all crucial for agentic AI.
Extensive Ecosystem Support and Performance Benchmarks
The Vera CPU has garnered widespread support across the industry, with leading hyperscalers, cloud service providers, national laboratories, and infrastructure partners collaborating with NVIDIA for its deployment. Hyperscalers and cloud providers planning to deploy Vera include Alibaba, ByteDance, Meta, Oracle Cloud Infrastructure, CoreWeave, Lambda, Nebius, Nscale, Cloudflare, Crusoe, Together.AI, and Vultr. Manufacturing partners adopting Vera include Dell Technologies, HPE, Lenovo, Supermicro, ASUS, Compal, Foxconn, GIGABYTE, Pegatron, Quanta Cloud Technology (QCT), Wistron, and Wiwynn.
National laboratories such as the Leibniz Supercomputing Centre, Los Alamos National Laboratory, Lawrence Berkeley National Laboratory's National Energy Research Scientific Computing Center, and the Texas Advanced Computing Center (TACC) are also planning deployments. Early performance benchmarks highlight Vera's impact. Cursor, an innovator in AI-native software development, is adopting Vera to boost performance for its AI coding agents, with cofounder and CEO Michael Truell stating, "We’re excited to use NVIDIA Vera CPUs to improve overall throughput and efficiency so we can deliver faster, more responsive coding agent experiences for our customers."
Redpanda, a leading streaming data and AI platform, reported dramatically better performance than other systems, achieving "up to 5.5x lower latency" when running Apache Kafka-compatible workloads on Vera. TACC's director of high-performance computing, John Cazes, noted impressive early results from running six scientific applications on Vera, calling its per-core performance and memory bandwidth "a giant step forward for scientific computing." NVIDIA Vera is currently in full production and is expected to be available from partners in the second half of this year.