Enterprise Reference Architectures
Build AI Factories That Scale
Turn your data center into a high-performance AI factory with NVIDIA Enterprise Reference Architectures.
The Building Blocks for AI Success
NVIDIA Enterprise Reference Architectures (Enterprise RAs) enable organizations to design, deploy, and scale high-performance AI factories using validated, repeatable infrastructure. These designs combine certified compute, high-speed east-west and north-south networking, observability tools, and software to ensure scalable performance, from four-node clusters to enterprise-scale environments.
Enterprise Reference Architectures
Your Guide to the Complete Family
A comprehensive set of guides for setting up clusters in the data center is now available.
Designed for Every Use Case
Accelerate agentic AI, physical AI, high-performance computing (HPC), and AI simulation workloads with proven NVIDIA Enterprise Reference Architectures and NVIDIA-Certified Systems from global partners. The primary infrastructure cluster configurations for deploying enterprise AI factories are outlined below.
- NVIDIA RTX PRO AI Factory
- NVIDIA HGX AI Factory
- NVIDIA NVL72 AI Factory
NVIDIA RTX PRO AI Factory
The NVIDIA RTX PRO™ AI Factory configuration is designed for a broad spectrum of enterprise workloads, including generative and agentic AI, data analytics, visual computing, and engineering simulation. Deployments are optimized around 16- and 32-node design points, providing an ideal balance of performance, scalability, and deployment efficiency. Designed for universal workload acceleration across enterprise AI, simulation, and visual computing, NVIDIA RTX PRO Servers are optimized for PCIe environments, making them ideal for space-, power-, and cooling-constrained data centers. Purpose-built for modern AI workloads, they deliver efficient performance for agentic AI and large language model (LLM) inference.
NVIDIA HGX AI Factory
The high-performance NVIDIA HGX™ AI Factory configuration is purpose-built for multi-node AI training and inference at scale, leveraging NVIDIA HGX systems. Available in 32-, 64-, and 128-node design points and supported by NVIDIA Spectrum-X™ networking, the architecture features a flexible, rail-optimized design that enables efficient integration across diverse rack layouts while delivering high-throughput, low-latency performance. It provides breakthrough performance for AI power users running the most demanding workloads, enables large-scale model training and fine-tuning, and dramatically accelerates inference. With next-generation precision and ultra-fast interconnects, the solution achieves up to 15x higher token throughput.
NVIDIA NVL72 AI Factory
The NVIDIA NVL72 AI Factory configuration is designed to train and deploy trillion-parameter models, delivering exascale computing power within a single rack. Built for massive model throughput, multi-user inference, and real-time inference at scale, it enables the next generation of AI-driven innovation. Deployment design points center on four- and eight-rack configurations. Built on a flexible, rail-optimized network, the architecture adapts to diverse rack layouts and system designs while delivering high-bandwidth, low-latency performance. The platform delivers exceptional AI factory output with industry-leading energy efficiency and is powered by fifth-generation NVIDIA NVLink™, FP4 Tensor Cores, and advanced thermal innovations.
The Strategic Value of Enterprise RAs
Unlock scalable, high-performance AI infrastructure with proven, partner-ready configurations.
Peak Performance for AI Workloads
Meet the intensive demands of AI inference, fine-tuning, and training with architectures that ensure full GPU utilization and performance consistency across multi-node clusters.
Flexible Scaling, Simplified Operations
Easily expand your infrastructure and ensure scalable, streamlined deployment for up to 128 nodes. Build the foundation for full-stack solutions with the NVIDIA Enterprise AI Factory validated design, which leverages our software ecosystem.
Reduce Complexity and TCO
Simplify deployment with efficient designs that reduce complexity and total cost of ownership (TCO) while shortening time to value.
Supportability
Follow specific, standardized design patterns to achieve consistent operation from one installation to the next, reduce the need for frequent support, and enable faster resolution times.
Partnered for Performance
We’re proud to collaborate with leading partners as they bring Enterprise Reference Architectures and AI factory solutions to market. Designs from these partners have passed our Design Review Board and earn our endorsement in one or more of the following categories: infrastructure, networking, and software.
Palantir Sovereign AI OS Reference Architecture With NVIDIA
The Palantir Sovereign AI OS Reference Architecture is based on NVIDIA Enterprise RAs, tested and qualified to run Palantir's complete software suite on NVIDIA AI infrastructure with our global system partners. This sovereign AI architecture is critical for customers with latency-sensitive workflows, data sovereignty requirements, and high geographic distribution. The architecture provides enterprises with total control over their data, AI models, and applications.
Learn More About Enterprise RAs
NVIDIA’s AI Factory Drives Enterprise Innovation at Scale
NVIDIA built a unified AI factory to scale generative AI and agentic workflows across the enterprise, ensuring security, performance, and consistency. The platform supports hundreds of AI agents that accelerate innovation, streamline software and hardware engineering, and optimize supply chain operations—reducing planning times by over 95 percent and achieving decades’ worth of engineering work in just one year.
NVIDIA Blackwell Ultra Delivers up to 50x Better Performance and 35x Lower Cost for Agentic AI
Built to accelerate the next generation of agentic AI, NVIDIA Blackwell Ultra delivers breakthrough inference performance with dramatically lower cost. Cloud providers such as Microsoft, CoreWeave, and Oracle Cloud Infrastructure are deploying NVIDIA GB300 NVL72 systems at scale for low-latency and long-context use cases, such as agentic coding and coding assistants.
This is enabled by deep co-design across NVIDIA Blackwell, NVLink™, and NVLink Switch for scale-up; NVFP4 for low-precision accuracy; and NVIDIA Dynamo and TensorRT™ LLM for speed and flexibility, as well as development with community frameworks such as SGLang, vLLM, and more.
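As a hedged illustration of one of the community inference frameworks named above, the minimal sketch below shows offline LLM inference with vLLM. The model name is a placeholder, and this is only a local illustration of the framework's basic API, not the deployment method prescribed by the reference architectures, which layer in Dynamo, TensorRT LLM, and cluster-scale serving.

```python
# Minimal vLLM offline-inference sketch (illustrative only).
from vllm import LLM, SamplingParams

# Load a model; the name below is a placeholder for any Hugging Face-compatible model.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

# Basic sampling settings for generation.
params = SamplingParams(temperature=0.7, max_tokens=128)

prompts = ["Summarize the benefits of validated AI infrastructure designs."]
outputs = llm.generate(prompts, params)

for output in outputs:
    print(output.outputs[0].text)
```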
Next Steps
Cluster Configuration 2-8-5-200 Specs
Cluster Configuration 2-8-9-400 Specs