Dr. Pranay Jha

VMware • Cloud • AI • Enterprise Architecture

FORMERLY
VMware Insight & Cloud Pathshala
What began over a decade ago as a passion for sharing knowledge has evolved into a unified platform for Enterprise AI, VMware, Cloud Architecture, Research, and Modern Infrastructure.

The NVIDIA AI Factory: DGX, HGX, MGX and the NVL72 Reference Systems (NVIDIA AI Series, Part 5)

DGX, HGX and MGX are not performance tiers; they are three ways to integrate the same NVIDIA GPUs. Here is how they differ and where the GB200 and GB300 NVL72 rack actually earns its 120 kW.

NVIDIA AI Series · Part 5 of 30

TL;DR

DGX, HGX and MGX are not three product tiers. They are three answers to the question of who integrates the box around the same NVIDIA silicon. HGX is the 8-GPU baseboard that OEMs build a server around. DGX is NVIDIA's own finished system on that baseboard with the software stack and support attached. MGX is the modular rack and node specification that lets a vendor mix Grace, Vera, x86 and different GPUs and networking. The GB200 and GB300 NVL72 sit on top: a liquid-cooled rack that wires 72 Blackwell GPUs into one 130 TB/s NVLink domain that the scheduler treats as a single GPU. Choose HGX or MGX when you want OEM choice and an existing data hall, DGX when you want NVIDIA to own the whole stack, and NVL72 only when a model genuinely needs a rack-sized memory pool and you can deliver roughly 120 kW of liquid cooling per rack.

Who this is for: AI-infrastructure architects and platform engineers sizing a GPU build who need to map NVIDIA's system brands to a purchase order, a data-hall power budget and a deployment model. Prerequisites: the GPU lineup from Part 3 and the memory math from Part 4. This part is the system layer that sits above the chips.

Three procurement teams asked me the same thing last year: is DGX better than HGX? It is the wrong question. DGX, HGX and MGX are not a good-better-best ladder. They are three answers to who integrates the box. The same Blackwell GPU shows up in all three. What changes is who owns the baseboard, the chassis, the firmware and the support call when a node drops at 2 a.m. Get that model straight and the NVL72 rack, the part everyone fixates on, becomes much easier to place in a design.

Same silicon, three integrators

Start with what does not change. A Blackwell B200 or B300 GPU is the same part number regardless of the box it ends up in. NVIDIA sells that compute three ways, and the three brands describe how much of the surrounding system NVIDIA hands you versus how much an OEM or you assemble. Confusing them leads to bad bids, where a team specs a DGX expecting OEM pricing or specs HGX expecting NVIDIA to own firmware updates.

HGX: the engine board

HGX is the GPU baseboard. It carries eight SXM-form-factor GPUs wired together with NVLink and the on-board NVSwitch chips, plus the power and cooling interfaces. It is not a server. Dell, Supermicro, HPE, Lenovo and the rest buy the HGX board from NVIDIA and build a complete node around it, typically 4 to 10 rack units tall depending on whether it is liquid- or air-cooled, with their own CPUs, memory, NICs, BMC, PSUs and chassis. When you buy an HGX H200 or HGX B200 server, you are buying an OEM machine with an NVIDIA engine inside. Firmware, RMA and support run through that OEM, not NVIDIA.

DGX: NVIDIA's finished system

DGX is NVIDIA building the whole server itself on the same HGX baseboard. NVIDIA picks the CPU, the memory config, the NICs and the chassis, validates it as one SKU, ships it with the AI Enterprise software stack and Base Command, and owns the support contract end to end. You pay for that integration and for a single throat to choke. A DGX node is opinionated on purpose: you do not get to swap the CPU or the NIC vendor. For teams without a hardware engineering bench, that is a feature, not a limit.

MGX: the modular kit

MGX is the newest of the three and the one most people get wrong. It is a modular reference specification for the chassis, the rack and the trays, designed so a system builder can drop in different CPUs (Grace, Vera, x86), different GPUs, and different networking across generations without redesigning the sheet metal each time. MGX is how the rack-scale NVL72 systems are physically built, and it is also how vendors ship mixed CPU-GPU inference nodes and converged HPC boxes such as the GB200 NVL4. Think of MGX as the LEGO baseplate, HGX as one specific brick, and DGX as the fully built kit NVIDIA sells in a sealed box.

Who owns each layer: same Blackwell GPU, three integration models

| Layer | HGX | DGX | MGX |
|---|---|---|---|
| GPU | NVIDIA | NVIDIA | NVIDIA |
| Baseboard / tray | NVIDIA | NVIDIA | Builder choice |
| Chassis + CPU | OEM | NVIDIA | Builder |
| Software | You | NVIDIA stack | You / builder |
| Support | OEM | NVIDIA | Builder |

Layers marked NVIDIA are owned by NVIDIA; the rest are owned by an OEM, a system builder or you. The choice is about integration and support, not GPU quality.

| Dimension | HGX | DGX | MGX |
|---|---|---|---|
| What it is | 8-GPU baseboard | Complete NVIDIA system | Modular chassis and rack spec |
| Integrator | OEM | NVIDIA | OEM or ODM |
| Flexibility | OEM picks CPU/NIC | Fixed by NVIDIA | Mix CPU, GPU, fabric, generation |
| Typical use | 8-GPU training/inference node | Turnkey factory, SuperPOD | NVL72 racks, inference, HPC |
| Support path | OEM | NVIDIA single contract | Builder, with NVIDIA reference architecture |

In practice: the layer that bites teams is not compute, it is firmware and support boundaries. On HGX, a GPU NVLink fault means a ticket to the OEM, who may bounce you to NVIDIA, who may ask for an OEM firmware bundle. On DGX the chain is one vendor. If your ops team is small, that single contract is worth real money the first time a node misbehaves under load.

From one GPU to a building block

The reason the brands matter is that they nest. A GPU goes onto an HGX baseboard. A baseboard goes into a node, whether NVIDIA-built (DGX) or OEM-built (HGX server). Nodes get grouped into a scalable unit, and scalable units get stitched into a cluster such as a DGX SuperPOD. NVL72 is a different shape of the same idea: instead of eight GPUs in a node, it puts 72 GPUs in a rack and treats the rack as the unit. Understanding that nesting tells you where the network boundaries fall, which is the single most important thing for performance.

Figure: The building-block stack. Each unit nests into the next, and each boundary is a different network: GPU (B200/B300) → HGX board (8 GPUs + NVSwitch) → node (DGX or HGX server) → scalable unit (nodes + leaf fabric) → SuperPOD (full cluster). NVLink inside; InfiniBand / Ethernet between.
NVLink binds GPUs inside a node or rack. Past that boundary you are on InfiniBand or Spectrum-X Ethernet, which delivers roughly an order of magnitude less bandwidth per GPU.
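To put rough numbers on "an order of magnitude", here is a back-of-the-envelope sketch. It assumes one scale-out NIC port per GPU (a 400 Gb/s or 800 Gb/s port) and compares bidirectional bandwidth on both sides, since the 1.8 TB/s NVLink figure is a bidirectional total. Real clusters vary in NIC count and rail design, so treat the ratios as illustrative rather than as vendor figures.

```python
# Back-of-the-envelope: per-GPU scale-up (NVLink) vs scale-out (NIC) bandwidth.
# Assumptions (illustrative, not figures for any specific build):
#   - NVLink gives 1.8 TB/s per GPU, counted bidirectionally.
#   - Each GPU has exactly one scale-out NIC port of the given line rate.
#   - The NIC line rate is per direction, so bidirectional = 2 x line rate.

NVLINK_PER_GPU_GBYTES = 1800.0  # GB/s, bidirectional (1.8 TB/s)


def nic_bidirectional_gbytes(line_rate_gbits: float) -> float:
    """Convert a per-direction NIC line rate in Gb/s to bidirectional GB/s."""
    return 2 * line_rate_gbits / 8


def nvlink_to_nic_ratio(line_rate_gbits: float) -> float:
    """How many times more bandwidth one GPU has on NVLink than on its NIC."""
    return NVLINK_PER_GPU_GBYTES / nic_bidirectional_gbytes(line_rate_gbits)


if __name__ == "__main__":
    for rate in (400, 800):  # 400G and 800G ports
        print(f"{rate} Gb/s NIC: NVLink has {nvlink_to_nic_ratio(rate):.0f}x the bandwidth")
```

Under these assumptions the gap is about 9x to 18x per GPU, which is the sense in which crossing from NVLink to the scale-out fabric costs roughly an order of magnitude.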

NVL72: the rack that acts as one GPU

The NVL72 is where the AI factory story gets interesting, and where most of the marketing noise lives. The pitch is simple to state and hard to build: take 72 Blackwell GPUs, wire every one of them to every other through a rack-spanning NVLink switch fabric, and present the whole thing to the scheduler as a single memory domain. No InfiniBand hop in the middle. A model that needs more memory than one GPU holds can spread across all 72 at NVLink speed instead of paying the network tax at every layer boundary.

This matters most for two workloads: trillion-parameter inference, where the model weights plus the KV cache do not fit in eight GPUs, and mixture-of-experts models, where token routing scatters work across many experts that all need fast access to each other. On an 8-GPU node those cross-GPU hops cross the slower network repeatedly. Inside an NVL72 they stay on NVLink. That is the entire reason the rack exists.

GB200 NVL72

The GB200 NVL72 connects 36 Grace CPUs and 72 Blackwell GPUs through the fifth-generation NVLink Switch System, which carries 130 TB/s of GPU-to-GPU bandwidth across the rack. Each GPU gets 1.8 TB/s of NVLink. The rack holds 13.4 TB of HBM3e as one pooled address space at 576 TB/s aggregate, plus 17 TB of LPDDR5X on the Grace side. NVIDIA rates it at 1,440 PFLOPS of sparse NVFP4 and 720 PFLOPS of sparse FP8. Each GB200 superchip pairs one Grace CPU with two Blackwell GPUs over NVLink-C2C, and the rack stacks 36 of those superchips.

GB300 NVL72

The GB300 NVL72 keeps the same 72-GPU, 36-Grace, single-NVLink-domain shape but swaps in Blackwell Ultra GPUs aimed at test-time scaling and reasoning workloads, where the model thinks longer per query and the KV cache balloons. The headline change architects care about is memory per GPU: roughly 288 GB on GB300 against an effective ~186 GB on GB200, which means larger partitions per GPU and fewer cross-GPU page moves for very large models. NVIDIA pairs it with Quantum-X800 InfiniBand or Spectrum-X Ethernet, ConnectX-8 SuperNICs and Mission Control, and quotes up to a 50x AI-factory output gain against a Hopper baseline. Treat that 50x as a best-case factory-level figure, not a per-GPU speedup. [VERIFY exact GB300 per-GPU HBM capacity against the GB300 datasheet at deployment time, since SKUs vary.]

| Spec (per rack) | GB200 NVL72 | GB300 NVL72 |
|---|---|---|
| GPUs | 72 Blackwell | 72 Blackwell Ultra |
| Grace CPUs | 36 | 36 |
| Pooled GPU memory | 13.4 TB HBM3e | ~20 TB HBM3e [VERIFY] |
| NVLink bandwidth | 130 TB/s | 130 TB/s |
| Sparse NVFP4 | 1,440 PFLOPS | Higher, reasoning-tuned [VERIFY] |
| Target workload | Trillion-parameter training + inference | Test-time scaling, reasoning |
| Cooling | Liquid, ~120 kW/rack | Liquid, ~120 kW+/rack [VERIFY] |

Figure: NVL72 rack layout. 72 GPUs in compute trays, joined by a central NVLink switch spine. Compute trays (18), each with 2 Grace CPUs + 4 Blackwell GPUs, 72 GPUs total. NVLink switch trays (9): 130 TB/s all-to-all, 72 GPUs in one domain, 1.8 TB/s per GPU; the spine ties every tray to every other at full bandwidth. Power shelf and manifolds: liquid in at ~25 °C, out at ~45 °C, ~120 kW per rack.
Tray counts are approximate and vary by OEM build; the constant is 72 GPUs joined by a central NVLink switch spine into one domain. Verify exact tray topology against your vendor reference architecture.
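The headline figures above are internally consistent, and checking that before they go into a planning sheet is cheap. The sketch below derives the GPU count, the aggregate NVLink bandwidth and the per-GPU HBM from the article's own numbers; the ~288 GB GB300 per-GPU capacity and the 18-tray layout are the approximate values the text already flags for verification.

```python
# Cross-check the per-rack NVL72 figures quoted in this article.
# All inputs come from the text; the GB300 per-GPU HBM (~288 GB) and the
# tray counts are approximate and should be verified per SKU and OEM build.

SUPERCHIPS_PER_RACK = 36       # superchips (1 Grace + 2 Blackwell) per rack
GPUS_PER_SUPERCHIP = 2
COMPUTE_TRAYS = 18             # from the rack figure; varies by build
GPUS_PER_TRAY = 4
NVLINK_PER_GPU_TBPS = 1.8      # fifth-generation NVLink, per GPU
GB200_POOLED_HBM_GB = 13_400   # 13.4 TB pooled HBM3e (GB200 NVL72)
GB300_HBM_PER_GPU_GB = 288     # approximate Blackwell Ultra capacity

gpus = SUPERCHIPS_PER_RACK * GPUS_PER_SUPERCHIP
assert gpus == COMPUTE_TRAYS * GPUS_PER_TRAY, "tray view and superchip view disagree"

rack_nvlink_tbps = gpus * NVLINK_PER_GPU_TBPS
gb200_hbm_per_gpu_gb = GB200_POOLED_HBM_GB / gpus
gb300_pooled_hbm_tb = gpus * GB300_HBM_PER_GPU_GB / 1000

print(f"GPUs per rack:           {gpus}")
print(f"Aggregate NVLink:        {rack_nvlink_tbps:.1f} TB/s")
print(f"GB200 HBM per GPU:       {gb200_hbm_per_gpu_gb:.0f} GB")
print(f"GB300 pooled HBM (est.): {gb300_pooled_hbm_tb:.1f} TB")
```

It reproduces 72 GPUs, about 129.6 TB/s (the quoted 130 TB/s), about 186 GB per GB200 GPU (the effective figure cited for GB200), and roughly 20.7 TB pooled for GB300, in line with the ~20 TB table entry.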
Gotcha: the NVL72 is not a drop-in for an air-cooled hall. It is liquid-cooled by design, draws on the order of 120 kW per rack, and needs coolant distribution units, manifolds and a facility water loop. Plenty of teams buy the rack and then discover their data hall tops out at 30 to 40 kW per rack and has no liquid loop at all. The cooling and power retrofit, not the GPU lead time, is usually the real schedule risk.
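The "liters per second" side of the facility question can be estimated from the heat balance P = ṁ · c_p · ΔT. The sketch below assumes plain water (c_p ≈ 4,186 J/(kg·K), about 1 kg per litre) and the approximate ~25 °C in / ~45 °C out temperatures from the rack figure. Real loops often use glycol mixes with a lower heat capacity and vendor-specified temperatures, so this is a planning estimate, not CDU sizing.

```python
# Rough coolant flow needed to remove a rack's heat load with a liquid loop.
# Heat balance: power = mass_flow * specific_heat * delta_T
# Assumptions: plain water, density ~1 kg/L, approximate inlet/outlet
# temperatures. Not a substitute for a facility engineer's CDU sizing.

WATER_CP_J_PER_KG_K = 4186.0   # specific heat of water
WATER_DENSITY_KG_PER_L = 1.0   # close enough at 25-45 C for planning


def coolant_flow_l_per_s(rack_kw: float, inlet_c: float, outlet_c: float) -> float:
    """Litres per second of water needed to carry rack_kw at the given temperature rise."""
    delta_t = outlet_c - inlet_c
    if delta_t <= 0:
        raise ValueError("outlet temperature must exceed inlet temperature")
    mass_flow_kg_s = (rack_kw * 1000.0) / (WATER_CP_J_PER_KG_K * delta_t)
    return mass_flow_kg_s / WATER_DENSITY_KG_PER_L


if __name__ == "__main__":
    flow = coolant_flow_l_per_s(rack_kw=120, inlet_c=25, outlet_c=45)
    print(f"~{flow:.2f} L/s (~{flow * 60:.0f} L/min) per 120 kW rack")
```

Under these assumptions a 120 kW rack needs on the order of 1.4 L/s of water. Halving the temperature rise doubles the required flow, which is why the ΔT your facility loop supports matters as much as the kW figure.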

Scale-up versus scale-out, and why the boundary matters

Every GPU cluster has two networks, and the NVL72 changes where the line between them falls. Scale-up is the fast NVLink domain that binds GPUs into one memory pool. Scale-out is the InfiniBand or Spectrum-X Ethernet fabric that connects those domains into a larger cluster. On a traditional 8-GPU node the scale-up domain is eight GPUs, so any model whose memory footprint exceeds what eight GPUs hold must cross the slow fabric. The NVL72 stretches the scale-up domain to 72 GPUs, so the slow boundary only appears when you go rack to rack.

That single fact is the design lever. If your largest model fits inside one NVL72, you keep all the heavy collective traffic on 130 TB/s NVLink and the InfiniBand fabric only carries data-parallel gradient syncs between racks. If your model spans several racks, you are back to paying the network tax at the rack boundary, and your fabric design (rails, topology, congestion control) starts to dominate throughput. I cover that fabric in later parts on NVLink and InfiniBand.
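The boundary argument reduces to a small calculation: how many GPUs a model's memory footprint needs, and how many scale-up domains those GPUs span. The sketch below is a hypothetical planning helper, not an NVIDIA tool; the usable-GB-per-GPU value is something you pick after reserving headroom, and the 5,000 GB footprint in the example is made up for illustration. It also ignores tensor, expert and pipeline layout constraints that a real placement would add.

```python
import math
from dataclasses import dataclass


@dataclass
class Placement:
    gpus_needed: int
    domains_spanned: int

    @property
    def crosses_scale_out(self) -> bool:
        # Any placement spanning more than one NVLink domain must use the
        # InfiniBand / Ethernet fabric at each domain boundary.
        return self.domains_spanned > 1


def place(model_gb: float, usable_gb_per_gpu: float, domain_size: int) -> Placement:
    """Hypothetical helper: pack a model's memory footprint into NVLink domains.

    model_gb          -- total footprint (weights + KV cache + replicas), in GB
    usable_gb_per_gpu -- HBM per GPU left after reserving headroom, in GB
    domain_size       -- GPUs per scale-up domain (8 for an HGX node, 72 for NVL72)
    """
    gpus = math.ceil(model_gb / usable_gb_per_gpu)
    return Placement(gpus, math.ceil(gpus / domain_size))


if __name__ == "__main__":
    for name, size in (("8-GPU node", 8), ("NVL72 rack", 72)):
        p = place(model_gb=5000, usable_gb_per_gpu=160, domain_size=size)
        print(f"{name}: {p.gpus_needed} GPUs, {p.domains_spanned} domain(s), "
              f"crosses scale-out: {p.crosses_scale_out}")
```

For the hypothetical 5,000 GB footprint at 160 GB usable per GPU, the model needs 32 GPUs: four 8-GPU nodes joined over the scale-out fabric, or a single NVL72 domain with no scale-out hop in the model's critical path.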

Figure: Scale-up inside, scale-out between. The NVL72 pushes the slow boundary out to the rack edge: NVL72 rack A and NVL72 rack B each hold 72 GPUs in one 130 TB/s NVLink domain (scale-up: fast, all-to-all), and InfiniBand / Spectrum-X links the racks (scale-out: slower).
Keep the largest model inside one rack and the slow fabric only carries cross-rack syncs. Span racks and the fabric design starts to govern throughput.

Worked example

Say you want to serve a 1.8T-parameter MoE model at FP4. Weights alone at roughly 0.5 byte per parameter need about 900 GB, and a healthy KV cache for long-context, high-concurrency serving can add several hundred GB more. That does not fit in an 8-GPU HGX B200 node holding around 1.4 TB of HBM once you reserve headroom and replicate experts.

An NVL72 holds 13.4 TB of pooled HBM3e at 576 TB/s. The model, the KV cache and the expert replicas all live in one NVLink domain, so token routing across experts never crosses InfiniBand. That is the case where the rack earns its 120 kW. If your model fits comfortably in 8 GPUs, the rack does not earn it, and a pair of HGX nodes is the cheaper, easier build. Size the memory first, then decide the chassis.
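The arithmetic behind this worked example is short enough to write down. The sketch below uses the text's 1.8T parameters at 0.5 byte each and the quoted 13.4 TB for GB200 NVL72; the 400 GB KV cache, the 15% headroom and the ~180 GB per GPU in an 8-GPU HGX B200 node are illustrative assumptions. Expert replication is deliberately left out, so a real footprint would be larger still.

```python
# Memory sizing for the worked example: does the model fit in one HGX node
# or does it need a rack-scale NVLink domain?
# Assumptions (illustrative): KV cache size, headroom, and ~180 GB per GPU
# in an 8-GPU HGX B200 node. Expert replicas are NOT modeled here.

PARAMS = 1_800_000_000_000      # 1.8T-parameter MoE model
BYTES_PER_PARAM_FP4 = 0.5       # FP4 weights: half a byte per parameter
KV_CACHE_GB = 400               # hypothetical long-context, high-concurrency KV cache
HEADROOM_PCT = 15               # hypothetical reserve for activations, fragmentation

HGX_NODE_HBM_GB = 8 * 180       # assumption: 8 GPUs x ~180 GB
NVL72_HBM_GB = 13_400           # 13.4 TB pooled HBM3e, as quoted for GB200 NVL72


def usable(capacity_gb: int) -> int:
    """Capacity left after reserving HEADROOM_PCT percent, in whole GB."""
    return capacity_gb * (100 - HEADROOM_PCT) // 100


weights_gb = PARAMS * BYTES_PER_PARAM_FP4 / 1e9
footprint_gb = weights_gb + KV_CACHE_GB

for name, capacity in (("HGX node", HGX_NODE_HBM_GB), ("GB200 NVL72", NVL72_HBM_GB)):
    fits = footprint_gb <= usable(capacity)
    print(f"{name}: need {footprint_gb:.0f} GB, usable {usable(capacity)} GB, fits: {fits}")
```

With these inputs the weights take 900 GB and the footprint reaches about 1,300 GB, which already exceeds the node's ~1,224 GB of usable HBM before any expert replicas are added, while the NVL72's pooled memory leaves ample room. Swap in your own KV cache and headroom figures to match your serving profile.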

DGX SuperPOD: stitching racks into a cluster

Above the rack sits the DGX SuperPOD, which is NVIDIA's validated cluster reference architecture. It is built from scalable units, where each unit is a fixed count of nodes or NVL72 racks plus the leaf-spine fabric, management network, storage and cabling that tie them together. The value of the SuperPOD is not novelty; it is that NVIDIA has already done the painful integration: the rail-optimized InfiniBand topology, the Base Command and Mission Control management layer, the storage reference, and the validation that it runs at the rated numbers. You buy a known-good blueprint instead of discovering fabric bugs at 1,000 GPUs.

For most enterprises this is the line between buying a factory and building one. If you have a hyperscaler-grade network team, you can assemble HGX or MGX nodes, design your own fabric and save money. If you do not, the SuperPOD or an equivalent OEM reference architecture is cheaper once you count the engineering time and the cost of getting the fabric wrong. When the target is VMware Cloud Foundation rather than bare metal, the deployment shifts again, and I cover that path in the Private AI reference architecture.

My take: the brand on the bezel matters far less than two numbers, namely how big your largest single model is in GB and how many kilowatts and liters per second of coolant your facility can deliver to one rack. Answer those honestly and the DGX-versus-HGX-versus-MGX question mostly answers itself.

What I would actually choose

Here is my position, not a neutral survey. For a first enterprise AI build with a small platform team, I recommend a DGX BasePOD or SuperPOD, or an OEM HGX reference build with a strong support contract. Why: you get a validated stack and one support path, which is worth more than the integration savings when your team has never run GPU fabric at scale. When it is not the right call: if you have an experienced network and hardware team and a cost target, self-integrating HGX or MGX nodes will be cheaper and you give up little. What to validate first: your per-rack power and cooling envelope, because it silently caps every other decision.

For NVL72 specifically, I only recommend it when the workload genuinely needs a rack-sized memory domain: trillion-parameter or large MoE inference, or training runs where a 72-GPU NVLink domain measurably cuts collective time. When it is not the right call: models that fit in 8 to 16 GPUs do not benefit enough to justify the liquid-cooling retrofit and the power bill. What to validate first: that your facility can actually deliver roughly 120 kW and a water loop to each rack, ideally with a small pilot rack before you commit a row. Buying NVL72 racks for models that fit in a node is the most expensive mistake I see in this space.

Disclaimer: rack power, coolant temperatures and tray topology vary by OEM build and by NVIDIA reference-architecture revision. Treat the figures here as planning baselines and confirm every power, cooling and capacity number against your vendor's current datasheet and your facility engineer before committing a purchase or a data-hall change.

The Verdict

DGX, HGX and MGX are integration choices, not performance tiers, and the NVL72 is a memory-domain decision dressed up as a hardware one. Choose your integrator by the size of your team and your appetite to own firmware and fabric. Choose the rack by the size of your model and the limits of your building. The teams that get this right size the memory and the power envelope before they ever argue about brands. The teams that get it wrong buy a 120 kW rack to serve a model that would have run on two air-cooled nodes. If you do one thing after reading this, pull your largest target model's memory footprint and your data hall's per-rack power limit into the same document, because those two numbers decide everything above.

Next we drop a level into the GPUs themselves and how you carve them up: MIG, vGPU, time-slicing and passthrough, and when each one is the right call. If you are sizing a build right now, start that power-and-memory worksheet today and bring it to the partitioning discussion.




About the Author

Dr. Pranay Jha is a Cloud and AI Consultant with 18+ years of experience in hybrid cloud, virtualization, and enterprise infrastructure transformation. He specializes in VMware technologies, multi-cloud strategy, and Generative AI solutions. He holds a PhD in Computer Applications with research focused on Cloud and AI, has published multiple research papers, and has been a VMware vExpert since 2016 and a VMUG Community Leader.
