TL;DR
DGX, HGX and MGX are not three product tiers. They are three answers to the question of who integrates the box around the same NVIDIA silicon. HGX is the 8-GPU baseboard that OEMs build a server around. DGX is NVIDIA's own finished system on that baseboard with the software stack and support attached. MGX is the modular rack and node specification that lets a vendor mix Grace, Vera, x86 and different GPUs and networking. The GB200 and GB300 NVL72 sit on top: liquid-cooled racks that each wire 72 Blackwell GPUs into one 130 TB/s NVLink domain the scheduler treats as a single GPU. Choose HGX or MGX when you want OEM choice and need to fit an existing data hall, DGX when you want NVIDIA to own the whole stack, and NVL72 only when a model genuinely needs a rack-sized memory pool and you can deliver roughly 120 kW of liquid cooling per rack.
Three procurement teams asked me the same thing last year: is DGX better than HGX? It is the wrong question. DGX, HGX and MGX are not a good-better-best ladder. They are three answers to who integrates the box. The same Blackwell GPU shows up in all three. What changes is who owns the baseboard, the chassis, the firmware and the support call when a node drops at 2 a.m. Get that model straight and the NVL72 rack, the part everyone fixates on, becomes much easier to place in a design.
Same silicon, three integrators
Start with what does not change. A Blackwell B200 or B300 GPU is the same silicon regardless of the box it ends up in. NVIDIA sells that compute three ways, and the three brands describe how much of the surrounding system NVIDIA hands you versus how much an OEM or you assemble. Confusing them leads to bad bids, where a team specs a DGX expecting OEM pricing or specs HGX expecting NVIDIA to own firmware updates.
HGX: the engine board
HGX is the GPU baseboard. It carries eight SXM-form-factor GPUs wired together with NVLink and the on-board NVSwitch chips, plus the power and cooling interfaces. It is not a server. Dell, Supermicro, HPE, Lenovo and the rest buy the HGX board from NVIDIA and build a complete node around it, typically somewhere between 4U and 10U depending on the cooling design: their own CPUs, memory, NICs, BMC, PSUs and chassis. When you buy an HGX H200 or HGX B200 server, you are buying an OEM machine with an NVIDIA engine inside. Firmware, RMA and support run through that OEM, not NVIDIA.
DGX: NVIDIA's finished system
DGX is NVIDIA building the whole server itself on the same HGX baseboard. NVIDIA picks the CPU, the memory config, the NICs and the chassis, validates it as one SKU, ships it with the AI Enterprise software stack and Base Command, and owns the support contract end to end. You pay for that integration and for a single throat to choke. A DGX node is opinionated on purpose: you do not get to swap the CPU or the NIC vendor. For teams without a hardware engineering bench, that is a feature, not a limit.
MGX: the modular kit
MGX is the newest of the three and the one most people get wrong. It is a modular reference specification for the chassis, the rack and the trays, designed so a system builder can drop in different CPUs (Grace, Vera, x86), different GPUs, and different networking across generations without redesigning the sheet metal each time. MGX is how the rack-scale NVL72 systems are physically built, and it is also how vendors ship mixed CPU-GPU inference nodes and converged HPC boxes such as the GB200 NVL4. Think of MGX as the LEGO baseplate, HGX as one specific brick, and DGX as the fully built kit NVIDIA sells in a sealed box.
| Dimension | HGX | DGX | MGX |
|---|---|---|---|
| What it is | 8-GPU baseboard | Complete NVIDIA system | Modular chassis and rack spec |
| Integrator | OEM | NVIDIA | OEM or ODM |
| Flexibility | OEM picks CPU/NIC | Fixed by NVIDIA | Mix CPU, GPU, fabric, gen |
| Typical use | 8-GPU training/inference node | Turnkey factory, SuperPOD | NVL72 racks, inference, HPC |
| Support path | OEM | NVIDIA single contract | Builder, with NVIDIA RA |
From one GPU to a building block
The reason the brands matter is that they nest. A GPU goes onto an HGX baseboard. A baseboard goes into a node, whether NVIDIA-built (DGX) or OEM-built (HGX server). Nodes get grouped into a scalable unit, and scalable units get stitched into a cluster such as a DGX SuperPOD. NVL72 is a different shape of the same idea: instead of eight GPUs in a node, it puts 72 GPUs in a rack and treats the rack as the unit. Understanding that nesting tells you where the network boundaries fall, which is the single most important thing for performance.
NVL72: the rack that acts as one GPU
The NVL72 is where the AI factory story gets interesting, and where most of the marketing noise lives. The pitch is simple to state and hard to build: take 72 Blackwell GPUs, wire every one of them to every other through a rack-spanning NVLink switch fabric, and present the whole thing to the scheduler as a single memory domain. No InfiniBand hop in the middle. A model that needs more memory than one GPU holds can spread across all 72 at NVLink speed instead of paying the network tax at every layer boundary.
This matters most for two workloads: trillion-parameter inference, where the model weights plus the KV cache do not fit in eight GPUs, and mixture-of-experts models, where token routing scatters work across many experts that all need fast access to each other. When such a model is sharded across several 8-GPU nodes, those cross-GPU hops repeatedly cross the slower scale-out network. Inside an NVL72 they stay on NVLink. That is the entire reason the rack exists.
GB200 NVL72
The GB200 NVL72 connects 36 Grace CPUs and 72 Blackwell GPUs through the fifth-generation NVLink Switch System, which carries 130 TB/s of GPU-to-GPU bandwidth across the rack. Each GPU gets 1.8 TB/s of NVLink. The rack holds 13.4 TB of HBM3e as one pooled address space at 576 TB/s aggregate, plus 17 TB of LPDDR5X on the Grace side. NVIDIA rates it at 1,440 PFLOPS of sparse NVFP4 and 720 PFLOPS of FP8. The unit pairs one Grace CPU to two Blackwell GPUs over NVLink-C2C on each GB200 superchip, and the rack stacks 36 of those superchips.
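The rack-level figures are easier to sanity-check once you divide them back down to one GPU. The sketch below does only that arithmetic, using the numbers quoted above and decimal units (1 TB = 1000 GB). The constant and variable names are mine, not NVIDIA's.

```python
# Rack-level figures quoted in the text for the GB200 NVL72.
GPUS_PER_RACK = 72
SUPERCHIPS_PER_RACK = 36        # one Grace CPU + two Blackwell GPUs each
NVLINK_PER_GPU_TBPS = 1.8       # TB/s of NVLink per GPU
POOLED_HBM_TB = 13.4            # TB of HBM3e across the rack
HBM_BANDWIDTH_TBPS = 576        # TB/s aggregate HBM bandwidth

# 36 superchips x 2 GPUs should give the 72-GPU count.
gpus_from_superchips = SUPERCHIPS_PER_RACK * 2

# 72 GPUs x 1.8 TB/s is where the rounded "130 TB/s" figure comes from.
aggregate_nvlink_tbps = GPUS_PER_RACK * NVLINK_PER_GPU_TBPS

# Per-GPU share of the pooled HBM capacity and bandwidth.
hbm_per_gpu_gb = POOLED_HBM_TB * 1000 / GPUS_PER_RACK
hbm_bw_per_gpu_tbps = HBM_BANDWIDTH_TBPS / GPUS_PER_RACK

print(f"GPUs from superchips: {gpus_from_superchips}")
print(f"Aggregate NVLink:     {aggregate_nvlink_tbps:.1f} TB/s")
print(f"HBM per GPU:          {hbm_per_gpu_gb:.0f} GB")
print(f"HBM bandwidth / GPU:  {hbm_bw_per_gpu_tbps:.1f} TB/s")
```

The per-GPU capacity that falls out, about 186 GB, is the "effective" GB200 figure used in the GB300 comparison.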
GB300 NVL72
The GB300 NVL72 keeps the same 72-GPU, 36-Grace, single-NVLink-domain shape but swaps in Blackwell Ultra GPUs aimed at test-time scaling and reasoning workloads, where the model thinks longer per query and the KV cache balloons. The headline change architects care about is memory per GPU: roughly 288 GB on GB300 against an effective ~186 GB on GB200, which means larger partitions per GPU and fewer cross-GPU page moves for very large models. NVIDIA pairs it with Quantum-X800 InfiniBand or Spectrum-X Ethernet, ConnectX-8 SuperNICs and Mission Control, and quotes up to a 50x AI-factory output gain against a Hopper baseline. Treat that 50x as a best-case factory-level figure, not a per-GPU speedup. [VERIFY exact GB300 per-GPU HBM capacity against the GB300 datasheet at deployment time, since SKUs vary.]
| Spec (per rack) | GB200 NVL72 | GB300 NVL72 |
|---|---|---|
| GPUs | 72 Blackwell | 72 Blackwell Ultra |
| Grace CPUs | 36 | 36 |
| Pooled GPU memory | 13.4 TB HBM3e | ~20 TB HBM3e [VERIFY] |
| NVLink bandwidth | 130 TB/s | 130 TB/s |
| Sparse NVFP4 | 1,440 PFLOPS | higher, reasoning-tuned [VERIFY] |
| Target workload | Trillion-parameter training and inference | Test-time scaling, reasoning |
| Cooling | Liquid, ~120 kW/rack | Liquid, ~120 kW+/rack [VERIFY] |
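The ~20 TB pooled-memory entry for GB300 can be cross-checked from the per-GPU figure. This is a rough sketch that assumes the "roughly 288 GB" per GPU quoted above holds for your SKU, which is exactly the number flagged for verification. The names are illustrative.

```python
GPUS_PER_RACK = 72
GB300_HBM_PER_GPU_GB = 288      # "roughly 288 GB" from the text; verify per SKU
GB200_POOLED_HBM_TB = 13.4      # pooled HBM3e quoted for the GB200 NVL72

# Pooled capacity if every one of the 72 GPUs carries 288 GB (decimal TB).
gb300_pooled_tb = GPUS_PER_RACK * GB300_HBM_PER_GPU_GB / 1000

# Relative memory uplift per rack versus GB200.
uplift = gb300_pooled_tb / GB200_POOLED_HBM_TB

print(f"GB300 pooled HBM: {gb300_pooled_tb:.1f} TB")
print(f"Uplift vs GB200:  {uplift:.2f}x")
```

Under that assumption the rack lands a little above 20 TB, roughly one and a half times the GB200 pool.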
Scale-up versus scale-out, and why the boundary matters
Every GPU cluster has two networks, and the NVL72 changes where the line between them falls. Scale-up is the fast NVLink domain that binds GPUs into one memory pool. Scale-out is the InfiniBand or Spectrum-X Ethernet fabric that connects those domains into a larger cluster. On a traditional 8-GPU node the scale-up domain is eight GPUs, so any model bigger than eight GPUs of memory crosses the slow fabric. The NVL72 stretches the scale-up domain to 72 GPUs, so the slow boundary only appears when you go rack to rack.
That single fact is the design lever. If your largest model fits inside one NVL72, you keep all the heavy collective traffic on 130 TB/s NVLink and the InfiniBand fabric only carries data-parallel gradient syncs between racks. If your model spans several racks, you are back to paying the network tax at the rack boundary, and your fabric design (rails, topology, congestion control) starts to dominate throughput. I cover that fabric in later parts on NVLink and InfiniBand.
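A quick way to see where the boundary falls is to count how many scale-up domains a single model replica spans. The helper below is a minimal sketch of that check. The function names and the example GPU counts are hypothetical, and it ignores everything except the GPU count.

```python
import math

def domains_needed(model_gpus: int, domain_size: int) -> int:
    """Number of scale-up (NVLink) domains one model replica spans."""
    if model_gpus <= 0 or domain_size <= 0:
        raise ValueError("GPU counts must be positive")
    return math.ceil(model_gpus / domain_size)

def crosses_scale_out(model_gpus: int, domain_size: int) -> bool:
    """True when the replica spills out of one NVLink domain onto the
    scale-out fabric (InfiniBand or Ethernet)."""
    return domains_needed(model_gpus, domain_size) > 1

# Hypothetical replica sizes, checked against an 8-GPU node and an NVL72 rack.
for gpus in (8, 32, 72, 144):
    print(
        f"{gpus:>3} GPUs: "
        f"8-GPU node -> {domains_needed(gpus, 8)} domain(s), "
        f"NVL72 -> {domains_needed(gpus, 72)} domain(s)"
    )
```

A 32-GPU replica spans four 8-GPU nodes but only one NVL72, while a 144-GPU replica crosses the rack boundary either way.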
Worked example
Say you want to serve a 1.8T-parameter MoE model at FP4. Weights alone at roughly 0.5 byte per parameter need about 900 GB, and a healthy KV cache for long-context, high-concurrency serving can add several hundred GB more. That does not fit in an 8-GPU HGX node holding around 1.5 TB of HBM once you reserve headroom and replicate experts.
An NVL72 holds 13.4 TB of pooled HBM3e at 576 TB/s. The model, the KV cache and the expert replicas all live in one NVLink domain, so token routing across experts never crosses InfiniBand. That is the case where the rack earns its 120 kW. If your model fits comfortably in 8 GPUs, the rack does not earn its keep, and a pair of HGX nodes is the cheaper, easier build. Size the memory first, then decide the chassis.
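The sizing in this example can be written down as a few lines of arithmetic. This is a sketch under stated assumptions: the 400 GB KV cache and the 20% headroom reserve are placeholder values I picked to stand in for "several hundred GB" and "reserve headroom", and the function names are mine. Swap in your own measurements.

```python
def weights_gb(params: float, bytes_per_param: float) -> float:
    """Weight footprint in decimal GB."""
    return params * bytes_per_param / 1e9

def fits(required_gb: float, pool_gb: float, headroom: float = 0.2) -> bool:
    """True if the footprint fits after reserving a fraction of the pool."""
    return required_gb <= pool_gb * (1 - headroom)

PARAMS = 1.8e12            # 1.8T-parameter MoE from the example
FP4_BYTES = 0.5            # roughly half a byte per parameter at FP4
KV_CACHE_GB = 400          # assumption: stands in for "several hundred GB"

NODE_HBM_GB = 1500         # 8-GPU HGX node, around 1.5 TB
NVL72_HBM_GB = 13400       # GB200 NVL72 pooled HBM3e

model_gb = weights_gb(PARAMS, FP4_BYTES)
required_gb = model_gb + KV_CACHE_GB

print(f"Weights:        {model_gb:.0f} GB")
print(f"Weights + KV:   {required_gb:.0f} GB")
print(f"Fits 8-GPU node: {fits(required_gb, NODE_HBM_GB)}")
print(f"Fits NVL72:      {fits(required_gb, NVL72_HBM_GB)}")
```

Expert replicas are left out here. Adding them only widens the gap for the 8-GPU node, which already fails on weights plus KV cache under these assumptions.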
DGX SuperPOD: stitching racks into a cluster
Above the rack sits the DGX SuperPOD, which is NVIDIA's validated cluster reference architecture. It is built from scalable units, where each unit is a fixed count of nodes or NVL72 racks plus the leaf-spine fabric, management network, storage and cabling that tie them together. The value of the SuperPOD is not novelty. It is that NVIDIA has already done the painful integration: the rail-optimized InfiniBand topology, the Base Command and Mission Control management layer, the storage reference, and the validation that it runs at the rated numbers. You buy a known-good blueprint instead of discovering fabric bugs at 1,000 GPUs.
For most enterprises this is the line between buying a factory and building one. If you have a hyperscaler-grade network team, you can assemble HGX or MGX nodes and design your own fabric and save money. If you do not, the SuperPOD or an OEM equivalent reference architecture is cheaper once you count the engineering time and the cost of getting the fabric wrong. When the target is running this on VMware Cloud Foundation rather than bare metal, the deployment shifts again, and I cover that path in the Private AI reference architecture.
What I would actually choose
Here is my position, not a neutral survey. For a first enterprise AI build with a small platform team, I recommend a DGX BasePOD or SuperPOD, or an OEM HGX reference build with a strong support contract. Why: you get a validated stack and one support path, which is worth more than the integration savings when your team has never run GPU fabric at scale. When it is not the right call: if you have an experienced network and hardware team and a cost target, self-integrating HGX or MGX nodes will be cheaper and you give up little. What to validate first: your per-rack power and cooling envelope, because it silently caps every other decision.
For NVL72 specifically, I only recommend it when the workload genuinely needs a rack-sized memory domain: trillion-parameter or large MoE inference, or training runs where a 72-GPU NVLink domain measurably cuts collective time. When it is not the right call: models that fit in 8 to 16 GPUs do not benefit enough to justify the liquid-cooling retrofit and the power bill. What to validate first: that your facility can actually deliver roughly 120 kW and a water loop to each rack, ideally with a small pilot rack before you commit a row. Buying NVL72 racks for models that fit in a node is the most expensive mistake I see in this space.
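The two checks I keep asking for, memory footprint and facility envelope, fit in one small worksheet. The sketch below encodes my rule of thumb from this section and nothing more. The class, the return labels, the 20% headroom and the 16-GPU cutoff are illustrative assumptions, not an NVIDIA sizing tool.

```python
from dataclasses import dataclass

NODE_HBM_GB = 1500        # 8-GPU node, around 1.5 TB of HBM
NVL72_RACK_KW = 120       # roughly 120 kW per NVL72 rack

@dataclass
class Worksheet:
    model_footprint_gb: float    # weights + KV cache + replicas
    rack_power_limit_kw: float   # what the data hall delivers per rack
    has_liquid_loop: bool        # water loop available at the rack

def recommend(ws: Worksheet, headroom: float = 0.2) -> str:
    """Memory first, then facility: the order argued for in the text."""
    # Models that fit in 8 to 16 GPUs stay on ordinary nodes.
    usable_16_gpu_gb = 2 * NODE_HBM_GB * (1 - headroom)
    if ws.model_footprint_gb <= usable_16_gpu_gb:
        return "hgx-or-dgx-nodes"
    # Larger models want a rack-sized domain, if the building allows it.
    facility_ready = (
        ws.has_liquid_loop and ws.rack_power_limit_kw >= NVL72_RACK_KW
    )
    return "nvl72" if facility_ready else "nvl72-blocked-by-facility"

print(recommend(Worksheet(1300, 40, False)))
print(recommend(Worksheet(6000, 130, True)))
print(recommend(Worksheet(6000, 40, False)))
```

The third case is the one to catch early: the model justifies the rack, but the data hall cannot feed it, so the facility work has to come before the purchase order.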
The Verdict
DGX, HGX and MGX are integration choices, not performance tiers, and the NVL72 is a memory-domain decision dressed up as a hardware one. Choose your integrator by the size of your team and your appetite to own firmware and fabric. Choose the rack by the size of your model and the limits of your building. The teams that get this right size the memory and the power envelope before they ever argue about brands. The teams that get it wrong buy a 120 kW rack to serve a model that would have run on two air-cooled nodes. If you do one thing after reading this, pull your largest target model's memory footprint and your data hall's per-rack power limit into the same document, because those two numbers decide everything above.
Next we drop a level into the GPUs themselves and how you carve them up: MIG, vGPU, time-slicing and passthrough, and when each one is the right call. If you are sizing a build right now, start that power-and-memory worksheet today and bring it to the partitioning discussion.
References
NVIDIA GB200 NVL72 product page and specifications
NVIDIA MGX modular reference architecture
NVIDIA DGX SuperPOD reference architecture (GB200) components



