- The GB200 NVL72 pulls ~120 kW per rack at full load. Air cooling tops out at 30-40 kW. Liquid is not optional here.
- Direct-to-chip cold plates on GPUs, CPUs, and NVLink switches feed a rack manifold and CDU; the facility water loop is the integration point that breaks most retrofits.
- At 120 kW per rack, liquid-cooled NVL72 works out to roughly 8 racks per megawatt, or 10 racks in a 1.2 MW pod. Air-cooled HGX H100 at about 10.2 kW per node gets you roughly 25-33 racks per megawatt, but at a fraction of the compute density per rack.
- Validate facility water supply pressure, flow rate, inlet temperature, and pipe material before signing any purchase order.
Your facilities team approved 40 kW per rack. You ordered GB200 NVL72. These two facts are incompatible. The NVL72 draws approximately 120 kW at sustained full load, roughly three times what a well-provisioned air-cooled H100 rack would draw and six times what most enterprise data centers were designed to handle a decade ago. Liquid cooling is not a premium option on Blackwell-class systems. It is the only option. The question is whether your building can support the plumbing.
Why Air Cooling Hits a Wall at 40 kW
Data center air cooling works by moving chilled air across hot components, relying on a high-volume airstream to carry heat to the return plenum. The problem is that air has a very low volumetric heat capacity. Water carries roughly 3,400 times more heat per unit volume than air at the same temperature delta. At 20-25 kW per rack, CRAC units and hot-aisle containment can manage the load reasonably well. At 30-40 kW, you are pushing the physics hard: air velocity through dense GPU trays creates excessive back-pressure, fan power itself becomes significant, and hotspot temperatures on GPU die edges climb even as average exit air temperature looks acceptable.
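To put numbers on that gap, here is a minimal sketch that compares the volume of air and of water needed to carry the same heat load, using the heat balance Q = P / (rho x c_p x delta-T). The fluid properties are textbook approximations near room temperature, the 15-degree temperature rise is an assumption, and the function name is illustrative.

```python
# Approximate fluid properties near room temperature (textbook values, assumed).
AIR = {"density_kg_m3": 1.2, "cp_j_kg_k": 1005.0}
WATER = {"density_kg_m3": 997.0, "cp_j_kg_k": 4186.0}

M3_S_TO_CFM = 2118.88  # 1 m^3/s is about 2,118.88 cubic feet per minute


def volumetric_flow_m3_s(power_w: float, delta_t_k: float, fluid: dict) -> float:
    """Volume flow needed to carry power_w at the given temperature rise."""
    return power_w / (fluid["density_kg_m3"] * fluid["cp_j_kg_k"] * delta_t_k)


if __name__ == "__main__":
    delta_t_k = 15.0  # assumed temperature rise for both fluids
    for rack_kw in (20, 40, 120):
        air = volumetric_flow_m3_s(rack_kw * 1000, delta_t_k, AIR)
        water = volumetric_flow_m3_s(rack_kw * 1000, delta_t_k, WATER)
        print(
            f"{rack_kw:>4} kW: air {air:5.2f} m^3/s ({air * M3_S_TO_CFM:,.0f} CFM), "
            f"water {water * 1000:4.2f} L/s, ratio {air / water:,.0f}x"
        )
```

With these property values the air-to-water volume ratio comes out at about 3,460, in line with the rough figure quoted above. A 40 kW rack already needs a little over 2 cubic meters of air per second at a 15-degree rise.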
The NVIDIA HGX B200 air-cooled variant does exist, with per-GPU TDP capped at around 1,000 W for eight GPUs in a 4U node. That is roughly 8 kW of GPU power per node before you add CPUs, NVLink switches, storage, and networking. Fit four nodes in a rack and you are at 35-40 kW, which is workable with good containment and adequate CRAC tonnage. But the HGX B200 air-cooled variant does not give you NVLink-connected fabric across all 72 GPUs in a single coherent memory domain. That architecture only exists in the NVL72 form factor, and the NVL72 is liquid-cooled by design. There is no air-cooled NVL72. If you want the full Blackwell scale-up experience, liquid is non-negotiable.
How Direct-to-Chip Liquid Cooling Works in the NVL72
The NVL72 rack holds 18 compute trays, each with two Grace CPUs and four Blackwell B200 GPUs, for 36 CPUs and 72 GPUs in total, alongside the NVLink switch trays that tie the GPUs together. Every thermally significant component, meaning the GPUs, the CPUs, and the NVLink switches, gets a direct-to-chip cold plate mounted on the chip package. These components carry no air heatsinks. Liquid is their only thermal path.
Inside the rack, a supply manifold runs coolant from bottom to top, branching to each tray through quick-disconnect fittings. After absorbing heat from the cold plates, the warmed coolant flows back through the return manifold to the CDU sitting at the top or side of the rack. That is a closed secondary loop running inhibitor-treated water or a water-glycol mix. The CDU contains a liquid-to-liquid heat exchanger that transfers thermal load to the facility water loop, keeping the two circuits physically separate. This separation matters: it isolates potentially corrosive or biologically active facility water from the precision-cooled IT loop. The facility-side water then carries the heat to dry coolers, cooling towers, or direct free-air heat exchangers at the building perimeter.
Coolant Temperature and Flow Requirements
The NVL72 IT loop typically runs with supply coolant entering cold plates at 25-35 degrees Celsius and returning at 40-50 degrees Celsius. That delta-T of 10-15 degrees is what drives the required flow rate: to carry 120 kW across a 10-degree delta-T using water, you need roughly 2.9 liters per second (about 46 US gallons per minute) per rack. The CDU must match or exceed this. Vendor CDUs sized for NVL72 deployments, such as those from CoolIT Systems and Vertiv, typically carry 150-200 kW capacity to give headroom over the nominal 120 kW draw [VERIFY exact model specs with CDU vendor].
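The per-rack flow figure follows from the same heat balance, flow = P / (rho x c_p x delta-T). The sketch below assumes plain water properties near 25 degrees Celsius. A glycol mix has a lower specific heat and would need somewhat more flow, so treat the output as a lower bound. The function names are illustrative.

```python
WATER_DENSITY_KG_PER_L = 0.997   # plain water near 25 degC (assumed)
WATER_CP_J_PER_KG_K = 4186.0     # specific heat of water (assumed)
LITERS_PER_US_GALLON = 3.785411784


def required_flow_l_per_s(heat_load_kw: float, delta_t_k: float) -> float:
    """Coolant flow in L/s needed to absorb heat_load_kw at the given delta-T."""
    mass_flow_kg_s = heat_load_kw * 1000.0 / (WATER_CP_J_PER_KG_K * delta_t_k)
    return mass_flow_kg_s / WATER_DENSITY_KG_PER_L


def l_per_s_to_us_gpm(flow_l_s: float) -> float:
    """Convert liters per second to US gallons per minute."""
    return flow_l_s * 60.0 / LITERS_PER_US_GALLON


if __name__ == "__main__":
    for delta_t in (10.0, 15.0):
        flow = required_flow_l_per_s(120.0, delta_t)
        print(f"120 kW at delta-T {delta_t:.0f} K: "
              f"{flow:.2f} L/s ({l_per_s_to_us_gpm(flow):.0f} US GPM)")
```

Widening the delta-T from 10 to 15 degrees cuts the required flow by a third, which is why the supply and return temperatures the CDU can tolerate matter as much as pump capacity.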
On the facility side, what you can supply to the CDU matters enormously. If your facility water arrives at 25 degrees Celsius, you have sufficient approach temperature for the heat exchanger to work efficiently. If it arrives at 35 degrees Celsius because your cooling tower is undersized or your climate is warm, you lose margin fast. NVIDIA and partners target facility water inlet temperatures at or below 30 degrees Celsius for optimal operation [VERIFY with NVIDIA NVL72 site preparation guide for exact spec]. Above that threshold, some CDUs throttle capacity or require chiller assist, which raises your PUE.
Immersion Cooling: The Other Path
Single-phase immersion dunks servers in a bath of dielectric fluid that carries on the order of 1,000 times more heat per unit volume than air. Two-phase immersion uses a low-boiling-point fluid that vaporizes at chip temperature and condenses on a coil above. Both approaches theoretically support 200+ kW per tank. The NVL72, however, ships with direct-to-chip cold plates and is not rated for fluid immersion in any current NVIDIA reference design. If immersion is your facility strategy, you would need to use third-party Blackwell-based systems or wait for future generations. For Blackwell today, direct-to-chip is the supported path.
Platform Power and Cooling Comparison
| Platform | Rack Power | Cooling Method | GPUs/Rack | NVLink Domain | Racks/MW |
|---|---|---|---|---|---|
| HGX H100 (air) | ~30 kW | Air | ~24 (3 nodes x 8) | 8 GPUs | ~33 |
| HGX B200 (air) | ~40 kW | Air | ~32 (4 nodes x 8) | 8 GPUs | ~25 |
| HGX H100/B200 (liquid) | ~60-70 kW | Direct-to-chip | 8 per node | 8 GPUs | ~15 |
| GB200 NVL72 | ~120 kW | Direct-to-chip (required) | 72 | 72 GPUs | ~8 |
| GB300 NVL72 | ~132 kW | Direct-to-chip (required) | 72 | 72 GPUs | ~7.5 |
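
The Racks/MW column is simply 1,000 kW divided by nominal rack power, with no allowance for cooling or distribution overhead. A short sketch of that arithmetic, where the 65 kW entry is an assumed midpoint of the 60-70 kW range:

```python
# Nominal rack power in kW, taken from the table above.
# The 65 kW entry is an assumed midpoint of the 60-70 kW range.
RACK_POWER_KW = {
    "HGX H100 (air)": 30.0,
    "HGX B200 (air)": 40.0,
    "HGX H100/B200 (liquid)": 65.0,
    "GB200 NVL72": 120.0,
    "GB300 NVL72": 132.0,
}


def racks_per_mw(rack_kw: float) -> float:
    """Racks that fit in 1 MW of IT power, ignoring all overhead."""
    return 1000.0 / rack_kw


if __name__ == "__main__":
    for platform, kw in RACK_POWER_KW.items():
        print(f"{platform:<24} {kw:>5.0f} kW  ~{racks_per_mw(kw):.1f} racks/MW")
```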
Worked example
Sizing a 1 MW NVL72 pod
Suppose you have a 1 MW IT power allocation. You want to fit as many NVL72 racks as possible and understand your effective PUE.
- Rack count: Each NVL72 draws 120 kW at sustained load. At 1 MW, that gives you 8 racks (8 x 120 kW = 960 kW, leaving 40 kW for networking and overhead). Those 8 racks contain 576 B200 GPUs across eight separate 72-GPU NVLink domains, with traffic between racks running over the scale-out network.
- CDU power overhead: Each CDU draws approximately 5-10 kW of pump and control power. For 8 racks with 8 CDUs, add ~40-80 kW of overhead.
- Facility cooling power: With direct-to-chip removing ~95% of heat to the liquid loop, the remaining ~5% still goes to room air. Good containment plus dry coolers keep mechanical cooling overhead low. Achievable partial PUE (cooling-only) of 1.03-1.05 is realistic with direct-to-chip at this density versus 1.3-1.5 for air-cooled CRAC-heavy designs.
- Effective compute PUE: If total facility draw including all overhead is 1.06 MW to serve 960 kW of IT load, PUE = 1.06 / 0.96 = 1.10. Compare that to an air-cooled H100 pod at 1 MW where cooling overhead alone might consume 250-300 kW, giving PUE of 1.25-1.30.
- Coolant flow: 8 racks x 2.9 L/s = 23.2 L/s (approximately 370 GPM) of facility water you need provisioned to the CDUs, assuming the facility loop runs the same 10-degree delta-T as the IT loop.
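- Floor footprint: 8 NVL72 racks hold 576 GPUs. Matching that GPU count with air-cooled HGX H100 at three 8-GPU nodes per ~30 kW rack takes about 24 racks, roughly 3x the floor area, and the per-GPU performance gap between Hopper and Blackwell widens the difference further if you size for equivalent compute.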
These are illustrative figures. Final design requires vendor CDU specs, actual facility water temperature, and a full power chain audit from the utility transformer to the rack PDU.
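The arithmetic in this worked example can be sketched as a small script. It assumes one CDU per rack, uses the 5-10 kW CDU range and 2.9 L/s per-rack flow quoted above as defaults, and all names are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class PodEstimate:
    racks: int
    it_load_kw: float
    headroom_kw: float
    cdu_overhead_min_kw: float
    cdu_overhead_max_kw: float
    coolant_flow_l_s: float


def size_pod(it_budget_kw: float = 1000.0,
             rack_kw: float = 120.0,
             cdu_kw_min: float = 5.0,
             cdu_kw_max: float = 10.0,
             flow_per_rack_l_s: float = 2.9) -> PodEstimate:
    """Whole racks that fit in the IT budget, with one CDU per rack (assumed)."""
    racks = math.floor(it_budget_kw / rack_kw)
    it_load_kw = racks * rack_kw
    return PodEstimate(
        racks=racks,
        it_load_kw=it_load_kw,
        headroom_kw=it_budget_kw - it_load_kw,
        cdu_overhead_min_kw=racks * cdu_kw_min,
        cdu_overhead_max_kw=racks * cdu_kw_max,
        coolant_flow_l_s=racks * flow_per_rack_l_s,
    )


def pue(total_facility_kw: float, it_load_kw: float) -> float:
    """Power Usage Effectiveness: total facility power over IT power."""
    return total_facility_kw / it_load_kw


if __name__ == "__main__":
    pod = size_pod()
    print(pod)
    print(f"PUE at 1,060 kW total draw: {pue(1060.0, pod.it_load_kw):.2f}")
```

Changing `rack_kw` to 132 models the GB300 row of the comparison table. The script is only as good as its inputs, so replace the defaults with vendor CDU figures once you have them.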
PUE, Energy Efficiency, and What the Numbers Actually Mean
PUE (Power Usage Effectiveness) is total facility power divided by IT equipment power. A PUE of 1.0 is theoretical perfection. Legacy air-cooled data centers running Hopper-generation hardware often land at 1.4-1.6 once you account for chillers, CRAC fans, lighting, and UPS losses. Modern hyperscale designs with hot-aisle containment and economizer modes can achieve 1.1-1.2 on air-cooled workloads in favorable climates.
Liquid cooling changes the equation significantly. When you move 95% or more of the heat to a liquid loop that connects directly to dry coolers or indirect evaporative cooling towers, you dramatically reduce the mechanical refrigeration load. In mild climates, you can run entirely on free cooling for most of the year: no compressor cycles, just pump power and airflow through heat rejection coils. This is how modern liquid-cooled AI factories achieve PUE below 1.1. NVIDIA reports that the GB200 NVL72 platform delivers up to 25x the energy efficiency of equivalent H100 air-cooled infrastructure, a figure that combines architectural gains with the cooling savings. The water efficiency claim, 300x better than traditional air-cooled designs with evaporative cooling towers, comes from eliminating chiller and cooling tower water consumption entirely in dry-cooler designs.
Retrofit Risk: What Actually Breaks
Most enterprise data centers built before 2020 were designed for 5-15 kW per rack. Even newer facilities targeting AI were often spec’d for 30-40 kW with enhanced air cooling. Retrofitting one of these spaces for 120 kW liquid-cooled racks touches almost every system in the building.
The power chain is usually the first constraint. A full NVL72 rack needs a dedicated three-phase power feed, typically A and B circuits each rated for 80+ kW. Note that two 80 kW feeds cover a 120 kW load only while both are live. Full 1+1 failover would require each feed to carry the whole rack on its own, so confirm the redundancy scheme with your rack vendor. Most existing rack PDUs are not sized for this; you will need new busway, new PDUs, and likely transformer upgrades. If your facility utility feed is already at capacity, you are looking at a utility upgrade process that can take 18-36 months in many jurisdictions.
The water side is more nuanced but often more disruptive. You need facility water piping routed to each rack row, with supply and return headers sized for the total flow. Existing chilled water plants may not be compatible: if your facility runs a 7 degree Celsius supply (typical for precision air cooling), you need to check whether this is too cold for the CDU heat exchanger design point, as condensation risk on cold pipes in a warm room is real. Pipe material matters too: the CDU secondary loop typically runs an inhibited propylene glycol and deionized water mixture, while facility water may be treated differently. You need isolation at the CDU so these chemistries do not cross-contaminate.
What to Validate with Facilities Before Signing the PO
Before committing to NVL72 racks, get written answers from your facilities team on these specific items:
| Validation Item | What to Confirm | Risk if Wrong |
|---|---|---|
| Available power per rack | 2 x 80 kW circuits (A+B), breaker sizing, busway capacity | Thermal trip, rack downtime |
| Facility water flow at rack | Min 3 L/s per NVL72 rack at rated pressure | CDU under-cooling, GPU throttle |
| Supply water temperature | Max 30 degC at rack inlet (cooler is better) | Chiller assist required, PUE degrades |
| Pipe material and chemistry | Stainless or HDPE supply; no galvanized; inhibitor compatibility | Cold plate corrosion, CDU seal failure |
| Floor load rating | NVL72 rack is ~1,600 kg when filled | Structural failure, voided insurance |
| Leak detection and drainage | Floor drains in row, leak sensors at CDU and manifold | Water damage to adjacent racks |
| Overhead clearance | NVL72 rack height is approximately 2.4 m with CDU; check cable tray conflicts | Installation blocked, retrofit cost |
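
One way to keep these answers honest is to record them as data and check them against the table's thresholds. The sketch below is a hypothetical helper, not a vendor tool. The field names are invented for illustration, the floor rating is simplified to a single per-rack figure, and the thresholds are copied from the table above, so they inherit its caveats.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FacilitySurvey:
    """Hypothetical record of what facilities reports for one rack position."""
    feed_a_kw: float
    feed_b_kw: float
    water_flow_l_s: float
    supply_water_temp_c: float
    pipe_material: str
    floor_rating_kg: float
    overhead_clearance_m: float
    leak_detection: bool


def find_gaps(s: FacilitySurvey) -> list[str]:
    """Return one message per checklist item that misses the table's threshold."""
    gaps = []
    if min(s.feed_a_kw, s.feed_b_kw) < 80.0:
        gaps.append("power: each A/B circuit should be rated 80 kW or more")
    if s.water_flow_l_s < 3.0:
        gaps.append("flow: need at least 3 L/s per rack")
    if s.supply_water_temp_c > 30.0:
        gaps.append("temperature: supply water above 30 degC")
    if "galvanized" in s.pipe_material.lower():
        gaps.append("piping: galvanized pipe is not acceptable")
    if s.floor_rating_kg < 1600.0:
        gaps.append("floor: rating below the ~1,600 kg rack weight")
    if s.overhead_clearance_m < 2.4:
        gaps.append("clearance: less than ~2.4 m overhead")
    if not s.leak_detection:
        gaps.append("leak detection: sensors and drains not confirmed")
    return gaps


if __name__ == "__main__":
    survey = FacilitySurvey(80.0, 60.0, 2.5, 32.0, "Galvanized steel",
                            1800.0, 2.6, True)
    for gap in find_gaps(survey):
        print("GAP:", gap)
```

An empty list means the position clears every threshold in the table. It does not replace the written confirmation from facilities.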
Air vs Liquid: Where Each Still Wins
Air cooling is not dead. For inference-optimized deployments using B200 HGX or H100 HGX with 8-GPU scale-up and scale-out over InfiniBand, air-cooled at 30-40 kW per rack is perfectly viable. You avoid CDU capital cost, you avoid facility plumbing, and your existing data center likely already supports the power density. If you are running smaller LLMs for inference at moderate concurrency and 8 GPUs per node is enough, the air-cooled HGX path makes operational sense. The HGX B200 in air-cooled form still delivers generational improvement over Hopper at a manageable facility cost.
Liquid becomes non-negotiable the moment you need the 72-GPU NVLink coherent domain, which is the defining architecture for trillion-parameter training and real-time inference at the scale DeepSeek R2, GPT-5 class, and frontier MoE models demand. It is also non-negotiable when your compute-per-floor-tile target is high enough that fitting 30+ racks of air-cooled hardware to match one row of NVL72 racks is simply not feasible in your building footprint.
My Take: When Liquid Is Non-Negotiable and When to Hold Off
If you are running or planning to run trillion-parameter models in training or real-time inference, you are buying NVL72 racks, and those racks require liquid cooling. Full stop. The 72-GPU NVLink domain is not replicated in any air-cooled configuration, and the performance gap for MoE and dense transformer models at scale is not something you bridge with extra InfiniBand bandwidth. Model parallelism across InfiniBand adds latency and communication overhead that matters at trillion-parameter scale. Liquid is not about being modern. It is about the physics of what the 120 kW NVL72 produces thermally and what NVLink GPU-to-GPU bandwidth requires in terms of physical co-location.
When NOT to go liquid: if your data center is a standard enterprise facility built for 10-20 kW racks, and your workload is inference on models up to 70B parameters using 8-GPU HGX B200 nodes, stay on air. The infrastructure investment to retrofit liquid cooling (new piping, CDUs, leak detection, a water treatment program) may cost more than the energy savings justify over a 3-year horizon. Air HGX B200 will serve you well for that workload profile. Plan your next facility expansion with liquid-ready infrastructure so you are not making this decision under deadline pressure next time.
What to validate first, before any purchase order for NVL72: get your facilities team to physically walk the power and water path from the utility entrance to the proposed rack locations. Not a design review. A physical walk. Then commission a hydraulic model of your water distribution network. These two steps surface most retrofit surprises before they become construction change orders.
If you are deploying in a colocation facility, read the power and cooling addendum in the SLA before you sign. Many colocation operators have now published NVL72-specific supplemental agreements that specify CDU responsibility, water quality requirements, and leak liability. Some charge a premium for liquid-cooled cages, which can materially change your TCO versus owning facility space. Get those terms in writing early.
Part 11 covers what runs on top of this physical infrastructure: the host software stack, drivers, CUDA, and the Container Toolkit. If you are working through where NVL72 fits in the broader reference architecture, the NVIDIA AI Series pillar page maps the full 30-part sequence. For the VCF-specific deployment layer on top of this hardware, see the Private AI Series.
References
- NVIDIA GB200 NVL72 Product Page, nvidia.com
- Chill Factor: NVIDIA Blackwell Platform Boosts Water Efficiency by Over 300x, NVIDIA Blog, April 2025
- NVIDIA GB200 NVL72 Delivers Trillion-Parameter LLM Training and Real-Time Inference, NVIDIA Technical Blog
- NVIDIA DGX GB Rack Scale Systems User Guide, docs.nvidia.com
- Why AI Rack Densities Make Liquid Cooling Non-Negotiable, Network World



