Dr. Pranay Jha

GPU Power, Cooling and Density: Why Blackwell Forces Liquid (NVIDIA AI Series, Part 10)

The GB200 NVL72 draws ~120 kW per rack and ships liquid-cooled by design. Learn why Blackwell-class systems make direct-to-chip cooling mandatory, how CDUs and facility water loops work, and what to validate before ordering.

NVIDIA AI Series · Part 10 of 30
TL;DR
  • The GB200 NVL72 pulls ~120 kW per rack at full load. Air cooling tops out at 30-40 kW. Liquid is not optional here.
  • Direct-to-chip cold plates on GPUs, CPUs, and NVLink switches feed a rack manifold and CDU; the facility water loop is the integration point that breaks most retrofits.
  • At ~120 kW each, liquid-cooled NVL72 racks fit roughly 8 per megawatt (about 10 in a 1.2 MW pod). Air-cooled HGX H100 racks at ~30 kW each (nodes of ~10.2 kW) fit around 33 per megawatt, but at a fraction of the compute density.
  • Validate facility water supply pressure, flow rate, inlet temperature, and pipe material before signing any purchase order.
Who this is for: Data center architects, facilities engineers, and IT infrastructure leads evaluating Blackwell GPU clusters. You should already understand rack PDU sizing and basic HVAC principles. If you are still choosing between GPU generations, start with Part 3: the GPU lineup and Part 5: DGX/HGX/MGX/NVL72 reference systems.

Your facilities team approved 40 kW per rack. You ordered GB200 NVL72. These two facts are incompatible. The NVL72 draws approximately 120 kW at sustained full load: three times the 40 kW you were approved for, roughly four times a typical ~30 kW air-cooled H100 rack, and at least eight times the 5-15 kW most enterprise data centers were designed to handle a decade ago. Liquid cooling is not a premium option on Blackwell-class systems. It is the only option. The question is whether your building can support the plumbing.

Why Air Cooling Hits a Wall at 40 kW

Data center air cooling works by moving chilled air across hot components, relying on a high-volume airstream to carry heat to the return plenum. The problem is that air has a terrible volumetric heat capacity. Water carries roughly 3,400 times more heat per unit volume than air for the same temperature rise. At 20-25 kW per rack, CRAC units and hot-aisle containment can manage this reasonably well. At 30-40 kW, you are pushing the physics hard: air velocity through dense GPU trays creates excessive back-pressure, fan power itself becomes significant, and hotspot temperatures on GPU die edges climb even as average exit air temperature looks acceptable.
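The gap between air and water is easy to check with round-number fluid properties. The sketch below is illustrative only: the densities and specific heats are assumed textbook values near room temperature, and the 15 K air temperature rise is an assumption, not a figure from any NVIDIA document.

```python
# Rough physical constants near room temperature (assumed round values).
AIR_DENSITY = 1.2        # kg/m^3
AIR_CP = 1005.0          # J/(kg*K)
WATER_DENSITY = 997.0    # kg/m^3
WATER_CP = 4186.0        # J/(kg*K)


def volumetric_heat_capacity(density, cp):
    """Heat stored per cubic metre per kelvin, in J/(m^3*K)."""
    return density * cp


def volume_flow_m3s(heat_w, delta_t_k, density, cp):
    """Volume flow needed to carry heat_w watts at a given temperature rise."""
    return heat_w / (volumetric_heat_capacity(density, cp) * delta_t_k)


ratio = (volumetric_heat_capacity(WATER_DENSITY, WATER_CP)
         / volumetric_heat_capacity(AIR_DENSITY, AIR_CP))
print(f"water vs air, heat per unit volume: ~{ratio:,.0f}x")

# Airflow a rack would need if air alone had to carry the load.
for rack_kw in (25, 40, 120):
    air = volume_flow_m3s(rack_kw * 1000, 15.0, AIR_DENSITY, AIR_CP)
    print(f"{rack_kw:>3} kW rack at 15 K air delta-T: {air:5.2f} m^3/s of air")
```

With these assumed properties the ratio lands in the mid-3,000s, and the airflow needed for a 120 kW rack is three times that of a 40 kW rack, which is where back-pressure and fan power stop being manageable.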

The NVIDIA HGX B200 air-cooled variant does exist, with per-GPU TDP capped at around 1,000 W for eight GPUs in a 4U node. That is roughly 8 kW of GPU power per node before you add CPUs, NVLink switches, storage, and networking. Fit four nodes in a rack and you are at 35-40 kW, which is workable with good containment and adequate CRAC tonnage. But the HGX B200 air-cooled variant does not give you NVLink-connected fabric across all 72 GPUs in a single coherent memory domain. That architecture only exists in the NVL72 form factor, and the NVL72 is liquid-cooled by design. There is no air-cooled NVL72. If you want the full Blackwell scale-up experience, liquid is non-negotiable.

Figure: Rack Power vs Cooling Method (practical density ceiling by cooling approach). Bars, in kW per rack: Air H100 ~30, Air B200 ~40, Liquid HGX ~70, NVL72 ~120, GB300 ~132.
Air-cooled practical ceiling is ~40 kW. NVL72 at 120 kW requires direct-to-chip liquid cooling. GB300 NVL72 approaches 132 kW.

How Direct-to-Chip Liquid Cooling Works in the NVL72

The NVL72 rack holds 18 compute trays, each with two Grace CPUs and four Blackwell B200 GPUs (72 GPUs and 36 CPUs in total), plus nine NVLink switch trays. Every thermally significant component, the GPUs, the CPUs, and the NVLink switches, gets a direct-to-chip cold plate bonded to the die package. These parts carry no conventional air heatsinks. Liquid is the primary thermal path, with only a small residual load left for room air.

Inside the rack, a supply manifold runs coolant from bottom to top, branching to each tray through quick-disconnect fittings. After absorbing heat from the cold plates, the heated coolant returns through the return manifold to the CDU sitting at the top or side of the rack. That is a closed secondary loop running treated water or a water-glycol mix. The CDU contains a liquid-to-liquid heat exchanger that transfers thermal load to the facility water loop, keeping the two circuits physically separate. This separation matters: it isolates potentially corrosive or biologically active facility water from the precision-cooled IT loop. The facility-side water then carries the heat to dry coolers, cooling towers, or direct free-air heat exchangers at the building perimeter.

Figure: NVL72 Liquid Cooling Loop Diagram (heat path from GPU cold plate to facility rejection). GPU cold plates (x72) -> rack manifold (cool supply, hot return) -> CDU (liquid-to-liquid heat exchanger, 150-200 kW capacity) -> facility loop (dry cooler / tower). The IT cooling loop is closed; the facility water loop is separate.
The CDU separates the precision IT coolant loop from facility water. This isolation is critical for corrosion control and maintenance.

Coolant Temperature and Flow Requirements

The NVL72 IT loop typically runs with supply coolant entering the cold plates at 25-35 degrees Celsius and returning at 40-50 degrees Celsius. That delta-T of 10-15 degrees is what drives the required flow rate: to carry 120 kW across a 10-degree delta-T using water, you need roughly 2.9 liters per second (about 46 US gallons per minute) per rack. The CDU must match or exceed this. Vendor CDUs sized for NVL72 deployments, such as those from CoolIT Systems and Vertiv, typically carry 150-200 kW capacity to give headroom over the nominal 120 kW draw (verify exact model specs with the CDU vendor).
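The flow figure follows from the heat balance Q = m x cp x delta-T. Here is a minimal sketch of that arithmetic, assuming plain-water properties (a glycol mix has a lower specific heat and needs somewhat more flow); the function name and constants are illustrative, not from any vendor tool.

```python
WATER_CP = 4186.0         # J/(kg*K), plain water; glycol mixes are lower
WATER_DENSITY = 997.0     # kg/m^3
LPS_TO_US_GPM = 15.8503   # litres per second -> US gallons per minute


def coolant_flow_lps(heat_kw, delta_t_k, cp=WATER_CP, density=WATER_DENSITY):
    """Coolant flow in L/s needed to carry heat_kw at a given delta-T."""
    mass_flow = heat_kw * 1000.0 / (cp * delta_t_k)   # kg/s
    return mass_flow / density * 1000.0                # L/s


for delta_t in (10.0, 15.0):
    lps = coolant_flow_lps(120.0, delta_t)
    print(f"120 kW at {delta_t:.0f} K delta-T: {lps:.2f} L/s "
          f"({lps * LPS_TO_US_GPM:.0f} US GPM)")
```

Widening the delta-T from 10 to 15 degrees cuts the required flow by a third, which is why the return temperature you are willing to run matters as much as the supply temperature.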

On the facility side, what you can supply to the CDU matters enormously. If your facility water arrives at 25 degrees Celsius, you have sufficient approach temperature for the heat exchanger to work efficiently. If it arrives at 35 degrees Celsius because your cooling tower is undersized or your climate is warm, you lose margin fast. NVIDIA and partners target facility water inlet temperatures at or below 30 degrees Celsius for optimal operation (check the NVIDIA NVL72 site preparation guide for the exact spec). Above that threshold, some CDUs throttle capacity or require chiller assist, which raises your PUE.
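One way to reason about that margin is to treat the CDU heat exchanger as adding a fixed approach temperature on top of the facility inlet. The sketch below assumes a 4 K approach and a 35 degree Celsius ceiling on IT supply temperature (the top of the 25-35 range above). Both values are placeholders for illustration; the real numbers come from the CDU datasheet.

```python
def it_supply_temp_c(facility_inlet_c, approach_k):
    """IT-loop supply temperature the CDU can deliver (simplified model)."""
    return facility_inlet_c + approach_k


def supply_margin_k(facility_inlet_c, approach_k=4.0, max_it_supply_c=35.0):
    """Degrees of margin left before the IT supply exceeds its ceiling."""
    return max_it_supply_c - it_supply_temp_c(facility_inlet_c, approach_k)


for inlet in (25.0, 30.0, 35.0):
    margin = supply_margin_k(inlet)
    status = "ok" if margin >= 0 else "needs chiller assist or derating"
    print(f"facility inlet {inlet:.0f} C -> margin {margin:+.0f} K ({status})")
```

Under these assumptions a 25 degree inlet leaves comfortable margin, 30 degrees leaves almost none, and 35 degrees is already past the ceiling.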

Immersion Cooling: The Other Path

Single-phase immersion dunks servers in a bath of dielectric fluid, which holds on the order of 1,000 times more heat per unit volume than air. Two-phase immersion uses a low-boiling-point fluid that vaporizes at chip temperature and condenses on a coil above. Both approaches theoretically support 200+ kW per tank. The NVL72, however, ships with direct-to-chip cold plates and is not rated for fluid immersion in any current NVIDIA reference design. If immersion is your facility strategy, you would need to use third-party Blackwell-based systems or wait for future generations. For Blackwell today, direct-to-chip is the supported path.

Platform Power and Cooling Comparison

Platform | Rack Power | Cooling Method | GPUs/Rack | NVLink Domain | Racks/MW
HGX H100 (air) | ~30 kW | Air | 8 | 8 GPUs | ~33
HGX B200 (air) | ~40 kW | Air | 8 | 8 GPUs | ~25
HGX H100/B200 (liquid) | ~60-70 kW | Direct-to-chip | 8 | 8 GPUs | ~15
GB200 NVL72 | ~120 kW | Direct-to-chip (required) | 72 | 72 GPUs | ~8
GB300 NVL72 | ~132 kW | Direct-to-chip (required) | 72 | 72 GPUs | ~7.5
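The Racks/MW column is simply 1,000 kW divided by rack power. A short sketch that reproduces it from the table's own inputs follows; the 65 kW figure for liquid HGX is an assumed midpoint of the 60-70 kW range, and GPUs per rack are taken from the table as given.

```python
import math

# (rack power in kW, GPUs per rack) as listed in the comparison table.
PLATFORMS = {
    "HGX H100 (air)": (30.0, 8),
    "HGX B200 (air)": (40.0, 8),
    "HGX H100/B200 (liquid)": (65.0, 8),   # assumed midpoint of 60-70 kW
    "GB200 NVL72": (120.0, 72),
    "GB300 NVL72": (132.0, 72),
}


def racks_per_mw(rack_kw):
    """Fractional rack count that 1 MW of IT power supports."""
    return 1000.0 / rack_kw


def gpus_per_mw(rack_kw, gpus_per_rack):
    """GPU count per MW using whole racks only."""
    return math.floor(racks_per_mw(rack_kw)) * gpus_per_rack


for name, (kw, gpus) in PLATFORMS.items():
    print(f"{name:<24} {racks_per_mw(kw):5.1f} racks/MW "
          f"{gpus_per_mw(kw, gpus):4d} GPUs/MW")
```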

Worked example

Sizing a 1 MW NVL72 pod

Suppose you have a 1 MW IT power allocation. You want to fit as many NVL72 racks as possible and understand your effective PUE.

  • Rack count: Each NVL72 draws 120 kW at sustained load. At 1 MW, that gives you 8 racks (8 x 120 kW = 960 kW, leaving 40 kW for networking and overhead). Those 8 racks contain 576 B200 GPUs across eight separate 72-GPU NVLink domains.
  • CDU power overhead: Each CDU draws approximately 5-10 kW of pump and control power. For 8 racks with 8 CDUs, add ~40-80 kW overhead.
  • Facility cooling power: With direct-to-chip removing ~95% of heat to the liquid loop, the remaining ~5% still goes to room air. Good containment plus dry coolers keep mechanical cooling overhead low. Achievable partial PUE (cooling-only) of 1.03-1.05 is realistic with direct-to-chip at this density versus 1.3-1.5 for air-cooled CRAC-heavy designs.
  • Effective compute PUE: If total facility draw including all overhead is 1.06 MW to serve 960 kW of IT load, PUE = 1.06 / 0.96 = 1.10. Compare that to an air-cooled H100 pod at 1 MW where cooling overhead alone might consume 250-300 kW, giving PUE of 1.25-1.30.
  • Coolant flow: 8 racks x 2.9 L/s = 23.2 L/s (approximately 370 GPM) of facility water you need provisioned to the CDUs.
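  • Floor footprint: 8 NVL72 racks versus 72 air-cooled HGX H100 racks for the same 576 GPUs (576 / 8, using the 8 GPUs per rack from the comparison table). On that basis the NVL72 approach uses roughly 9x less floor area for equivalent GPU count, and the gap widens further if you match on compute rather than GPU count.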

These are illustrative figures. Final design requires vendor CDU specs, actual facility water temperature, and a full power chain audit from the utility transformer to the rack PDU.
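The arithmetic in the worked example can be captured in a few lines so you can rerun it with your own budget. This is a sketch under the example's stated inputs (120 kW per rack, 5-10 kW per CDU, 2.9 L/s per rack); the function and key names are made up for illustration.

```python
import math


def size_pod(it_budget_kw=1000.0, rack_kw=120.0, gpus_per_rack=72,
             cdu_kw=(5.0, 10.0), flow_lps_per_rack=2.9):
    """Rack count, leftover power, CDU overhead and water flow for a pod."""
    racks = math.floor(it_budget_kw / rack_kw)
    it_load_kw = racks * rack_kw
    return {
        "racks": racks,
        "it_load_kw": it_load_kw,
        "spare_kw": it_budget_kw - it_load_kw,
        "gpus": racks * gpus_per_rack,
        # One CDU per rack; low and high estimates of pump/control power.
        "cdu_overhead_kw": (racks * cdu_kw[0], racks * cdu_kw[1]),
        "facility_flow_lps": racks * flow_lps_per_rack,
    }


def pue(total_facility_kw, it_kw):
    """Power Usage Effectiveness: total facility power over IT power."""
    return total_facility_kw / it_kw


pod = size_pod()
print(pod)
print(f"PUE at 1,060 kW total for {pod['it_load_kw']:.0f} kW IT: "
      f"{pue(1060.0, pod['it_load_kw']):.2f}")
```

With 5-10 kW per CDU, eight CDUs come to 40-80 kW, and 1,060 kW of total draw against 960 kW of IT load gives a PUE of about 1.10.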

PUE, Energy Efficiency, and What the Numbers Actually Mean

PUE (Power Usage Effectiveness) is total facility power divided by IT equipment power. A PUE of 1.0 is theoretical perfection. Legacy air-cooled data centers running Hopper-generation hardware often land at 1.4-1.6 once you account for chillers, CRAC fans, lighting, and UPS losses. Modern hyperscale designs with hot-aisle containment and economizer modes can achieve 1.1-1.2 on air-cooled workloads in favorable climates.

Liquid cooling changes the equation significantly. When you move 95% or more of the heat to a liquid loop that connects directly to dry coolers or indirect evaporative cooling towers, you dramatically reduce the mechanical refrigeration load. In mild climates, you can run entirely on free cooling for most of the year: no compressor cycles, just pump power and airflow through heat rejection coils. This is how modern liquid-cooled AI factories achieve PUE below 1.1. NVIDIA reports that the GB200 NVL72 platform delivers 25x the energy efficiency of equivalent H100 air-cooled infrastructure. The water efficiency claim, 300x better than traditional air-cooled with evaporative cooling towers, comes from eliminating chiller and cooling tower water consumption entirely in dry-cooler designs.

Figure: Air vs Liquid: Decision Tree (which cooling approach fits your situation)
  • Rack power above 40 kW (or an NVL72 rack-scale system)? If yes: do you need a 72-GPU NVLink domain (trillion-parameter models, MoE)?
  • Yes, 72-GPU domain needed: NVL72 + direct-to-chip (required).
  • No, 72-GPU domain not needed: liquid HGX, direct-to-chip optional.
  • Rack power not above 40 kW: are you staying below 30 kW per rack (H100, A100, or inference-light)? If yes: air is OK, plan for liquid later. If no: plan for liquid now.
  • Notes on the liquid branches: validate facility water before ordering; CDU + manifold retrofit required.
Start from rack power density. If you need a 72-GPU NVLink domain, NVL72 + direct-to-chip is the only path. Air works only below ~30 kW and only for Hopper or B200 HGX without rack-scale NVLink.
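The same decision tree can be written as a small function. This is one reading of the figure, with the 40 kW and 30 kW thresholds taken from it; the function name and return strings are illustrative.

```python
def cooling_recommendation(rack_kw, needs_72_gpu_nvlink=False):
    """Walk the air-vs-liquid decision tree for a planned rack."""
    # First branch: above 40 kW per rack, or an NVL72 rack-scale system.
    if needs_72_gpu_nvlink or rack_kw > 40:
        if needs_72_gpu_nvlink:
            return "NVL72 + direct-to-chip (required)"
        return "Liquid HGX (direct-to-chip)"
    # Second branch: at or below 40 kW, check the 30 kW comfort zone.
    if rack_kw < 30:
        return "Air OK; plan for liquid later"
    return "Plan for liquid now"


for kw, nvl in ((20, False), (35, False), (65, False), (120, True)):
    print(f"{kw:>3} kW, 72-GPU domain={nvl}: {cooling_recommendation(kw, nvl)}")
```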

Retrofit Risk: What Actually Breaks

Most enterprise data centers built before 2020 were designed for 5-15 kW per rack. Even newer facilities targeting AI were often spec’d for 30-40 kW with enhanced air cooling. Retrofitting one of these spaces for 120 kW liquid-cooled racks touches almost every system in the building.

The power chain is usually the first constraint. A full NVL72 rack needs a dedicated three-phase power feed, typically A and B circuits each rated for 80+ kW. Note that two 80 kW feeds give 160 kW of combined capacity, not full 1+1 redundancy: if either feed must carry the rack alone, it has to be rated for the full ~120 kW. Most existing rack PDUs are not sized for this; you will need new busway, new PDUs, and likely transformer upgrades. If your facility utility feed is already at capacity, you are looking at a utility upgrade process that can take 18-36 months in many jurisdictions.
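To put numbers on the feed sizing, the sketch below computes three-phase line current and checks whether a rack survives the loss of one feed. The 415 V line voltage and 0.95 power factor are assumptions for illustration; use the values from your own electrical design.

```python
import math


def three_phase_current_a(power_kw, line_voltage_v, power_factor=0.95):
    """Line current for a balanced three-phase load."""
    return power_kw * 1000.0 / (math.sqrt(3) * line_voltage_v * power_factor)


def survives_single_feed_loss(rack_kw, feed_kw):
    """True if one remaining feed can carry the whole rack."""
    return feed_kw >= rack_kw


rack_kw = 120.0
print(f"{rack_kw:.0f} kW at 415 V: "
      f"{three_phase_current_a(rack_kw, 415.0):.0f} A per phase")
for feed_kw in (80.0, 120.0):
    ok = survives_single_feed_loss(rack_kw, feed_kw)
    print(f"A+B feeds of {feed_kw:.0f} kW each -> rides through feed loss: {ok}")
```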

The water side is more nuanced but often more disruptive. You need facility water piping routed to each rack row, with supply and return headers sized for the total flow. Existing chilled water plants may not be compatible: if your facility runs a 7 degree Celsius supply (typical for precision air cooling), you need to check whether this is too cold for the CDU heat exchanger design point, as condensation risk on cold pipes in a warm room is real. Pipe material matters too: the CDU secondary loop typically runs an inhibited propylene glycol and deionized water mixture, while facility water may be treated differently. You need isolation at the CDU so these chemistries do not cross-contaminate.

Gotcha: The most common field failure I see when retrofitting liquid cooling is inadequate facility water pressure or flow at the rack level. Pipe runs from the mechanical room to a remote row can introduce so much friction loss that the CDU pump cannot achieve the rated flow, and thermal management degrades under sustained load. Model the hydraulic resistance of your pipe runs before you approve the design. A single NVL72 row 60 meters from the mechanical room with undersized headers will have a bad day under sustained GPU-compute load.
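A first-pass estimate of that friction loss does not need a full hydraulic model. The sketch below uses the Darcy-Weisbach equation with the Swamee-Jain approximation for the friction factor. It covers one straight 60 m pipe only (no fittings, valves, or return leg), and the water properties, pipe roughness, and four-rack row are assumed values for illustration.

```python
import math

WATER_DENSITY = 997.0       # kg/m^3
WATER_VISCOSITY = 0.00089   # Pa*s at about 25 C


def friction_factor(reynolds, rel_roughness):
    """Swamee-Jain explicit approximation for turbulent pipe flow."""
    return 0.25 / math.log10(rel_roughness / 3.7 + 5.74 / reynolds ** 0.9) ** 2


def pressure_drop_pa(flow_lps, diameter_m, length_m, roughness_m=4.5e-5):
    """Darcy-Weisbach friction loss for a straight pipe run, in pascals."""
    area = math.pi * diameter_m ** 2 / 4.0
    velocity = flow_lps / 1000.0 / area
    reynolds = WATER_DENSITY * velocity * diameter_m / WATER_VISCOSITY
    f = friction_factor(reynolds, roughness_m / diameter_m)
    return f * (length_m / diameter_m) * WATER_DENSITY * velocity ** 2 / 2.0


row_flow_lps = 4 * 2.9   # four NVL72 racks on one header (assumed row size)
for diameter_m in (0.05, 0.10):
    dp = pressure_drop_pa(row_flow_lps, diameter_m, 60.0)
    print(f"{diameter_m * 1000:.0f} mm header, 60 m: {dp / 1e5:.2f} bar")
```

Friction loss scales roughly with the inverse fifth power of diameter, so halving the header size costs tens of times more pressure. That is the mechanism behind a remote row that never reaches rated flow.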

What to Validate with Facilities Before Signing the PO

Before committing to NVL72 racks, get written answers from your facilities team on these specific items:

Validation Item | What to Confirm | Risk if Wrong
Available power per rack | 2 x 80 kW circuits (A+B), breaker sizing, busway capacity | Thermal trip, rack downtime
Facility water flow at rack | Min 3 L/s per NVL72 rack at rated pressure | CDU under-cooling, GPU throttle
Supply water temperature | Max 30 degrees Celsius at rack inlet (cooler is better) | Chiller assist required, PUE degrades
Pipe material and chemistry | Stainless or HDPE supply; no galvanized; inhibitor compatibility | Cold plate corrosion, CDU seal failure
Floor load rating | NVL72 rack is ~1,600 kg when filled | Structural failure, voided insurance
Leak detection and drainage | Floor drains in row, leak sensors at CDU and manifold | Water damage to adjacent racks
Overhead clearance | NVL72 rack height is approximately 2.4 m with CDU; check cable tray conflicts | Installation blocked, retrofit cost
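The numeric rows of this checklist are easy to turn into an automated pre-check against the answers facilities sends back. The sketch below is a simplified illustration: the key names are hypothetical, the thresholds are the ones in the table, and the non-numeric items (pipe chemistry, leak detection) still need a human review.

```python
# (comparison, limit) per item; thresholds taken from the validation table.
REQUIREMENTS = {
    "feed_kw_each": ("min", 80.0),
    "flow_lps_per_rack": ("min", 3.0),
    "supply_temp_c": ("max", 30.0),
    "floor_rating_kg": ("min", 1600.0),
    "clearance_m": ("min", 2.4),
}


def failed_checks(site):
    """Return a list of human-readable failures for a site survey dict."""
    failures = []
    for key, (kind, limit) in REQUIREMENTS.items():
        value = site.get(key)
        if value is None:
            failures.append(f"{key}: no answer from facilities")
        elif kind == "min" and value < limit:
            failures.append(f"{key}: {value} is below required {limit}")
        elif kind == "max" and value > limit:
            failures.append(f"{key}: {value} is above allowed {limit}")
    return failures


survey = {"feed_kw_each": 80.0, "flow_lps_per_rack": 2.1,
          "supply_temp_c": 32.0, "floor_rating_kg": 2000.0}
for problem in failed_checks(survey):
    print(problem)
```

A missing answer is reported as a failure on purpose: an unanswered item is a risk, not a pass.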

Air vs Liquid: Where Each Still Wins

Air cooling is not dead. For inference-optimized deployments using B200 HGX or H100 HGX with 8-GPU scale-up and scale-out over InfiniBand, air cooling at 30-40 kW per rack is perfectly viable. You avoid CDU capital cost, you avoid facility plumbing, and a facility already built for 30-40 kW racks supports the power density as it stands. If you are running smaller LLMs for inference at moderate concurrency and 8 GPUs per node is enough, the air-cooled HGX path makes operational sense. The HGX B200 in air-cooled form still delivers generational improvement over Hopper at a manageable facility cost.

Liquid becomes non-negotiable the moment you need the 72-GPU NVLink coherent domain, which is the defining architecture for trillion-parameter training and real-time inference at the scale DeepSeek R2, GPT-5 class, and frontier MoE models demand. It is also non-negotiable when your compute-per-floor-tile target is high enough that fitting 30+ racks of air-cooled hardware to match one row of NVL72 racks is simply not feasible in your building footprint.

Figure: Compute Density per MW: Air vs Liquid NVL72 (GPU count and NVLink domain scale at 1 MW facility power)
  • Air HGX H100 @ 1 MW: ~33 racks; 264 GPUs total (8 per rack); NVLink domain: 8 GPUs max; scale-out via IB required; PUE: ~1.25-1.40; facility water: none; chiller plant: required.
  • Liquid NVL72 @ 1 MW: 8 racks; 576 GPUs total (72 per rack); NVLink domain: 72 GPUs; unified memory 13.4 TB/rack; PUE: ~1.05-1.10; facility water: 23 L/s; chiller: optional in mild climate.
  • Callouts: 2.2x GPU count; 4x IB islands for model parallel; single coherent NVLink fabric per rack.
At 1 MW, liquid NVL72 delivers 2.2x more GPUs and a dramatically tighter NVLink domain. The 8-GPU per rack air constraint forces InfiniBand scale-out for any model that exceeds 8-GPU fit.

My Take: When Liquid Is Non-Negotiable and When to Hold Off

If you are running or planning to run trillion-parameter models in training or real-time inference, you are buying NVL72 racks, and those racks require liquid cooling. Full stop. The 72-GPU NVLink domain is not replicated in any air-cooled configuration, and the performance gap for MoE and dense transformer models at scale is not something you bridge with extra InfiniBand bandwidth. Model parallelism across InfiniBand adds latency and communication overhead that matters at trillion-parameter scale. Liquid is not about being modern. It is about the physics of what the 120 kW NVL72 produces thermally and what rack-scale NVLink GPU-to-GPU bandwidth requires in terms of physical co-location.

When NOT to go liquid: if your data center is a standard enterprise facility built for 10-20 kW racks, and your workload is inference on models up to 70B parameters using 8-GPU HGX B200 nodes, stay on air. The infrastructure investment to retrofit liquid cooling (new piping, CDUs, leak detection, a water treatment program) may cost more than the energy savings justify over a 3-year horizon. Air HGX B200 will serve you well for that workload profile. Plan your next facility expansion with liquid-ready infrastructure so you are not making this decision under deadline pressure next time.

What to validate first, before any purchase order for NVL72: get your facilities team to physically walk the power and water path from the utility entrance to the proposed rack locations. Not a design review. A physical walk. Then commission a hydraulic model of your water distribution network. In my experience, these two steps surface the large majority of retrofit surprises before they become construction change orders.

If you are deploying in a colocation facility, read the power and cooling addendum in the SLA before you sign. Some colocation operators now publish supplemental agreements for NVL72-class deployments that specify CDU responsibility, water quality requirements, and leak liability. Some charge a premium for liquid-cooled cages that can materially change your TCO versus owning facility space. Get those terms in writing early.

Part 11 covers what runs on top of this physical infrastructure: the host software stack, drivers, CUDA, and the Container Toolkit. If you are working through where NVL72 fits in the broader reference architecture, the NVIDIA AI Series pillar page maps the full 30-part sequence. For the VCF-specific deployment layer on top of this hardware, see the Private AI Series.

In practice: Build your facility water distribution with at least 25% headroom on flow capacity from day one. A CDU that is running at 98% capacity under steady-state load has no margin when a training job hits 100% utilization across all 72 GPUs simultaneously. I have seen GPU clusters throttle on the first full-fabric training run because the facilities sizing was done against average utilization data from the prior air-cooled cluster. Unlike bursty inference traffic, a Blackwell training job holds the rack at near-peak power for hours or days at a time. Design for peak, not average.
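As a quick sanity check on headroom, the sketch below applies the 25% rule to the 23.2 L/s from the worked example and shows CDU utilization at nominal rack load. The 150 kW CDU capacity is the low end of the range quoted earlier; treat every number as a placeholder for your vendor's figures.

```python
def provisioned_flow_lps(peak_flow_lps, headroom=0.25):
    """Flow capacity to build for, given peak demand and a headroom fraction."""
    return peak_flow_lps * (1.0 + headroom)


def cdu_utilization(load_kw, capacity_kw):
    """Fraction of CDU capacity consumed at a given heat load."""
    return load_kw / capacity_kw


print(f"provision {provisioned_flow_lps(23.2):.1f} L/s for a 23.2 L/s peak")
print(f"150 kW CDU at 120 kW load: {cdu_utilization(120.0, 150.0):.0%} utilized")
```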
Disclaimer: Cooling infrastructure design requires licensed mechanical engineering review specific to your facility. The numbers in this article are reference figures based on publicly available NVIDIA specifications and third-party analyses. Always validate against NVIDIA NVL72 site preparation documentation and engage a qualified facilities engineer before committing to any liquid cooling retrofit or new build. Facility water quality, structural loads, and local electrical code requirements vary significantly.

About the Author

Dr. Pranay Jha is a Cloud and AI Consultant with 18+ years of experience in hybrid cloud, virtualization, and enterprise infrastructure transformation. He specializes in VMware technologies, multi-cloud strategy, and Generative AI solutions. He holds a PhD in Computer Applications with research focused on Cloud and AI, has published multiple research papers, and has been a VMware vExpert since 2016 and a VMUG Community Leader.
