Dr. Pranay Jha


VCF 9 Hardware: The Compatibility Traps That Stall Bring-Up (VCF 9 Series, Part 29)

A field guide to the five hardware traps that stall VCF 9 bring-up: non-certified NVMe, a stale vSAN HCL, sub-25GbE NICs, SD/USB boot media, and under-spec hosts.

VCF 9 Series · Part 29 of 36

TL;DR · Key Takeaways

  • Most stalled VCF 9 bring-ups are not software bugs. They are hardware that passes the vSphere HCL but fails the stricter vSAN ESA HCL.
  • Five traps cause almost all of it: non-certified or RAID-hidden NVMe, a stale embedded vSAN HCL database, sub-25GbE or single-uplink NICs, SD/USB boot media with no persistent ESX-OSData, and under-spec RAM and cores.
  • Buy vSAN ESA ReadyNodes for production. The mock VIB and HCL JSON edits are lab-only and explicitly unsupported by Broadcom.
  • The November 2025 ReadyNode changes cut certified RAM and CPU floors, so re-check the Broadcom Compatibility Guide before you assume a host is too small.

Who this is for: architects and admins speccing or commissioning hosts for a VCF 9 fleet.
Prerequisites: VCF 9.0.1 or later installer, and a target BOM checked against the Broadcom Compatibility Guide before any hardware is ordered.

The VCF Installer validation goes green on CPU and memory, then dies on storage with “No vSAN ESA certified disks found”. The hosts boot ESX 9.0 without complaint. They run VMs fine. They still refuse to form a VCF fleet. If you have deployed VCF 9 more than once, you have seen some version of this, and it is almost never the software. It is the hardware choices made months earlier, on a spreadsheet, by someone who checked the vSphere HCL and assumed that was enough.

VCF 9 standardizes on vSAN ESA, and the ESA hardware bar is higher and less forgiving than the general vSphere bar. A host can be perfectly happy as a standalone ESX server and still be rejected the moment you ask it to join a vSAN ESA cluster. Below are the five traps that actually stall bring-up in the field, what causes each one, and how to clear it. The order roughly matches how often I see them on real engagements.

Disclaimer: the mock VIB and HCL database edits mentioned below are for nested labs only and are not supported by Broadcom. For any production fleet, use hardware listed on the vSAN ESA section of the Broadcom Compatibility Guide, validate the full BOM, and run the VCF Installer prechecks before you commit.

Trap 1: The NVMe that ESX loves and vSAN ESA rejects

Symptom: validation fails with “No vSAN ESA certified disks found”, or later with “No storage pool eligible disks found on ESXi host”.
Likely cause: the NVMe device or its controller is not on the vSAN ESA HCL, or the drives are presented behind a RAID controller instead of as native NVMe. vSAN ESA is an all-NVMe architecture with a single tier of certified TLC NVMe devices. A SATA or SAS SSD that was fine for OSA, a consumer NVMe drive, and an enterprise NVMe drive presented through a RAID controller rather than as a native NVMe device will all fail this check, even though ESX itself claims the disks happily.

Fix: confirm the devices present as native NVMe and that both the drive model and the platform are on the vSAN ESA HCL, not just the vSphere HCL. Drop any RAID controller into a true pass-through or HBA mode, or remove it from the storage path entirely. ESA wants direct device access.

# Confirm the devices are seen as native NVMe, not behind a RAID adapter
esxcli nvme device list
esxcli storage core adapter list

# List devices already claimed into the vSAN ESA storage pool
# (expect an empty list on a host that has not yet joined an ESA cluster)
esxcli vsan storagepool list

In a nested lab where you simply cannot get certified disks, William Lam’s vSAN ESA hardware mock VIB lets validation proceed, and as of VCF 9.0.1 the Installer exposes a checkbox to bypass the ESA disk check directly. Useful for learning. Do not carry it into production: an ESA cluster on uncertified NVMe is one firmware quirk away from data loss, and you will own that outcome. For the storage design reasoning behind ESA in VCF 9, see vSAN ESA vs OSA in VCF 9: storage design and when to choose which.

Trap 2: A stale HCL database fails hosts that are actually certified

Symptom: the hosts are genuinely on the ESA HCL, the disks are certified, and validation still reports the hosts are not on the vSAN HCL.
Likely cause: the VCF Installer ships an embedded vSAN HCL database (all.json) and refuses to trust it once it is more than 90 days old. If you are running from a VCF Offline Depot that did not include an updated HCL file, the Installer falls back to that embedded copy, which is almost always past 90 days by the time you deploy. The result is a false negative: good hardware, stale reference data.

Fix: give the Installer a current HCL database. The clean path is to let it pull the latest file through the VCF Online Depot. For air-gapped sites, download the current database and replace the embedded copy before you re-run validation.

# On the VCF Installer, back up then replace the embedded vSAN HCL database
cp /nfs/vmware/vcf/nfs-mount/vsan-hcl/all.json /nfs/vmware/vcf/nfs-mount/vsan-hcl/all.json.bak

# Drop the freshly downloaded all.json into the same path, then re-run validation.
# No services need restarting on the Installer.

This one bites teams who do everything right on the hardware side and then lose a day chasing a phantom compatibility problem. Make “refresh the HCL database” a standing line item in your air-gapped runbook, right next to depot sync. The planning checklist in VCF 9 planning and prerequisites is the right place to capture it.
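
The 90-day rule lends itself to a small runbook check. The following is a minimal sketch, not the Installer's own logic: it assumes the file's modification time is a fair proxy for the age of the database (the Installer may well read a timestamp inside the JSON instead), and it assumes GNU stat is available. The function names hcl_age_days and hcl_is_stale are hypothetical.

```shell
# Sketch: flag a vSAN HCL database file that is older than a threshold.
# Assumption: file mtime approximates the database age. Note that a plain
# cp resets mtime to the copy time, so treat this as a hint, not proof.

# Print the age of a file in whole days, based on its mtime (GNU stat).
hcl_age_days() {
  local file="$1" now mtime
  now=$(date +%s)
  mtime=$(stat -c %Y "$file") || return 2
  echo $(( (now - mtime) / 86400 ))
}

# Return 0 (stale) if the file is older than max_days (default 90),
# 1 if it is fresh enough, 2 if the file could not be read.
hcl_is_stale() {
  local file="$1" max_days="${2:-90}" age
  age=$(hcl_age_days "$file") || return 2
  [ "$age" -gt "$max_days" ]
}
```

On the Installer this could be called as hcl_is_stale /nfs/vmware/vcf/nfs-mount/vsan-hcl/all.json && echo "refresh the HCL database before validation". Because the mtime is only a proxy, a "fresh" result does not guarantee the Installer will accept the file.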

Trap 3: The NIC check, and why 1GbE and a single uplink lose

Symptom: validation flags the host networking as insufficient, or bring-up succeeds and then vSAN and vMotion traffic contend badly under load.
Likely cause: the host was specced with 1GbE or a single high-speed uplink. VCF 9 expects two 25GbE NICs per host for the management domain, and the Installer enforces a minimum NIC speed check (relaxed to a 10GbE bypass option in 9.0.1 for labs). ESA pushes far more east-west storage traffic than OSA did, and a single uplink gives you no redundancy for the converged vSAN, vMotion, and overlay traffic riding the same NICs.

Fix: spec two 25GbE ports per host as the floor, on a NIC model that is on the I/O compatibility list, and split them across two physical cards or ports for redundancy. Do not try to run a production ESA fleet on 10GbE just because the 9.0.1 bypass will let validation pass. The bypass exists for nested labs, not as permission to under-provision the fabric your storage now lives on.
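
The two-uplink, 25GbE floor can be expressed as a check against a BOM or a live host. This is a minimal sketch under stated assumptions: the link speeds are supplied in Mbps, one per physical uplink (for example, read by hand from the Speed column of esxcli network nic list), and the function name uplinks_ok and the MIN_MBPS and MIN_UPLINKS variables are hypothetical. It does not check that the ports sit on separate cards or that the NIC model is on the I/O compatibility list.

```shell
# Sketch: succeed only if enough uplinks meet the speed floor.
# Arguments: link speeds in Mbps, one per physical uplink.
# Defaults follow the floor in this article: two uplinks at 25000 Mbps.
uplinks_ok() {
  local min_mbps="${MIN_MBPS:-25000}" min_uplinks="${MIN_UPLINKS:-2}"
  local fast=0 speed
  for speed in "$@"; do
    # Count only uplinks at or above the floor.
    if [ "$speed" -ge "$min_mbps" ]; then
      fast=$((fast + 1))
    fi
  done
  [ "$fast" -ge "$min_uplinks" ]
}
```

For example, uplinks_ok 25000 25000 succeeds, while uplinks_ok 10000 10000 and uplinks_ok 25000 both fail: the first on speed, the second on redundancy.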

Trap 4: SD and USB boot media and the persistent OSData rule

Symptom: an alarm reading “Secondary persistent device not found”, or a bring-up that fails outright because there is no persistent ESX-OSData.
Likely cause: the hosts boot from SD cards or USB only, a layout that was common on older fleets. In ESX 9.0 the ESX-OSData volume is a single unified persistent partition that consolidates /scratch, logs, and VMware Tools. Using SD or USB for OSData is deprecated, and VCF bring-up expects that persistent region to exist on real local media.

Fix: give every host a dedicated persistent boot device of at least 128 GB. Broadcom’s guidance is a device rated for at least 128 TBW of endurance and 100 MB/s sequential write, ideally a RAID 1 mirror of two industrial M.2 or NVMe devices so a single boot device failure does not take the host down. Confirm the OSData partition actually landed on persistent storage after install rather than on a RAM disk.

# Confirm OSDATA is on a persistent device, not a RAM disk
esxcli storage filesystem list | grep -i osdata

# Boot ESX in UEFI mode. Legacy BIOS is deprecated in vSphere 9.0.

While you are in the firmware, confirm UEFI boot, the NX/XD bit, and Intel VT-x or AMD RVI are all enabled. Legacy BIOS support is being removed, and a host that boots BIOS today is a host you will be reprovisioning sooner than you want.
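
The boot device floors quoted above are easy to check against a vendor datasheet before ordering. A minimal sketch, assuming you supply the three datasheet figures yourself; the function name boot_device_ok is hypothetical, and the sketch does not cover the RAID 1 mirror or UEFI recommendations.

```shell
# Sketch: check a boot device against the floors quoted in this article.
# Arguments: capacity in GB, endurance in TBW, sequential write in MB/s.
# Prints one line per shortfall; returns non-zero if any floor is missed.
boot_device_ok() {
  local capacity_gb="$1" endurance_tbw="$2" seq_write_mbs="$3" rc=0
  if [ "$capacity_gb" -lt 128 ]; then
    echo "capacity ${capacity_gb} GB is below 128 GB"; rc=1
  fi
  if [ "$endurance_tbw" -lt 128 ]; then
    echo "endurance ${endurance_tbw} TBW is below 128 TBW"; rc=1
  fi
  if [ "$seq_write_mbs" -lt 100 ]; then
    echo "sequential write ${seq_write_mbs} MB/s is below 100 MB/s"; rc=1
  fi
  return "$rc"
}
```

A call such as boot_device_ok 32 300 500 reports only the capacity shortfall, which makes it clear which line of the BOM needs to change.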

Trap 5: Under-spec RAM and cores

Symptom: validation passes but the management domain is starved from day one, or the host does not meet the ReadyNode profile you thought you bought.
Likely cause: repurposing older hosts against the wrong number. ESX 9.0 itself only needs two cores and 8 GB to install, which lulls people into thinking modest hosts are fine. A real VCF 9 management domain on ESA is a different story: plan for a meaningful core count per host (a full stack including VCF Automation wants on the order of 12 cores and 24 threads as a practical floor) and ESA ReadyNode memory, which historically started high.

Fix: size to the current ReadyNode profile, not to the ESX install minimum. The good news from late 2025 is that Broadcom lowered the certified ReadyNode floors substantially, with reductions of up to 67 percent in RAM and up to 33 percent in CPU cores for nodes certified for vSAN storage clusters, and up to 50 percent in RAM for vSAN HCI cluster nodes. So re-check the current numbers before you reject a host as too small or over-buy to be safe. For how host sizing rolls up into domain and fleet design, see the VCF 9 reference architecture.
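
The gap between "ESX installs" and "sized for a management domain" can be made explicit when triaging older hosts. This is a minimal sketch that encodes only the two sets of numbers quoted in this article: the ESX install minimum (2 cores, 8 GB) and the practical core floor (12 cores, 24 threads). The function name classify_host and its labels are hypothetical, and it deliberately does not encode a ReadyNode RAM floor, because that figure has to come from the current Broadcom Compatibility Guide.

```shell
# Sketch: classify a host against the two numbers people confuse.
# Arguments: physical cores, threads, RAM in GB.
classify_host() {
  local cores="$1" threads="$2" ram_gb="$3"
  if [ "$cores" -lt 2 ] || [ "$ram_gb" -lt 8 ]; then
    # ESX 9.0 itself will not install.
    echo "below-esx-install-minimum"
  elif [ "$cores" -lt 12 ] || [ "$threads" -lt 24 ]; then
    # ESX installs, but the host is under the practical core floor.
    echo "installs-but-under-spec"
  else
    # Core floor met; RAM still needs checking against the ReadyNode profile.
    echo "meets-core-floor"
  fi
}
```

A host with 8 cores, 16 threads, and 64 GB lands in installs-but-under-spec, which is exactly the category that passes a casual check and then starves the management domain.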


The ESA host spec that passes validation
Spec to the ReadyNode profile, not the ESX install minimum.

  • Storage: certified native NVMe TLC, no RAID path
  • Network: 2 x 25GbE, split across cards for redundancy
  • Boot: 128 GB persistent device, RAID-1 mirror, UEFI
  • Compute: ReadyNode RAM + cores (re-check the 2025 lower floors)
  • HCL: platform + drive on the vSAN ESA HCL; refresh all.json

Five things to get right before the Installer ever runs its hardware checks.

Quick reference: symptom to fix

Figure: Where VCF 9 bring-up fails on hardware. Follow the failed check to the fix: each failed VCF Installer check maps to one hardware fix.

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| No vSAN ESA certified disks found | Non-certified NVMe, or disks behind a RAID controller | Use HCL-listed native NVMe; set the controller to pass-through |
| Host not on vSAN HCL (but it is) | Embedded HCL database older than 90 days | Refresh all.json via online depot or manual replace |
| NIC speed insufficient | 1GbE or single uplink specced | Two 25GbE ports per host, split for redundancy |
| Secondary persistent device not found | SD/USB-only boot, no persistent ESX-OSData | 128 GB persistent device, RAID 1 mirror, UEFI boot |
| Domain starved or below ReadyNode spec | Sized to ESX install minimum, not ReadyNode | Match current ReadyNode RAM and core profile |

What I’d Do

For any production VCF 9 fleet, buy vSAN ESA ReadyNodes and stop optimizing for the wrong cost. The temptation to build your own from a parts list to shave a few percent off the BOM is exactly how teams end up with the failures above, plus a support conversation that goes nowhere because the configuration was never certified. A ReadyNode is not a tax. It is the thing that makes the vSAN HCL check a formality instead of a multi-day investigation. Where I do spend design energy is the boot media and the network: a mirrored 128 GB persistent boot pair and two 25GbE uplinks per host are cheap insurance against the two traps that are most annoying to fix after the fact.

The lab workarounds have their place. Use the mock VIB and the HCL JSON edits to learn VCF 9 on whatever hardware you have. Just never let lab habits cross into a production design review. What hardware compatibility trap has cost you the most time on a VCF 9 deployment?

ReadyNode vs build-your-own
A ReadyNode is not a tax; it is what makes the HCL check a formality.

  • vSAN ESA ReadyNode: certified configuration; HCL check is a formality; fully supportable
  • Build-your-own from parts: shaves a few % off the BOM; multi-day HCL investigations; support conversations go nowhere

Spend design energy on boot media and networking, not on shaving the BOM with uncertified parts.


About the Author

Dr. Pranay Jha is a Cloud and AI Consultant with 18+ years of experience in hybrid cloud, virtualization, and enterprise infrastructure transformation. He specializes in VMware technologies, multi-cloud strategy, and Generative AI solutions. He holds a PhD in Computer Applications with research focused on Cloud and AI, has published multiple research papers, and has been a VMware vExpert since 2016 and a VMUG Community Leader.