TL;DR · Key Takeaways
- The management domain needs 4 hosts for production. The Installer may technically accept 3 with vSAN, but 4 is the design requirement so a host can enter maintenance without breaking quorum.
- You need separate VLANs for ESX management, VM management, vMotion, vSAN, host TEP, edge TEP, and uplinks.
- The hard MTU floor on the TEP path is 1600 bytes. 1700 is recommended, 9000 is optimal. The Installer validates this and will block on failure.
- Forward and reverse DNS plus NTP must resolve from the Installer appliance before you start.
- Size the host TEP IP pool for roughly 2 IPs per host on a 2-NIC VDS, because each active uplink gets a TEP.
Most failed VCF bring-ups are not failures of the platform. They are failures of preparation that the Installer catches at validation, hours into the job, when fixing them is most disruptive. Get the readiness checklist right and the deployment itself is almost dull. Here is what to nail down before you ever deploy the Installer appliance.
Hardware and host count
Every host has to be on the Broadcom Compatibility Guide. For vSAN ESA specifically, the hosts must match a vSAN ESA ReadyNode profile, not merely contain HCL-listed parts. The management domain is a 4-host cluster for production. The VCF Installer will technically accept a 3-host vSAN minimum (and 2 hosts with external FC, VMFS, or NFS), and lab tricks go lower, but 4 is the design requirement and the reason is operational: with 4 hosts a single host can enter maintenance mode without breaking vSAN quorum. Treat 3-host management as a lab-only shortcut. The full topology and appliance sizing sit in the reference architecture deep-dive.
Network: the VLANs you actually need
VCF 9 wants distinct VLANs, trunked and tagged to every host uplink, for ESX management, VM management, vMotion, vSAN, NSX host overlay (host TEP), NSX edge overlay (edge TEP), and NSX uplinks. Note that management is two VLANs, not one: a VLAN for ESX host management and a separate VLAN for VM management, where the VCF appliances (Operations, Automation, and the rest) run. The VDS must be version 8.0 or later, and NSX is prepared directly on the VDS with no legacy N-VDS.
Size the host TEP IP pool with the per-uplink behaviour in mind. A TEP is assigned per active uplink, so a 2-NIC VDS means 2 TEPs per host. Size that subnet for roughly double the host count, not 1:1, or you will run hosts out of TEP addresses mid-preparation.
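The sizing rule above is simple arithmetic, so it is worth checking before you commit to a subnet. The sketch below is illustrative only: the function names are hypothetical helpers, not VCF tooling, and it assumes the pool spans the whole subnet minus the network, broadcast, and one gateway address.

```shell
# Hypothetical planning helpers; names and assumptions are illustrative, not VCF tooling.

# TEP addresses needed = hosts x active uplinks per host.
tep_ips_needed() {
  echo $(( $1 * $2 ))
}

# Usable addresses in an IPv4 subnet of the given prefix length,
# assuming network, broadcast, and one gateway address are reserved.
subnet_usable_ips() {
  echo $(( (1 << (32 - $1)) - 3 ))
}

# Succeeds (exit 0) when the subnet can hold every host TEP.
# $1 = hosts, $2 = active uplinks per host, $3 = prefix length
tep_pool_fits() {
  [ "$(tep_ips_needed "$1" "$2")" -le "$(subnet_usable_ips "$3")" ]
}

# Example: tep_pool_fits 4 2 28 && echo "a /28 covers 4 hosts with 2 uplinks"
```

Under these assumptions a /28 holds a 4-host, 2-uplink management domain but not an 8-host cluster, which is the kind of ceiling worth knowing before the domain grows.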
| Network (VLAN) | Purpose | Planning note |
|---|---|---|
| ESX management | ESX host management | Static VMkernel IPs; forward and reverse DNS |
| VM management | VCF appliances (Operations, Automation, vCenter, NSX) | Separate from ESX mgmt; reserve VIPs |
| vMotion | Live migration traffic | Dedicated VMkernel; jumbo helps |
| vSAN | vSAN data traffic | Dedicated VMkernel; jumbo recommended |
| Host TEP (overlay) | NSX host overlay (Geneve) | ~2 IPs per host (per active uplink); MTU 1600+ |
| Edge TEP (overlay) | NSX edge overlay | Routable to host TEP; MTU 1600+ |
| NSX uplinks | North-south to the physical fabric | BGP or static peering to the ToR |
MTU: the number that blocks the most deployments
The overlay path has a hard MTU floor of 1600 bytes end to end. Broadcom recommends 1700 to absorb Geneve header expansion and leave headroom for future growth, and 9000 (jumbo) for optimal throughput where the underlay supports it. The catch is that the VCF Installer enforces a 1600 MTU validation on the TEP path and blocks deployment if it fails. It pings TEP to TEP, and any single router or switch in the path that clamps below 1600 fails the check.
| Target | MTU (bytes) | Notes |
|---|---|---|
| Hard floor (TEP path) | 1600 | Installer validates TEP to TEP and blocks below this |
| Recommended | 1700 | Absorbs Geneve header growth, future-proofs |
| Optimal (jumbo) | 9000 | Best throughput where the underlay supports it end to end |
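```shell
# Validate the 1600 floor between TEP VMkernels.
# -d sets the don't-fragment bit; 1572 bytes of payload + 28 bytes of IPv4 and ICMP headers = 1600.
# TEP VMkernels normally live in the NSX overlay netstack, hence ++netstack=vxlan.
vmkping ++netstack=vxlan -I vmk10 -d -s 1572 <remote-host-tep-ip>
# Validate jumbo (9000) end to end if the underlay supports it (8972 + 28 = 9000)
vmkping ++netstack=vxlan -I vmk10 -d -s 8972 <remote-host-tep-ip>
```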
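The payload sizes in those commands come from subtracting the 20-byte IPv4 header and the 8-byte ICMP header from the MTU you want to prove. A small sketch, with a hypothetical helper name, makes the arithmetic reusable for the 1700 recommendation as well:

```shell
# Hypothetical helper: ICMP payload size that fills a given MTU exactly
# when pinging with the don't-fragment bit set.
# payload = MTU - 20 (IPv4 header) - 8 (ICMP header)
icmp_payload_for_mtu() {
  echo $(( $1 - 20 - 8 ))
}

# Example: icmp_payload_for_mtu 1700   # size to pass to vmkping -s for the recommended MTU
```

For the recommended 1700 MTU this gives a payload of 1672 bytes.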
DNS, NTP, and credentials
Every component, including the VCF Installer appliance itself, needs both forward (A) and reverse (PTR) DNS records resolvable before deployment. NTP time sync is a hard prerequisite, because certificate and vSAN operations depend on it. For the UI-driven wizard, the ESX hosts need a common root password. If your hosts have different passwords, you must use a JSON specification file instead of the wizard. Plan your IP and subnet allocations per VLAN up front so you are not inventing addresses during input.
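A dry run of the DNS requirement can be scripted. The sketch below is one possible approach, not an Installer feature: it assumes `dig` is available on a machine that uses the same DNS servers the Installer will use, the function names are hypothetical, and the comparison is case-sensitive.

```shell
# Strip the trailing dot that dig prints on PTR answers.
strip_dot() {
  printf '%s\n' "${1%.}"
}

# Check that forward (A) and reverse (PTR) records agree for one host.
# $1 = FQDN, $2 = expected IPv4 address. Returns 1 on any mismatch.
check_dns_pair() {
  fwd=$(dig +short A "$1" | head -n 1)
  rev=$(strip_dot "$(dig +short -x "$2" | head -n 1)")
  if [ "$fwd" = "$2" ] && [ "$rev" = "$1" ]; then
    echo "OK   $1 <-> $2"
  else
    echo "FAIL $1 (A=$fwd) / $2 (PTR=$rev)"
    return 1
  fi
}

# Example (placeholder names and addresses):
# check_dns_pair esx01.example.com 10.0.10.11
```

Run it once per host and appliance in the plan; a single FAIL line is cheaper to fix now than during Installer validation.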
Storage prerequisites, not just host count
Beyond the four-host floor, the datastore has to be genuinely shared: accessible from and writable by every host in the cluster, with enough free space for the full deployment per the planning workbook. If multiple datastores are present, VCF picks the principal by a fixed priority: vSAN first, then NFS v3, VMFS, NFS 4.1, iSCSI, and vVols last. From 9.0.1, the datastore with the most free space wins instead. vVols is deprecated in version 9, so do not design a new domain around it. For vSAN ESA the hosts must match an ESA ReadyNode profile, and an OSA cluster expects deduplication and compression either both on or both off. Deciding the storage type and datastore layout up front matters because the storage choice is welded on at host commission time, as covered in the vSAN ESA versus OSA storage design breakdown.
The depot, certificates, and passwords
Three readiness items get forgotten because they are not network or hardware. First, the Installer ships without binaries, so decide in advance whether you are pulling from the online Broadcom depot or building a private offline depot, and start that download early because it is the longest single wait. Second, regenerate the ESX self-signed certificates against each host FQDN before you begin, and delete stale disk partitions so vSAN can claim the devices cleanly. Third, the UI wizard needs a common ESX root password across the hosts. If they differ, you are on the JSON specification path, which is also the better route for repeatable multi-site builds. None of these block you technically, but each one stalls a bring-up at an annoying moment if you discover it live.
Plan the IP space once
Build the IP and subnet plan as a single table before you open the wizard, because you will enter values for every VLAN and every appliance, and inventing addresses mid-input is how typos and overlaps creep in. Allocate static ranges for ESX management, VM management, vMotion, and vSAN, size the host TEP subnet for roughly two addresses per host, and reserve VIPs for the NSX Manager cluster and the fleet appliances. Statically assigned VMkernel IPs are a hard requirement, not a preference, so a plan that assumes DHCP anywhere on the VMkernel will fail validation. Spend an hour on the address plan now and you save the back-and-forth that otherwise stretches a deployment across two evenings.
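Overlaps are the easiest address-plan mistake to catch mechanically. The following is a minimal sketch for IPv4 only; the function names, network labels, and example subnets are all hypothetical, and it assumes a shell with 64-bit arithmetic such as bash.

```shell
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
  old_ifs=$IFS; IFS=.
  set -- $1            # split the address on dots into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Print "first last" (as integers) for a CIDR such as 10.0.20.0/24.
cidr_bounds() {
  base=$(ip_to_int "${1%/*}")
  size=$(( 1 << (32 - ${1#*/}) ))
  first=$(( base / size * size ))   # align down to the network address
  echo "$first $(( first + size - 1 ))"
}

# Succeeds when two CIDRs share at least one address.
cidrs_overlap() {
  set -- $(cidr_bounds "$1") $(cidr_bounds "$2")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

# Report every overlapping pair in a list of name=cidr entries.
# Returns non-zero if any overlap is found.
find_overlaps() {
  found=0
  while [ "$#" -gt 1 ]; do
    a=$1; shift
    for b in "$@"; do
      if cidrs_overlap "${a#*=}" "${b#*=}"; then
        echo "OVERLAP ${a%%=*} ${b%%=*}"
        found=1
      fi
    done
  done
  [ "$found" -eq 0 ]
}

# Example (placeholder subnets):
# find_overlaps esx-mgmt=10.0.10.0/24 vm-mgmt=10.0.11.0/24 host-tep=10.0.12.0/26
```

Feeding it the whole plan before you open the wizard turns an overlap from a mid-input surprise into a one-line report.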
My take
The single most common day-zero blocker is the 1600 MTU TEP validation. Teams set jumbo frames on the VDS and host uplinks but forget that the physical switch fabric and any intermediate L3 hop must carry at least 1600 end to end. The Installer pings across it and hard-fails if a single hop silently drops frames above its own MTU. Validate underlay MTU with vmkping between hosts on different racks and leaf pairs, not just same-rack neighbours, because the failure is almost always at the L3 boundary. And do not reach for the documented MTU validation skip flag to push past it. You are not fixing the path, you are deferring the failure into production NSX as intermittent packet loss, which is far harder to diagnose than a red check in the wizard. The network mistakes that follow from skipping this are catalogued in Part 5.
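Testing across racks is easier to make a habit of when it is one loop rather than a series of hand-typed pings. This is a rough sketch to run from an ESX host shell, under several assumptions: vmk10 is the host TEP VMkernel, the TEP interfaces sit in the NSX overlay netstack (hence `++netstack=vxlan`), the peer addresses are placeholders, and `PING_CMD` is a hypothetical override used only for dry runs.

```shell
# Command used for probing; override PING_CMD to dry-run away from an ESX host.
PING_CMD=${PING_CMD:-vmkping}

# Probe a list of remote TEP addresses at a given MTU with the DF bit set.
# $1 = VMkernel interface, $2 = MTU to prove, remaining args = remote TEP IPs
probe_tep_peers() {
  vmk=$1; mtu=$2; shift 2
  payload=$(( mtu - 28 ))   # subtract IPv4 (20) and ICMP (8) headers
  failed=0
  for peer in "$@"; do
    if $PING_CMD ++netstack=vxlan -I "$vmk" -d -s "$payload" -c 3 "$peer" >/dev/null 2>&1; then
      echo "PASS $peer"
    else
      echo "FAIL $peer"
      failed=1
    fi
  done
  return "$failed"
}

# Example: include at least one peer behind every leaf pair, not only rack neighbours.
# probe_tep_peers vmk10 1600 <tep-ip-rack-a> <tep-ip-rack-b> <tep-ip-rack-c>
```

A FAIL that appears only for peers in another rack points at the L3 boundary rather than the host or VDS configuration.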
What’s Next
Build the readiness checklist as a real document with an owner per line, then dry-run DNS and MTU before the Installer ever boots. With prerequisites green, you are ready for the management domain bring-up. Which prerequisite does your environment most often get wrong, reverse DNS records or end-to-end MTU?
References
- Broadcom TechDocs: MTU Guidance for NSX Transport Nodes
- VCF Blog: Planning a Successful VCF 9.0 Deployment
- Broadcom TechDocs: Management Domain Deployment Model



