TL;DR · Key Takeaways
- In VCF 9 you do not run the NSX Manager OVA install. VCF deploys the three-node cluster for you when you create the first VI workload domain.
- The workflow stands up 3 NSX Managers in the management domain, configures a cluster VIP, and adds an anti-affinity rule so no two Managers share a host.
- Later workload domains can share the existing NSX cluster or get a new one. That choice is an architecture decision, not a checkbox.
- Deployment success is decided by prerequisites: forward and reverse DNS for every Manager and the VIP, NTP in sync, reserved IPs, and at least three commissioned hosts. Most failed deployments trace back to a missing PTR record.
- After it builds, verify with get cluster status: every group type STABLE across all three nodes.
The first time an NSX-T veteran deploys NSX in VCF 9, they go looking for the OVA and the install wizard and cannot find them. That is the point. In VCF 9 you do not install NSX Manager by hand. VCF builds the cluster for you as part of creating the first VI workload domain, which means the skill that used to matter, clicking through the appliance deployment, is gone, and the skill that always actually mattered, getting the prerequisites exactly right, is now the whole game. This part walks the bring-up the way it really happens: what VCF does, what you owe it first, and how to confirm the cluster is healthy before you build anything on top.
What VCF deploys for you
When you create your first VI workload domain, the VCF workflow does the NSX deployment as a series of automated steps. It places three NSX Manager appliances in the management domain, wires them into a cluster, assigns the cluster VIP you specified, and creates a vSphere anti-affinity rule so the three Managers never land on the same host. You provide inputs; VCF does the build and the post-config. Your job shifts from operator to designer: get the names, addresses, and sizing right, and let the automation run.
Prerequisites: the boring list that decides everything
I have watched more NSX deployments fail on DNS than on anything technical about NSX. The automation is unforgiving about names and time, because the cluster nodes have to find and trust each other. Walk this table before you start the workflow, not after it fails halfway through.
| Prerequisite | Requirement | How to verify |
|---|---|---|
| Forward DNS (A) | A record for each NSX Manager and the cluster VIP. | nslookup the FQDN returns the right IP. |
| Reverse DNS (PTR) | PTR record for every Manager IP and the VIP. | nslookup the IP returns the right FQDN. |
| NTP | All components synced to a reliable source. | Same time on SDDC Manager, vCenter, and hosts. |
| IP addresses | Three Manager IPs plus the VIP, on the mgmt network. | Free, reserved, and not in any DHCP scope. |
| Hosts | At least three commissioned hosts with storage. | Visible and ready in SDDC Manager inventory. |
| VIP FQDN | Resolvable before you start; entered as the Appliance Cluster FQDN. | Forward and reverse both resolve. |
DNS is worth a hard check because the failure mode is ugly: the workflow gets a long way in, then stalls or rolls back, and the error rarely says “PTR record missing” in plain language. Two commands settle it.
# Forward: name should resolve to the VIP address
nslookup nsx-vip.mgmt.lab.local
# Reverse: the VIP address should resolve back to the same name
nslookup 10.0.0.30
# Do the same for each of the three Manager FQDNs and IPs.
# Both directions must match. A missing PTR is the #1 cause of a stalled build.
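If you would rather not run eight lookups by hand, the same forward and reverse comparison can be scripted. What follows is a minimal sketch, not a VCF tool. The VIP name and address come from the commands above, while the three Manager names and addresses are hypothetical placeholders. It uses the system resolver through Python's socket module, so it also honours local hosts files; run it from a machine that uses the same DNS servers as SDDC Manager.

```python
import socket
from typing import Callable, Dict, List

# VIP entry is from the example above; the three Manager entries are
# hypothetical placeholders and must be replaced with your own records.
RECORDS: Dict[str, str] = {
    "nsx-vip.mgmt.lab.local": "10.0.0.30",
    "nsx-mgr-01.mgmt.lab.local": "10.0.0.31",
    "nsx-mgr-02.mgmt.lab.local": "10.0.0.32",
    "nsx-mgr-03.mgmt.lab.local": "10.0.0.33",
}


def check_dns(
    records: Dict[str, str],
    forward: Callable[[str], str] = socket.gethostbyname,
    reverse: Callable[[str], str] = lambda ip: socket.gethostbyaddr(ip)[0],
) -> List[str]:
    """Return a list of problems; an empty list means A and PTR records agree."""
    problems: List[str] = []
    for fqdn, expected_ip in records.items():
        # Forward: the name must resolve to the planned address.
        try:
            actual_ip = forward(fqdn)
            if actual_ip != expected_ip:
                problems.append(f"{fqdn}: A record is {actual_ip}, expected {expected_ip}")
        except OSError as exc:
            problems.append(f"{fqdn}: forward lookup failed ({exc})")
        # Reverse: the planned address must resolve back to the same name.
        try:
            actual_name = reverse(expected_ip)
            if actual_name.rstrip(".").lower() != fqdn.rstrip(".").lower():
                problems.append(f"{expected_ip}: PTR is {actual_name}, expected {fqdn}")
        except OSError as exc:
            problems.append(f"{expected_ip}: reverse lookup failed ({exc})")
    return problems
```

Calling check_dns(RECORDS) and printing the result gives one line per mismatch. The forward and reverse parameters exist only so the logic can be exercised without a live DNS server.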
Shared or dedicated NSX per workload domain
The first VI workload domain always deploys a fresh NSX Manager cluster. The real decision comes with the second and every domain after: point it at the existing NSX cluster, or stand up a new one. This is a genuine architecture fork, not a default to click past.
| Dimension | Shared NSX cluster | Dedicated per domain |
|---|---|---|
| Footprint | Lower: one 3-node cluster total. | Higher: three Managers per domain. |
| Blast radius | Wider: one cluster issue touches all domains. | Contained: a domain’s NSX problem stays local. |
| Isolation | Soft: use Projects/VPCs for tenancy. | Hard: separate control planes entirely. |
| Lifecycle | Simpler: one cluster to upgrade. | More work: each cluster on its own. |
| Use it when | Most estates; efficiency matters. | Strong separation, compliance, or different SLAs. |
My default is shared. The footprint saving is real, the lifecycle is simpler, and NSX 9 gives you Projects and VPCs (Part 22) to handle tenant isolation inside one cluster, which is exactly what they are for. I go dedicated only when there is a hard reason: a regulated domain that must not share a control plane, wildly different upgrade cadences, or a separation requirement written into a contract. Do not split control planes out of vague unease; split them for a named requirement.
The VIP, the three nodes, and anti-affinity
The cluster VIP is the single name you and your automation talk to. One of the three Managers owns the VIP at a time; if it fails, the VIP moves to a surviving node. That is why the VIP FQDN has to resolve before deployment and why it is the address you put in monitoring and runbooks, never an individual node IP. The anti-affinity rule VCF creates keeps the three appliances on separate hosts, so a single host failure can never take more than one Manager with it. Check that rule survived; I have seen DRS settings and host maintenance flatten three Managers onto two hosts and quietly erode the HA you designed.
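The "did the rule survive" check reduces to one question: does any host currently run more than one Manager? Here is a small sketch of that test, assuming you have already pulled a VM-to-host mapping from vCenter by whatever means you normally use. The VM and host names below are hypothetical, and the function does not talk to vCenter itself.

```python
from collections import defaultdict
from typing import Dict, List


def hosts_with_multiple_managers(placement: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each host that runs more than one NSX Manager VM to those VM names.

    `placement` is a hypothetical {vm_name: host_name} snapshot taken from
    the vCenter inventory. An empty result means the Managers are spread out.
    """
    by_host: Dict[str, List[str]] = defaultdict(list)
    for vm_name, host_name in placement.items():
        by_host[host_name].append(vm_name)
    return {host: sorted(vms) for host, vms in by_host.items() if len(vms) > 1}


# Hypothetical snapshot after a maintenance window: two Managers on one host.
snapshot = {
    "nsx-mgr-01": "esx-01.mgmt.lab.local",
    "nsx-mgr-02": "esx-02.mgmt.lab.local",
    "nsx-mgr-03": "esx-02.mgmt.lab.local",
}
print(hosts_with_multiple_managers(snapshot))
```

A non-empty result is the cue to look at the rule itself and at DRS, not just to move a VM by hand.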
Verify the bring-up before you build on it
VCF will report the domain as created, but I never hand a cluster to the next team on the strength of a green workflow alone. Two minutes of CLI confirms the control plane is genuinely healthy. SSH to any Manager node as admin and check it the same way you would during an incident (see Part 2).
# On any NSX Manager node
get cluster status # every group type should read STABLE
get cluster config # confirm 3 nodes, correct IPs and roles
# Group types to see STABLE across all three nodes:
# MANAGER POLICY CONTROLLER DATASTORE HTTPS CLUSTER_BOOT_MANAGER
# Then confirm the VIP answers and resolves:
ping nsx-vip.mgmt.lab.local
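For monitoring or a handover checklist, the same judgment can be applied to the cluster status the NSX API returns, rather than reading CLI output by eye. The sketch below only evaluates an already-parsed JSON document. The field names (detailed_cluster_status, groups, group_type, group_status, members, member_status) are my assumption about the shape of the GET /api/v1/cluster/status response, so check them against the API reference for your NSX version before relying on this.

```python
from typing import Any, Dict, List


def unhealthy_groups(status: Dict[str, Any], expected_members: int = 3) -> List[str]:
    """Return findings for any group that is not STABLE with all members UP.

    `status` is a parsed cluster status document. The key names used here
    are assumptions about the API response and should be verified.
    """
    findings: List[str] = []
    groups = status.get("detailed_cluster_status", {}).get("groups", [])
    if not groups:
        findings.append("no group data in response")
    for group in groups:
        name = group.get("group_type", "UNKNOWN")
        if group.get("group_status") != "STABLE":
            findings.append(f"{name}: group status is {group.get('group_status')}")
        up = [m for m in group.get("members", []) if m.get("member_status") == "UP"]
        if len(up) != expected_members:
            findings.append(f"{name}: {len(up)} of {expected_members} members UP")
    return findings
```

An empty list is the scripted equivalent of "every group type STABLE across all three nodes". Anything else names the group to look at first.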
Then close out the operational basics: confirm the anti-affinity rule is present and enabled, confirm the cluster certificate and the VIP certificate are what you expect, and configure NSX backup to your SFTP target straight away. A control plane with no backup is a single bad change away from a very long day, and backup is covered properly in Part 19.
Why deployments stall, and the fix
When a VCF NSX deployment fails, it rarely fails for an interesting reason. The same handful of misses account for almost every stalled or rolled-back workflow I get called into, and all of them trace back to something that was true before the build ever started. The fix is almost never inside NSX; it is in DNS, time, addressing, or capacity. This is the short list I run down the moment a deployment task goes red.
| Symptom | Likely cause | Fix |
|---|---|---|
| Workflow stalls partway, then rolls back | Missing reverse (PTR) record for a Manager or the VIP | Add the PTR, confirm forward and reverse match, retry. |
| Cluster forms but nodes will not trust each other | NTP skew between appliances | Point all components at the same NTP source, resync. |
| Deployment cannot place an appliance | Fewer than three healthy hosts, or no free capacity | Commission a third host or free resources, then retry. |
| VIP unreachable after a green workflow | VIP FQDN or IP wrong, or not on the mgmt subnet | Correct the record, confirm the VIP is on the right network. |
Notice the pattern: not one of these is an NSX bug. They are environment facts the automation depends on, which is exactly why the prerequisites table earlier is the most important part of this whole post. If a deployment does fail, resist the urge to start over from scratch. Read the task error in SDDC Manager, fix the one underlying fact, and retry the failed task. The workflow is built to resume, and a from-scratch redo usually just wastes an hour reaching the same blocker.
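The addressing part of that list is easy to check mechanically before the workflow ever starts. Here is a minimal sketch using Python's ipaddress module. The 10.0.0.0/24 management subnet, the Manager addresses, and the DHCP range in the usage are hypothetical; only the VIP address comes from the earlier example. It confirms that each address sits in the management subnet, is unique, and is outside any DHCP scope you give it. It cannot tell you whether an address is already in use on the wire.

```python
import ipaddress
from typing import Dict, Iterable, List, Tuple


def check_address_plan(
    mgmt_cidr: str,
    addresses: Dict[str, str],
    dhcp_ranges: Iterable[Tuple[str, str]] = (),
) -> List[str]:
    """Return a list of problems with the planned Manager and VIP addresses."""
    network = ipaddress.ip_network(mgmt_cidr)
    scopes = [(ipaddress.ip_address(a), ipaddress.ip_address(b)) for a, b in dhcp_ranges]
    problems: List[str] = []
    seen = {}
    for name, raw in addresses.items():
        ip = ipaddress.ip_address(raw)
        if ip not in network:
            problems.append(f"{name}: {ip} is outside {network}")
        if ip in seen:
            problems.append(f"{name}: {ip} duplicates {seen[ip]}")
        else:
            seen[ip] = name
        for first, last in scopes:
            if first <= ip <= last:
                problems.append(f"{name}: {ip} falls inside DHCP scope {first}-{last}")
    return problems


# Hypothetical plan: VIP from the earlier example, Manager IPs invented.
plan = {
    "nsx-vip": "10.0.0.30",
    "nsx-mgr-01": "10.0.0.31",
    "nsx-mgr-02": "10.0.0.32",
    "nsx-mgr-03": "10.0.0.33",
}
print(check_address_plan("10.0.0.0/24", plan, dhcp_ranges=[("10.0.0.100", "10.0.0.200")]))
```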
What I’d Do
Spend your effort before the workflow, not during it. Build the DNS records forward and reverse, prove them from SDDC Manager, lock NTP, reserve the IPs and the VIP, and confirm three healthy hosts. Decide shared versus dedicated on a named requirement and default to shared. Then let VCF do the build, and verify with get cluster status before anyone celebrates. A clean NSX deployment is almost entirely a clean prerequisites list; the automation rarely fails for reasons of its own. Next up is Part 6: host transport node prep with VDS and EDP, where the design from Part 4 finally lands on real hosts. How solid is your reverse DNS right now, honestly?
The three-node cluster is a production requirement, not a recommendation
NSX Manager runs as a three-node cluster sitting behind a virtual IP, and that topology exists for management-plane resilience. A single node is perfectly fine in a lab and a genuine liability in production, because losing it takes your management plane and your Policy API with it. The three nodes share a distributed datastore, so the placement decision matters: spread them across hosts and failure domains with anti-affinity so that a single host or rack failure can never take two managers at once. A three-node cluster crammed onto two hosts is a three-node cluster pretending to be resilient.
The bring-up sequence rewards patience. Deploy the nodes, form the cluster, and confirm that get cluster status reports STABLE on all three before you build a single segment or rule on top. Configuring networking against a cluster that is still forming is how you end up with half-realized objects that are maddening to clean up. In VCF this whole dance is orchestrated for you by SDDC Manager, which is one more reason to let the platform own the lifecycle rather than hand-driving it. Either way, the rule is the same: a stable, healthy cluster first, configuration second, never the two interleaved.
Watch the VIP and the certificates
The cluster presents a single virtual IP for management, and that VIP together with the manager certificates are the two things teams forget about until they break. The VIP is how clients and automation reach the cluster regardless of which node is currently active, so it has to be reachable and correctly mapped, and a certificate that does not match the VIP name produces exactly the kind of intermittent trust error that swallows an afternoon of debugging. In VCF, SDDC Manager owns the certificate lifecycle for NSX, which means you rotate and renew through the platform rather than hand-editing certificates in NSX Manager. Reaching past the platform to swap a certificate directly is one of the most common ways to create drift that quietly disables the platform's ability to manage NSX later, and it always surfaces at the worst possible moment, usually mid-upgrade.
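Checking that the certificate covers the VIP name is a read-only test, so it does not cut across the platform's ownership of the lifecycle. The sketch below assumes you already have the certificate as the dictionary that Python's ssl.SSLSocket.getpeercert() returns on a verified connection; fetching it is left out. The matching is deliberately simple: exact DNS SAN match or a single leading wildcard label. It is not a full hostname-verification implementation.

```python
from typing import Any, Dict


def cert_covers_name(cert: Dict[str, Any], fqdn: str) -> bool:
    """Return True if a DNS subjectAltName in `cert` matches `fqdn`.

    `cert` follows the dict layout of ssl.SSLSocket.getpeercert(), where
    "subjectAltName" is a sequence of (type, value) pairs.
    """
    wanted = fqdn.rstrip(".").lower()
    for kind, value in cert.get("subjectAltName", ()):
        if kind != "DNS":
            continue  # ignore IP Address and other SAN types in this sketch
        san = value.rstrip(".").lower()
        if san == wanted:
            return True
        # Simplified wildcard rule: "*.example.com" covers exactly one extra label.
        if san.startswith("*.") and "." in wanted and wanted.split(".", 1)[1] == san[2:]:
            return True
    return False
```

If this returns False for the VIP FQDN while the node names match, the fix belongs in the platform's certificate workflow, not in a manual swap on the appliance.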
The practical habit is to treat the management cluster as a managed appliance, not a server you log into and tinker with. Let the platform handle the cluster lifecycle, the VIP and the certificates, and reserve your direct NSX Manager access for the networking and security intent that genuinely belongs there. When the cluster does misbehave, start at get cluster status and at certificate and VIP health before you assume a deeper fault, because the boring causes account for the overwhelming majority of management-plane incidents. A calm, patient bring-up and a disciplined hands-off posture afterward are worth more than any clever recovery procedure you might need if you skip them.