TL;DR · Key Takeaways
- In VCF 9.x, VCF Operations is the lifecycle control point. The upgrade starts there, not at SDDC Manager. Get that order wrong and you stall halfway through.
- 9.1 retires the standalone 9.0 Fleet Management Appliance and the Identity Broker, replacing them with a unified VCF Management Services cluster that needs its own fresh CIDR block (a /28 minimum, /27 if you have room) and DNS records.
- A dedicated License Appliance is now a hard requirement, with its own IP and DNS record. License assignment failures after upgrade are among the most common 9.1 stumbles (see KB 424533).
- The strict sequence is: VCF Operations, depot, binaries, Management Services, dependency components (Avi/SRM), SDDC Manager, plan domain upgrade, prechecks, NSX, vCenter, ESX.
- Patching (async bundles) and version upgrades use the same engine but do not carry the same risk profile. Treat patches as routine hygiene, not as optional work.
Here is the mistake I see teams make on the very first 9.1 attempt: they open SDDC Manager, look for the upgrade bundle, and find nothing useful. That is not a bug. In the 9.x model the lifecycle brain moved to VCF Operations, and 9.1 pushes that further by deleting the standalone Fleet Management Appliance entirely. If your mental model still has SDDC Manager driving the upgrade, you will fight the platform the whole way. This runbook walks the full-stack sequence end to end, calls out the three places people actually get stuck, and separates real version upgrades from routine patching.
Why the order is the whole game
VCF upgrades are not a set of independent component updates you can run in any order. Lifecycle, operations, SDDC Manager, NSX, vCenter and ESX have hard dependencies on each other, and 9.1 adds a new one: the unified VCF Management Services cluster has to own inventory before anything else moves. The official guidance is blunt about it. Broadcom states that upgrading to 9.1 “requires a strict component upgrade sequence,” and an incorrect sequence “results in errors.” That is not boilerplate. The most common self-inflicted failure I see is starting the SDDC Manager or domain upgrade before VCF Operations and Management Services are healthy, then spending the rest of the window unwinding a half-migrated fleet.
The sequence below is the one that holds up in practice. The diagram shows the dependency flow so you can brief a change board on it in one glance.
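If you script any part of the change, it can help to encode the order once and refuse anything that skips ahead. The following is a minimal sketch of that idea, not a Broadcom tool: the step labels are my own shorthand for the sequence in this runbook, and `next_step` is a hypothetical helper.

```python
# Upgrade order as described in this runbook. The labels are illustrative
# shorthand, not identifiers from any VCF API.
UPGRADE_SEQUENCE = [
    "vcf_operations",
    "depot",
    "binaries",
    "management_services",
    "dependency_components",  # Avi / SRM, only if deployed
    "sddc_manager",
    "plan_domain_upgrade",
    "prechecks",
    "nsx",
    "vcenter",
    "esx",
]


def next_step(completed):
    """Return the next step to run, or None when the sequence is finished.

    Raises ValueError if `completed` is not a prefix of the sequence,
    meaning a step was skipped or run out of order.
    """
    done = list(completed)
    if done != UPGRADE_SEQUENCE[: len(done)]:
        raise ValueError(f"out-of-order or skipped step in: {done}")
    if len(done) == len(UPGRADE_SEQUENCE):
        return None
    return UPGRADE_SEQUENCE[len(done)]
```

Calling `next_step(["vcf_operations", "depot", "binaries"])` returns `"management_services"`, while `next_step(["sddc_manager"])` raises, which is the behavior you want from a guard in front of a change ticket.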
Step 0: The prep that actually saves the weekend
Three new-in-9.1 items cause most of the avoidable pain, so handle them before the change window, not during it.
- Carve a fresh CIDR block for VCF Management Services. You cannot reuse your 9.0.x IPs: while the new cluster comes up, the old 9.0.x components are still online orchestrating the upgrade, so reused IPs collide mid-flight. The wizard wants an IP pool in CIDR format, not a range, so you need at least a /28, which gives 14 usable addresses, of which 12 must be free. Allocate a /27 if you can, for future scale-out.
- Stand up the License Appliance. 9.1 makes a dedicated license server a hard requirement, with its own IP and DNS record. Get it routed and resolvable before you start.
- Prepare DNS up front. Create forward and reverse records for the new service FQDNs (service-runtime, fleet-lifecycle, the license server, and so on) and validate resolution. The Management Services precheck fails on DNS or reachability errors, and that is not a fun thing to debug at 2 AM.
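The DNS item is easy to verify ahead of the window. Here is a minimal sketch using only the Python standard library; the FQDNs and IPs you feed it are your own, `check_dns` is a hypothetical helper name, and it assumes IPv4 records. Note that `socket.gethostbyname` follows the local resolver configuration (including any hosts file), so run it from a machine that resolves the same way the appliances will.

```python
import socket


def check_dns(records, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Return a list of problems for a {fqdn: expected_ip} mapping.

    `forward` and `reverse` default to the system resolver and can be
    swapped for stubs when testing.
    """
    problems = []
    for fqdn, expected_ip in records.items():
        try:
            resolved = forward(fqdn)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            problems.append(f"{fqdn}: forward lookup failed ({exc})")
            continue
        if resolved != expected_ip:
            problems.append(f"{fqdn}: resolves to {resolved}, expected {expected_ip}")
            continue
        try:
            ptr_name = reverse(expected_ip)[0]
        except OSError as exc:  # socket.herror is a subclass of OSError
            problems.append(f"{expected_ip}: reverse lookup failed ({exc})")
            continue
        # Compare case-insensitively and ignore a trailing root dot.
        if ptr_name.rstrip(".").lower() != fqdn.rstrip(".").lower():
            problems.append(f"{expected_ip}: PTR is {ptr_name}, expected {fqdn}")
    return problems
```

An empty list means every name resolves forward to the planned IP and back to the same name. Anything else is a line item to fix before the Management Services precheck finds it for you.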
Then the usual hygiene: confirm a recent SDDC Manager backup on external SFTP, confirm backups for managed components, download every binary ahead of time, and stage a free temporary IP on the management subnet for the vCenter upgrade. If you ran the older 5.2.x to 9.x jump, much of this rhymes with the pitfalls in migrating older VCF to 9 and the things that bite you.
Step 1: Upgrade VCF Operations first
VCF Operations is the control point, so it goes first. The upgrade is a PAK file uploaded through the VCF Operations administrator interface, not a bundle from SDDC Manager.
# VCF Operations admin UI, primary node
Admin interface -> Software Update -> Install a Software Update
1. Upload the VCF Operations 9.1 upgrade PAK
2. Accept EULA and release info
3. Start install (the admin UI restarts; you will be logged out)
4. Log back in, watch Software Update until all nodes show ONLINE
# Fleet management migration prompt appears mid-upgrade:
- provide the root password of the OLD 9.0 fleet mgmt appliance
- the cloud proxy upgrades with the cluster
- the old fleet management appliance is decommissioned + powered off
Do not move on while the cluster flickers between offline and going-online. Wait for every node and the cluster itself to report online. This is the step that quietly replaces the 9.0 Fleet Management Appliance and the Identity Broker with the unified Management Services model, so let it finish cleanly.
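If you watch the cluster from a script rather than the UI, the rule to encode is "online and staying online", not "online once". The sketch below shows one way to do that. It deliberately does not call any VCF Operations API: `get_states` is a callable you supply that returns a `{node_name: state}` dictionary, and the poll counts and interval are arbitrary defaults, not vendor guidance.

```python
import time


def wait_until_stable(get_states, required_stable_polls=3, interval_s=60,
                      max_polls=120, sleep=time.sleep):
    """Poll until every node reports ONLINE for several consecutive polls.

    `get_states` is a caller-supplied callable returning {node_name: state}.
    Returns True once the cluster has been stable for
    `required_stable_polls` polls in a row, False if `max_polls` runs out.
    """
    stable = 0
    for _ in range(max_polls):
        states = get_states()
        if states and all(state == "ONLINE" for state in states.values()):
            stable += 1
            if stable >= required_stable_polls:
                return True
        else:
            stable = 0  # any flicker resets the counter
        sleep(interval_s)
    return False
```

Requiring several consecutive clean polls is the point: a single ONLINE reading in the middle of a restart sequence is exactly the false signal that tempts people to move on early.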
Step 2: Depot, binaries, then Management Services
With Operations on 9.1, point it at the depot and pull everything you need before deploying the new service layer.
# Online depot
Build -> Depot Settings -> Configure (online depot)
enter the activation code from the business services console
verify the connection shows ACTIVE # if not, fix this before anything else
# Pull binaries
Build -> Lifecycle -> (select VCF instance) -> Binary Management
select the 9.1 bundles -> Download # do this BEFORE the window
# Deploy the new control plane
Build -> Lifecycle -> "Deploy Management Services" banner
CIDR block, e.g. 10.0.0.32/27 (gateway + VLAN)
map FQDNs (service-runtime, fleet-lifecycle, ...) to IPs in that block
run Management Services Precheck -> must be clean
Deploy and wait for SUCCESSFUL
SDDC Manager will refuse to proceed until Management Services is up and owns the inventory. That is by design. Once the task shows successful, lifecycle control has transferred to the new cluster and you can move to dependency components. For the bigger picture on how this fleet-level control plane is meant to operate day to day, see the VCF 9 fleet lifecycle management reference architecture.
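Before typing the CIDR block and FQDN mappings into the wizard, it is worth sanity-checking the plan on paper. The sketch below uses Python's `ipaddress` module to confirm that the block is a valid network, that enough addresses are free, and that every mapped IP is a usable, unclaimed address inside it. The 12-address threshold comes from the prep notes above. The `in_use` list (gateway and anything else already on the subnet) and the way free addresses are counted are my assumptions, not the wizard's documented logic.

```python
import ipaddress


def validate_ip_plan(cidr, fqdn_to_ip, in_use=(), min_free=12):
    """Return a list of problems with a Management Services IP plan."""
    problems = []
    # ip_network is strict by default: 10.0.0.33/27 raises ValueError
    # because host bits are set.
    network = ipaddress.ip_network(cidr)
    usable = set(network.hosts())  # excludes network and broadcast addresses
    owners = {ipaddress.ip_address(ip): "an existing device" for ip in in_use}
    free = usable - set(owners)
    if len(free) < min_free:
        problems.append(f"{network}: {len(free)} free addresses, need at least {min_free}")
    for fqdn, ip_text in fqdn_to_ip.items():
        ip = ipaddress.ip_address(ip_text)
        if ip not in usable:
            problems.append(f"{fqdn}: {ip} is not a usable address in {network}")
        elif ip in owners:
            problems.append(f"{fqdn}: {ip} is already taken by {owners[ip]}")
        else:
            owners[ip] = fqdn
    return problems
```

For example, `validate_ip_plan("10.0.0.32/28", {}, in_use=["10.0.0.33", "10.0.0.34", "10.0.0.35"])` reports a problem, because a /28 with three addresses already consumed leaves only 11 free. That is the arithmetic behind preferring a /27.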
Step 3: Dependency components before SDDC Manager
If Avi Load Balancer is deployed, upgrade it before SDDC Manager. Confirm the controller version against the 9.1 compatibility list, run Avi prechecks, verify controller cluster, service engine and virtual service health, then back up per the supported Avi procedure. Load balancer health problems are exactly the kind of thing that turns a clean platform upgrade into an incident bridge. If SRM or vSphere Replication is present, validate the supported path and confirm pairings, protection groups, recovery plans and replications are healthy first. These components sit slightly outside the normal SDDC Manager path, which is precisely why people forget them.
Step 4: SDDC Manager, domain plan, prechecks
Only now do you touch SDDC Manager, and you do it from VCF Operations.
# SDDC Manager
Build -> Lifecycle -> (VCF instance) -> SDDC Manager Updates
download the 9.1 update -> Update Now
# Plan the management domain upgrade
Build -> Lifecycle -> (management domain) -> Upgrades -> Plan Domain Upgrade
pick the VCF target version (+ custom component targets if needed)
choose the upgrade scenario for vCenter and NSX Manager
review -> submit
# Prechecks - non-negotiable
Run prechecks BEFORE NSX/vCenter/ESX. Validate:
cluster + host health, vSAN health, NSX manager/edge health,
vCenter services, certificate validity, DNS/NTP, backups, capacity
Do not proceed on red prechecks, and read the warnings too. Some “warnings” are future outages wearing polite clothing. At plan time you also choose optimized versus sequential; in production I lean sequential unless the window and environment genuinely justify parallel change.
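If precheck results end up in a script or a change record, the policy above can be written down as a simple gate. This is a sketch under assumptions: the `"GREEN"`, `"WARNING"` and `"RED"` status labels and the check names are placeholders of mine, not the strings the product emits, and `precheck_gate` is a hypothetical helper.

```python
KNOWN_STATUSES = {"GREEN", "WARNING", "RED"}


def precheck_gate(results, acknowledged=()):
    """Decide whether to proceed from (check_name, status) pairs.

    Returns (proceed, blockers, unreviewed_warnings). Red or unrecognized
    statuses always block. Warnings block until a human has explicitly
    acknowledged them by name.
    """
    blockers = [name for name, status in results
                if status == "RED" or status not in KNOWN_STATUSES]
    unreviewed = [name for name, status in results
                  if status == "WARNING" and name not in acknowledged]
    proceed = not blockers and not unreviewed
    return proceed, blockers, unreviewed
```

The design choice worth copying is that warnings are not silently passed: someone has to name each one as reviewed before the gate opens.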
Step 5: NSX, then vCenter, then ESX
Drive each from Build > Lifecycle > Upgrades. NSX first: validate management plane, control plane, transport nodes, edges and routing, click Upgrade Now, then repeat the same validation once the upgrade completes. vCenter next, and this one needs the temporary network details you staged earlier because the upgrade uses a temporary network configuration during the switchover; take the snapshot, choose the backup option, configure the temp IP/subnet/gateway, then schedule. ESX last: import the 9.1 image into Image Management, assign it to the target clusters, and roll out. In production prefer a batched cluster or host selection over upgrading everything at once, and watch DRS, maintenance-mode transitions, workload evacuation and any vSAN resync as hosts cycle.
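For the batched ESX rollout, it helps to write the batches down before the window so nobody improvises host selection at 2 AM. The following is a planning sketch only, with hypothetical cluster and host names. It does not talk to vCenter or the Lifecycle UI, and the right batch size depends on your spare capacity and vSAN policy.

```python
def plan_batches(hosts_by_cluster, batch_size=1):
    """Split hosts into ordered batches, never mixing clusters in a batch.

    `hosts_by_cluster` maps a cluster name to its list of hosts.
    Returns a list of (cluster, [hosts]) tuples in rollout order.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batches = []
    for cluster, hosts in hosts_by_cluster.items():
        for start in range(0, len(hosts), batch_size):
            batches.append((cluster, hosts[start:start + batch_size]))
    return batches
```

With `{"mgmt-cl01": ["esx01", "esx02", "esx03"]}` and `batch_size=2`, this yields one batch of two hosts followed by one batch of one, which gives you a natural pause point to check evacuation and resync between batches.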
After everything completes, validate from more than one pane. Check VCF Operations, SDDC Manager, the lifecycle version view, NSX Manager, vCenter and the host level. Only then remove temporary snapshots and take fresh backups of the upgraded components. Certificate, identity and backup problems love to surface right after an upgrade, so if something looks off there, the fixes in VCF 9 certificate, identity and backup failures and how to fix them are the right next stop.
Patching is not the same as upgrading
Two things confuse people here. First, the major-version move itself can have a gate: getting from some 9.0.x builds onto the 9.1 line may require applying a specific patch through the Admin Portal first, outside the normal Lifecycle UI. Check the release notes for your exact source build before you assume the standard flow applies. Second, async patches (security and bug-fix bundles released between full versions) run through the same Operations-driven lifecycle engine, but they are lower risk and should be routine. The teams that get burned are the ones treating every patch like a major upgrade and therefore deferring all of them, then carrying a stack of known CVEs for months.
My take
Patch on a predictable cadence, keep full version upgrades to planned windows, and never let “we will do it with the next big upgrade” become your patch strategy.
Three failures that bite, and what to do
- License assignment failure after upgrade. This is common enough that Broadcom has a dedicated KB (424533); it usually traces back to the License Appliance not being deployed, resolvable, or reachable, so verify that appliance first.
- Binaries not appearing in VCF Operations. This almost always means the depot connection is not actually active or the entitlement is wrong, so re-check Depot Settings before blaming the bundle.
- The “Import VCF Operations in Fleet Lifecycle” step failing. It can fail on a certificate SAN issue with VCF Operations for Networks, which is a certificate problem to fix rather than an upgrade to retry.
The pattern across all three: the failure shows up at upgrade time but the root cause is prep work (DNS, depot, certificates) that was skipped earlier.
What I’d Do
Treat 9.1 as an architecture change, not a click-update. Spend a full prep cycle on the CIDR block, the License Appliance and DNS, rehearse the VCF Operations to Management Services hand-off in a non-production domain, and only then schedule the production window. Run the sequence top to bottom, refuse to advance on red prechecks, and validate from multiple panes before you call it done. Do that and the upgrade is uneventful, which is exactly what you want from infrastructure. What is the one component in your stack you always forget to check on the interoperability matrix? That is usually the one that bites.
References
- Broadcom TechDocs: Upgrade Sequence to 9.1
- Broadcom KB 440630: Upgrade Sequence and Related Issues for VCF and vSphere Foundation 9.1
- Broadcom KB 424533: License assignment failure on VCF 9.1



