Dr. Pranay Jha

How to Migrate Workloads into VCF 9 with HCX and vMotion (VCF 9 Series, Part 16)

A field runbook for moving production VMs onto a VCF 9 workload domain with VCF Operations HCX 9.0: when to use vMotion, Bulk Migration or RAV, how to plan migration waves, and the gotchas that derail cutover.

VCF 9 Series · Part 16 of 36

TL;DR · Key Takeaways

  • In VCF 9, HCX is now VCF Operations HCX 9.0: a single unified manager appliance, licensed through your VCF key, with no separate HCX Enterprise SKU.
  • Four migration types carry the load: cold, HCX vMotion (one VM, zero downtime), Bulk Migration (parallel, reboot at switchover), and Replication Assisted vMotion or RAV (parallel and zero downtime, needs 150 Mbps per VM).
  • Plan in VCF Operations for Networks, group VMs into migration waves, then sync to HCX so it builds the Mobility Groups for you.
  • WAN Optimization, HCX Disaster Recovery and NSX V2T migration are removed in 9.0. Do not design a migration around them.
  • Validate the destination workload domain, network extension and licensing state before you move a single production VM.
Who this is for: architects and admins moving VMs from legacy vSphere or an older VCF instance onto a VCF 9 VI workload domain.

Prerequisites: a deployed VCF 9 management domain and target VI workload domain, VCF Operations running, routed connectivity between source and destination, and a valid VCF 9 license.

The bring-up is finished, the workload domain is green, and now someone has to move 800 production VMs onto it without starting a maintenance-window war. This is where most VCF 9 projects actually slow down. Standing up the platform is the easy part. Getting workloads across, with their IP addresses, their dependencies and their uptime commitments intact, is the part that gets escalated. This post is the runbook I use to do it cleanly.

[Figure: Migrating Workloads into VCF 9: Pick the Path. Match the tool to the distance and the downtime budget. Decision: same SSO / adjacent vCenters, low latency?
  • YES: vMotion / xVC-vMotion (live, near-zero downtime). Cross-vCenter vMotion (ELM or not); shared or replicated storage helps; best for same-site consolidation.
  • NO / cross-site: VMware HCX (stretch L2, bulk or scheduled). Bulk migration (reboot, scheduled); RAV (live, large batches); vMotion + Cold for edge cases; Network Extension for cutover.
My take: use vMotion for same-site consolidation, HCX when you cross a site boundary or need L2 stretch and scheduled bulk waves. Do not run HCX where a plain cross-vCenter vMotion would do.]
Choosing between cross-vCenter vMotion and VMware HCX when migrating workloads into VCF 9.

Pick the right mover: HCX, native vMotion, or Advanced Cross vCenter vMotion

Not every migration needs HCX. If your source and destination vCenters are both 8.x or 9.x, the VM networks are reachable at Layer 2 on both sides, and you only have a handful of VMs to move, Advanced Cross vCenter vMotion (the native cross-vCenter workflow built into vSphere) does the job with zero appliances to deploy. It even works across different SSO domains now. For a quick lift of a dozen management VMs onto the new domain, that is the lowest-friction option.

HCX earns its keep the moment any of three things is true: you are crossing into a different network topology and need to keep the existing IP addresses, you are moving hundreds or thousands of VMs and need scheduling and parallelism, or the source is old enough (vSphere 7, or an early VCF instance) that you want a replication-based path rather than a live host-to-host vMotion.

My take

After enough of these: once you are past roughly two dozen VMs, or you have a re-IP problem, stop hand-rolling vMotions and deploy HCX. The Service Mesh pays for itself in the first wave.
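
The rule of thumb above can be written down as a small decision helper. This is a minimal sketch, not an HCX or vSphere API: `MigrationScope`, `choose_mover` and the threshold of 24 (my reading of "roughly two dozen") are all hypothetical names and values introduced for illustration.

```python
from dataclasses import dataclass

# Assumption: "roughly two dozen VMs" from the text, expressed as a hard number.
HCX_VM_THRESHOLD = 24


@dataclass
class MigrationScope:
    vm_count: int
    needs_ip_preservation: bool  # crossing network topology but keeping IPs
    needs_scheduling: bool       # waves, parallelism, maintenance windows
    source_is_legacy: bool       # e.g. vSphere 7 or an early VCF instance


def choose_mover(scope: MigrationScope) -> str:
    """Return the mover suggested by the article's three HCX triggers."""
    if scope.needs_ip_preservation or scope.needs_scheduling or scope.source_is_legacy:
        return "HCX"
    if scope.vm_count > HCX_VM_THRESHOLD:
        return "HCX"
    return "Advanced Cross vCenter vMotion"


if __name__ == "__main__":
    dozen_mgmt_vms = MigrationScope(12, False, False, False)
    print(choose_mover(dozen_mgmt_vms))
```

Treat the output as a starting point for the design conversation, not a verdict; the threshold is a judgment call and should move with your team's tolerance for hand-run vMotions.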

What changed in VCF Operations HCX 9.0 (read this before you deploy)

If your last HCX deployment was on 4.x, the 9.0 model is different enough that old habits will trip you up. The biggest change: there is no separate HCX Cloud and HCX Connector appliance anymore. HCX 9.0 ships a single unified manager appliance deployed at both sites, and its role (source or destination) is decided automatically by the direction of the site pairing. Fewer appliances, fewer firewall rules, fewer version-mismatch surprises.

Licensing folds into VCF. HCX 9.0 is licensed exclusively with your VCF 9 license key, assigned from VCF Operations, and it activates automatically once it is connected to a licensed vCenter. You get a 90-day evaluation window before a license is required. The part people miss: if the VCF license lapses, HCX enters a grace period and then drops to read-only mode. A read-only HCX in the middle of a multi-week migration wave is a bad afternoon, so confirm the license state before you start, not after.
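
One way to make "confirm the license state before you start" concrete is a date check in the wave plan. The sketch below assumes you have read the license expiry date out of VCF Operations by hand; `license_covers_wave` and the 30-day margin are hypothetical, and the length of the HCX grace period is not stated here, so the margin is deliberately conservative rather than derived from it.

```python
from datetime import date, timedelta


def license_covers_wave(license_expiry: date, wave_end: date, margin_days: int = 30) -> bool:
    """True if the license outlives the planned wave end by at least margin_days.

    margin_days is an arbitrary safety buffer (assumption), meant to absorb
    slipped switchover windows and the post-cutover soak period.
    """
    return license_expiry >= wave_end + timedelta(days=margin_days)


if __name__ == "__main__":
    expiry = date(2026, 6, 30)     # example value, read manually from VCF Operations
    last_cutover = date(2026, 5, 1)
    print(license_covers_wave(expiry, last_cutover))
```

If the check fails, fix the license first; shortening the wave to fit an expiring license is the wrong trade.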

Consideration: Four features are gone in HCX 9.0: WAN Optimization (and its appliance), NSX V2T migration, HCX Disaster Recovery, and the HCX plugin inside the vCenter client. If your runbook still references the WAN Opt appliance to squeeze a thin link, or HCX DR as a fallback, rewrite it. For DR, use VMware Live Recovery instead. For replication over constrained links, your bandwidth floor (especially for RAV) now matters more than it used to.

Deploy, pair and build the Service Mesh

Disclaimer: This is a production-change procedure. Validate the target BOM and interoperability, confirm the destination workload domain is healthy, back up source vCenter and any critical VMs, run the HCX precheck, and test the flow with a throwaway VM before you queue real workloads.
  1. Deploy the HCX 9.0 unified manager appliance at the source site and again at the VCF 9 destination, then register each against its local vCenter and assign the VCF license from VCF Operations.
  2. Create the site pairing from the source manager to the destination. The pairing direction sets which appliance acts as the destination, so pair source to destination, not the reverse.
  3. Build a Compute Profile at each site (which clusters, datastores and management networks HCX may use) and a Network Profile for the uplink, management, vMotion and replication networks.
  4. Create the Service Mesh between the paired sites. This deploys the Interconnect (IX) and Network Extension (NE) appliances that carry migration and L2 traffic.
  5. Extend the source VM networks you need to preserve. Network Extension keeps the existing IP addresses and gateways so VMs do not have to be re-addressed at cutover.
  6. Run a health check on the Service Mesh tunnels before queueing anything.

HCX firewall ports (source to destination)

Port      | Purpose
TCP 443   | HCX Unified Manager UI and API
UDP 4500  | IPsec tunnel for the IX and NE appliances
TCP 8123  | Mobility Agent and Bulk Migration data path
TCP 9443  | Appliance management portal (no VCF SSO here)

# Validate Service Mesh tunnels from the HCX Unified Manager shell
ccli
list                # list deployed IX / NE appliances
go 0                # select the first appliance
hc                  # run the built-in health check
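
Before the Service Mesh exists, a plain TCP reachability probe against the ports in the table can catch firewall gaps early. This is a minimal sketch using only the Python standard library; `tcp_port_open`, `check_hcx_ports` and the example hostname are hypothetical. It covers the three TCP ports only: UDP 4500 is connectionless, so a simple connect test says nothing useful about it, and the tunnel health check above remains the real test.

```python
import socket

# TCP ports from the firewall table above; UDP 4500 is intentionally excluded.
HCX_TCP_PORTS = {
    443: "HCX Unified Manager UI and API",
    8123: "Mobility Agent and Bulk Migration data path",
    9443: "Appliance management portal",
}


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_hcx_ports(host: str, ports=HCX_TCP_PORTS, timeout: float = 3.0) -> dict:
    """Map each port number to True (reachable) or False (blocked or closed)."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}


if __name__ == "__main__":
    # Hypothetical destination manager FQDN; replace with your own.
    for port, ok in check_hcx_ports("hcx-dest.example.internal").items():
        print(f"TCP {port:<5} {'open' if ok else 'BLOCKED'}  {HCX_TCP_PORTS[port]}")
```

Run it from a host on the source management network, since that is the path the firewall rules have to permit.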

Sizing note: RAV needs 150 Mbps or higher throughput per VM in flight. Size your uplink and concurrency against that floor.
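
The sizing arithmetic is simple enough to sketch. The 150 Mbps floor comes from the note above; `max_rav_concurrency` and the `usable_fraction` headroom factor are hypothetical, added on the assumption that you will not hand the entire uplink to migration traffic.

```python
RAV_MBPS_PER_VM = 150  # per-VM throughput floor quoted in the sizing note


def max_rav_concurrency(uplink_mbps: float, usable_fraction: float = 0.8) -> int:
    """Number of RAV migrations the uplink can carry at the per-VM floor.

    usable_fraction is an assumed share of the link reserved for migration;
    the rest is left for production and management traffic.
    """
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    return int((uplink_mbps * usable_fraction) // RAV_MBPS_PER_VM)


if __name__ == "__main__":
    for link in (100, 1000, 10000):
        print(f"{link} Mbps uplink -> {max_rav_concurrency(link)} concurrent RAV VMs")
```

A result of zero is the signal to plan Bulk Migration for that site instead of forcing RAV onto the link.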

The destination workload domain has to be ready to receive these VMs. If you have not built it yet, that is its own procedure: see deploying a VI workload domain in VCF 9 with VCF Operations before you pair sites.


[Figure: HCX 9.0 Service Mesh. One unified manager per site; IX carries migrations, NE extends L2 and keeps IPs. Source site: HCX unified manager, vCenter + source VMs. Destination (VCF 9): HCX unified manager, VCF 9 workload domain. Between the sites: IPsec (UDP 4500); IX: migrations (8123); NE: L2 extension, keeps IPs.]
HCX 9.0 uses one unified manager per site; the Service Mesh deploys the IX and NE appliances that carry the move.

Choose the migration type per workload

HCX gives you several migration types and the right one depends on the VM, not on a blanket policy. Cold migration is for powered-off VMs. HCX vMotion moves a single live VM with zero downtime but no parallelism, which makes it the tool for that one finicky database node, not for 300 web servers. Bulk Migration uses vSphere Replication under the hood to move many VMs in parallel on a schedule, with a brief reboot at the switchover. RAV combines the two: parallel and scheduled like Bulk, but with a vMotion-style switchover so there is no guest reboot.

Type            | Downtime                    | Parallel             | Best for
Cold            | Already off                 | Yes                  | Powered-off and templates
HCX vMotion     | Zero                        | No (one at a time)   | Single sensitive VMs
Bulk Migration  | Brief reboot at switchover  | Yes (high)           | Large fleets, reboot acceptable
RAV             | Zero (vMotion switchover)   | Yes                  | Large fleets, no reboot, 150 Mbps+ per VM

My default for a bulk workload-domain move: RAV for anything with an uptime SLA and enough bandwidth, Bulk Migration for the long tail where a short reboot window is acceptable, and plain HCX vMotion reserved for the two or three VMs nobody wants to reboot and nobody wants to schedule. Do not RAV everything by reflex. The 150 Mbps per-VM throughput floor means a wide RAV wave on a thin link will crawl, and you are better off with Bulk there.
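
That default can be sketched as a per-VM selector. This is an illustration of the reasoning in the paragraph above, not HCX behavior: `choose_migration_type` and its parameters are hypothetical, and `per_vm_mbps` is whatever share of the uplink you expect each in-flight VM to get.

```python
RAV_MBPS_PER_VM = 150  # throughput floor for RAV quoted in the article


def choose_migration_type(
    powered_on: bool,
    has_uptime_sla: bool,
    per_vm_mbps: float,
    untouchable: bool = False,
) -> str:
    """Pick a migration type for one VM following the article's default.

    untouchable marks the few VMs nobody wants to reboot or schedule.
    """
    if not powered_on:
        return "Cold"
    if untouchable:
        return "HCX vMotion"
    if has_uptime_sla and per_vm_mbps >= RAV_MBPS_PER_VM:
        return "RAV"
    # Long tail, or an SLA-bound VM on a link too thin for RAV: Bulk needs a
    # reboot window, so an SLA VM landing here must be negotiated, not assumed.
    return "Bulk Migration"


if __name__ == "__main__":
    print(choose_migration_type(True, True, 200))
    print(choose_migration_type(True, True, 40))
```

The second call is the interesting one: an SLA-bound VM on a thin link falls through to Bulk, which is a prompt to agree a reboot window with the application owner.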

Plan with VCF Operations for Networks and migration waves

The genuinely new capability in 9.0 is that planning and execution are joined. You start in VCF Operations for Networks, let it discover application dependencies from observed traffic, and group the workloads into migration waves so that tightly coupled VMs move together rather than getting split across maintenance windows. Those waves then sync to HCX Manager, which automatically creates the Mobility Groups and the destination networks and runs the migration. The value here is not the automation itself; it is that the dependency map comes from real flow data instead of a spreadsheet someone guessed at. The classic migration disaster, an app server moved on Tuesday and its database left behind until Thursday, is exactly what wave planning is there to prevent.
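
To see why flow data matters, it helps to look at the simplest possible version of dependency-aware grouping: treat each observed flow as an edge and put every connected set of VMs in the same wave. This is a toy sketch of that idea, not how VCF Operations for Networks actually builds waves; `group_into_waves` and the VM names are hypothetical.

```python
from collections import defaultdict


def group_into_waves(vms, flows):
    """Group VMs so that any two VMs linked by observed flows share a wave.

    vms   : iterable of VM names
    flows : iterable of (vm_a, vm_b) pairs seen talking to each other
    Returns a list of waves, each a sorted list of VM names.
    """
    adjacency = defaultdict(set)
    for a, b in flows:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    waves = []
    for vm in vms:
        if vm in seen:
            continue
        # Depth-first walk to collect everything reachable from this VM.
        stack, component = [vm], []
        seen.add(vm)
        while stack:
            current = stack.pop()
            component.append(current)
            for neighbour in adjacency[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        waves.append(sorted(component))
    return waves


if __name__ == "__main__":
    vms = ["app01", "db01", "web01", "batch01"]
    flows = [("web01", "app01"), ("app01", "db01")]
    print(group_into_waves(vms, flows))
```

In a real estate, shared services such as DNS or Active Directory connect almost everything, so a naive grouping like this collapses into one giant wave; that is one reason to let a purpose-built tool do the grouping rather than a script.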

One constraint to note: migrating workloads directly into a vSphere Supervisor cluster (Supervisor Onboarding) is supported, but it is bulk-migration only and aimed at multi-NIC VMs. Plan Supervisor-bound workloads as their own wave, not mixed in with RAV groups.

[Figure: Wave planning, end to end. A dependency map from real flow data, not a guessed spreadsheet. Steps: 1. Discover dependencies; 2. Group into waves; 3. Sync to HCX Mobility Groups; 4. Execute + cutover; 5. Un-extend networks. Keeps tightly-coupled VMs together and prevents the app-Tuesday, database-Thursday split.]
VCF Operations for Networks builds the waves; HCX executes and cuts over.

Execute, cut over and clean up

  1. Kick off the wave and let the initial replication sync complete. For Bulk and RAV this can run for hours or days without touching the running VMs.
  2. Set the switchover window. RAV keeps replicating delta changes until that window, then performs the vMotion-style cutover; Bulk reboots the VM into the destination.
  3. Validate each migrated VM at the destination: power state, guest networking on the extended segment, VMware Tools, and application reachability.
  4. Once a network is fully migrated, un-extend it and move the gateway to the destination NSX segment. Leaving networks extended forever creates hairpins and asymmetric routing.
  5. Retire the source clusters only after a soak period. Do not decommission on the same day you cut over.
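
Whether the initial sync in step 1 takes hours or days is mostly arithmetic, and it is worth estimating before you set the switchover window in step 2. The sketch below is a rough estimate under stated assumptions: `estimate_sync_hours` is hypothetical, it uses decimal units (1 GB = 8,000 megabits), and the `efficiency` factor is a guess at protocol overhead and contention, not a measured HCX figure.

```python
def estimate_sync_hours(total_gb: float, throughput_mbps: float, efficiency: float = 0.7) -> float:
    """Rough hours to replicate total_gb over a link of throughput_mbps.

    efficiency is an assumed fraction of nominal throughput actually achieved.
    Ignores change rate during the sync, so treat the result as a lower bound.
    """
    if throughput_mbps <= 0:
        raise ValueError("throughput_mbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    total_megabits = total_gb * 8 * 1000  # decimal GB -> megabits
    seconds = total_megabits / (throughput_mbps * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Example wave: 40 VMs averaging 250 GB used, over a 1 Gbps uplink.
    hours = estimate_sync_hours(total_gb=40 * 250, throughput_mbps=1000)
    print(f"Initial sync: about {hours:.1f} hours")
```

Schedule the switchover window with margin beyond this number, since busy VMs keep generating deltas while the seed copy runs.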

What actually bites people

  • MTU mismatches on the uplink. The IX and NE appliances want jumbo frames end to end. A single switch in the path stuck at 1500 MTU produces slow, flapping tunnels that look like a bandwidth problem but are not.
  • Snapshots block migration. VMs with active snapshots will fail or be skipped. Consolidate before the wave, not during it.
  • Leaving networks extended. Network Extension is a migration aid, not a permanent design. Extended segments hairpin traffic back through the source until you cut the gateway over.
  • RAV on a thin link. Below the 150 Mbps per-VM floor, RAV waves stall. Use Bulk for constrained sites.
  • Licensing drift. An expired VCF license drops HCX to read-only mid-migration. Check it before the wave starts.
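
Two of these gotchas lend themselves to a quick pre-wave check. The sketch below is illustrative only: `df_ping_payload` computes the ICMP payload size for a don't-fragment ping at a given MTU (IPv4, 20-byte IP header plus 8-byte ICMP header), and `vms_with_snapshots` filters a hypothetical inventory export whose `name` and `snapshot_count` fields are assumptions about your own tooling, not an HCX or vCenter schema.

```python
IPV4_HEADER_BYTES = 20
ICMP_HEADER_BYTES = 8


def df_ping_payload(mtu: int) -> int:
    """Largest ICMP payload that fits in one unfragmented IPv4 packet at this MTU."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def vms_with_snapshots(inventory):
    """Return names of VMs that still carry snapshots and need consolidating."""
    return [vm["name"] for vm in inventory if vm.get("snapshot_count", 0) > 0]


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: test with a {df_ping_payload(mtu)}-byte don't-fragment ping")

    # Hypothetical inventory export for one wave.
    wave = [
        {"name": "web01", "snapshot_count": 0},
        {"name": "db01", "snapshot_count": 2},
        {"name": "app01"},
    ]
    print("Consolidate first:", vms_with_snapshots(wave))
```

If the don't-fragment ping at the larger size fails somewhere along the uplink path, you have found the MTU mismatch before the tunnels start flapping.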

If your source is an older VCF instance rather than plain vSphere, the upgrade-versus-migrate decision and its pitfalls are worth reading first: migrating an older VCF to VCF 9 and the seven things that bite you. And if you would rather convert an existing vSphere estate in place instead of moving workloads to a fresh domain, the vSphere to VCF 9 converge workflow is the alternative path.

What I’d Do

For a real workload-domain migration I deploy the HCX 9.0 unified managers, confirm licensing is solid, and let VCF Operations for Networks build the dependency-aware waves rather than trusting a hand-built list. I use RAV for the SLA-bound tier, Bulk for the long tail, and vMotion for the few untouchables. The discipline that separates a clean migration from a messy one is not the tool choice; it is testing the flow with a throwaway VM first and un-extending networks promptly after each wave. What is the largest single wave you have run, and did dependency mapping or raw bandwidth turn out to be your limiter?


About the Author

Dr. Pranay Jha is a Cloud and AI Consultant with 18+ years of experience in hybrid cloud, virtualization, and enterprise infrastructure transformation. He specializes in VMware technologies, multi-cloud strategy, and Generative AI solutions. He holds a PhD in Computer Applications with research focused on Cloud and AI, has published multiple research papers, and has been a VMware vExpert since 2016 and a VMUG Community Leader.