Dr. Pranay Jha



VCF-to-VCF Migration: The Parallel-Instance Reference Architecture for VCF 9 (VCF 9 Series, Part 17)

How to move an older VCF estate onto VCF 9 by building a parallel instance and migrating workloads across with HCX Service Mesh and Replication Assisted vMotion, including sizing, waves, cutover, and the RAV restrictions that derail migrations.

VCF 9 Series · Part 17 of 36

TL;DR · Key Takeaways

  • VCF-to-VCF migration means standing up a fresh VCF 9 instance beside your old one and moving workloads across with HCX, not upgrading the old bits in place.
  • The parallel-instance pattern decouples the platform rebuild from the workload move, which is the single biggest reason it beats an in-place upgrade on a deeply customised brownfield.
  • The data path is an HCX Service Mesh with Network Extension for Layer 2 adjacency, and Replication Assisted vMotion (RAV) for large, scheduled, low-downtime waves.
  • RAV needs 150 Mbps or higher per path and VM hardware version 9 or later, and it will not touch vVOL datastores, vNVMe controllers, RDMs in physical compatibility mode, or the vCenter and NSX appliances.
  • Plan the bridge bandwidth and the wave schedule first. The migration almost never fails on compute; it fails on the network underlay and the cutover window.

Who this is for: architects and consultants planning a side-by-side move from an older VCF instance (4.x or 5.x) onto a new VCF 9 fleet.

Prerequisites: a target VCF 9 management domain already deployed, HCX entitlement, and routed connectivity between source and target sites.

There are two ways to get an aging VMware Cloud Foundation estate onto version 9, and they are not equivalent. You can upgrade the existing instance through the supported path, which Part 15 covered along with the seven things that bite you there. Or you can build a brand new VCF 9 instance next to the old one and migrate the workloads across. This post is about the second approach, the parallel-instance migration, and why it is often the right call for a brownfield that has accumulated years of drift.

The reason to reach for it is simple. An in-place upgrade carries forward every quirk of the old deployment: the hand-edited NSX config, the cluster that was commissioned before someone understood the storage policy, the SDDC Manager state that nobody fully trusts. A parallel build lets you design the target clean, validate it empty, and then move only the things you actually want to keep. You trade hardware and effort for a controlled, reversible cutover. That trade is worth it more often than the upgrade guides admit.

Why parallel instance, and when not to

Recommend the parallel-instance pattern when the source is heavily customised, when you are also changing hardware generation or storage architecture (OSA to ESA is a common pairing), or when the business needs a clean rollback story. Standing up a separate VCF 9 instance gives you a known-good target and a migration you can pause between waves. If something looks wrong after the first wave, the source is still running and you have lost nothing.

Do not reach for it when the source is small, recent, and already close to VCF 9 specs. If you have a tidy VCF 5.x instance on hardware you intend to keep, the in-place upgrade is faster and cheaper, and the parallel build is just expensive ceremony. The other disqualifier is hardware: a parallel instance needs enough spare capacity to run the target management domain plus the first migration wave concurrently with the source. If you cannot free up four hosts for the new management domain (the VCF 9 minimum) without evacuating production first, the parallel approach loses its main advantage.

The assumption to validate before you commit: confirm there is routed IP connectivity between the source and target sites with the bandwidth to carry replication traffic, and confirm your HCX entitlement is current. Everything downstream depends on those two facts.

The target: fleet, instance, and a shared operations plane

VCF 9 changes the shape of the target you are building. The new instance is deployed with the VCF Installer appliance, which replaces Cloud Builder and the old Deployment Parameters Workbook. A VCF 9 instance is the familiar stack: vCenter, NSX, VCF Operations, and VCF Automation, with vSAN delivering the full SDDC. What is new is the Fleet. Multiple VCF instances can sit in one Fleet and share a common VCF Operations and VCF Automation plane, which is exactly the construct you want during a migration: the old and the new instance can both be visible from a single Operations console while you move between them.

Size the target management domain for production from the start. A Simple (single node) model deploys a minimum of seven appliances and is fine for a lab or a proof of concept. For anything you will run in anger, use the High Availability model: a minimum of thirteen appliances, with three NSX Managers, three VCF Operations nodes, and three VCF Automation nodes. Building the target as Simple and promising to harden it later is a trap. You will be mid-migration when you discover you cannot convert the model without disruption.

One licensing note that catches people. A new VCF 9 instance deploys in evaluation mode and stays fully functional for 90 days, after which it must be licensed with a file through the Broadcom console. Map your migration timeline against that 90-day clock. A migration that drifts past it because of a procurement delay is an avoidable outage waiting to happen.
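
A minimal sketch of that clock check, assuming the 90-day figure above counts from the deployment date. The function names and the example dates are hypothetical, chosen only for illustration.

```python
from datetime import date, timedelta

EVALUATION_DAYS = 90  # evaluation period as stated in this article


def license_deadline(deployed_on: date) -> date:
    """Date on which the evaluation period runs out (assumed: deploy date + 90 days)."""
    return deployed_on + timedelta(days=EVALUATION_DAYS)


def days_of_slack(deployed_on: date, planned_final_cutover: date) -> int:
    """Positive: the last cutover lands inside the evaluation period.
    Negative: the plan only works if the licence is applied before then."""
    return (license_deadline(deployed_on) - planned_final_cutover).days


deployed = date(2025, 9, 1)       # hypothetical deployment date
final_wave = date(2025, 12, 15)   # hypothetical last cutover date

print(license_deadline(deployed))           # 2025-11-30
print(days_of_slack(deployed, final_wave))  # -15
```

A negative result is not a failure in itself; it means licensing has become a dependency on the migration's critical path and belongs in the project plan.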

The migration data path

The bridge between the two instances is an HCX Service Mesh. You pair the source HCX with the target, build a Compute Profile on each side, and deploy the Service Mesh, which stands up the Interconnect (IX) and Network Extension (NE) appliances that carry the traffic. In VCF 9 the HCX appliances are deployed and lifecycle-managed through VCF Operations, which is a real improvement over wiring it all by hand. The diagram below shows the shape of the connection.

[Figure: Parallel-instance VCF-to-VCF migration. A source VCF instance (4.x / 5.x, still running production: vCenter, NSX, SDDC Manager, workload clusters grouped into migration waves) and a target VCF 9 instance (clean HA management domain, VI workload domains receiving VMs on extended segments) each run HCX IX and NE appliances. An HCX Service Mesh joins them, carrying Network Extension (L2) and the RAV migration path at 150 Mbps or higher per path. A VCF Operations fleet plane gives a single operations and lifecycle view across both instances.]
Source and target run concurrently; HCX carries the workloads while a shared VCF Operations plane spans both instances.

Two services do the real work. Network Extension stretches the source network to the target so a migrated VM stays Layer 2 adjacent to its peers and keeps its IP and MAC address. That is what lets you move an application in pieces without re-IPing it. HCX 9.0 extended this to NSX VPC environments, so VPC subnets are now valid extension targets, not just NSX segments and VLAN port groups. The second service is Replication Assisted vMotion, and it is the one you will lean on for bulk movement.

RAV is built for exactly this job: large, parallel, scheduled, low-downtime migrations. It does an initial full sync of the VM disks to the target, keeps replicating the deltas, and then waits for a switchover window you define. When the window opens it runs a short delta vMotion to flip the VM live. The clever part is the asymmetry: replication moves the bulk of the data ahead of time over hours or days, so the actual switchover is a quick delta and a power-on. Live switchovers run serially even though replication runs concurrently, so a wave of fifty VMs replicates in parallel but cuts over one after another. Plan the window accordingly.

[Figure: RAV: replicate in parallel, switch over in series. Full sync and delta replication run for all VMs at once over hours to days. Inside the switchover window, VM1, VM2, and VM3 cut over by delta vMotion one at a time, so the window must be sized for serial switchovers.]
RAV front-loads the data so the switchover is just a quick delta and a power-on, run serially.
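
Because switchovers are serial, the window arithmetic is a simple multiplication. The sketch below assumes a measured average switchover time per VM (taken from a pilot wave) and a safety margin; both parameters and the function names are illustrative, not HCX settings.

```python
def switchover_window_minutes(vm_count: int,
                              minutes_per_switchover: float,
                              margin: float = 0.25) -> float:
    """Window needed when VMs cut over one after another.

    margin is extra headroom as a fraction (0.25 = 25%).
    """
    if vm_count < 0 or minutes_per_switchover <= 0 or margin < 0:
        raise ValueError("inputs must be non-negative, with a positive switchover time")
    return vm_count * minutes_per_switchover * (1 + margin)


def max_vms_per_window(window_minutes: float,
                       minutes_per_switchover: float,
                       margin: float = 0.25) -> int:
    """Largest wave that fits in a fixed maintenance window."""
    if minutes_per_switchover <= 0 or margin < 0:
        raise ValueError("switchover time must be positive and margin non-negative")
    return int(window_minutes // (minutes_per_switchover * (1 + margin)))


# Hypothetical pilot result: 4 minutes per VM on average.
print(switchover_window_minutes(50, 4.0))  # 250.0
print(max_vms_per_window(240, 4.0))        # 48
```

With those illustrative numbers, a fifty-VM wave does not fit a four-hour window, which is the kind of result worth finding before the change request is filed.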

Sizing the bridge and the waves

RAV requires 150 Mbps or higher of throughput per path, and that is a floor, not a target. The realistic migration rate depends on bandwidth, latency, and the read speed of the source storage, because RAV rides vSphere Replication underneath. The math that matters is total dataset divided by sustained replication rate, plus a margin for delta churn on busy VMs. A 40 TB estate over a single 150 Mbps path is more than three weeks of replication even at full line rate (about 25 days, before any churn). If your move has a deadline, provision more bandwidth or more parallel Service Mesh paths and measure the real rate with a pilot wave before you publish a schedule.
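
The dataset-over-rate calculation can be sketched as follows. It assumes decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits per second) and treats churn as a flat multiplier, which is a simplification; the function name and parameters are illustrative.

```python
def replication_days(dataset_tb: float,
                     sustained_mbps: float,
                     churn_margin: float = 0.0) -> float:
    """Days to replicate a dataset at a sustained rate.

    churn_margin is extra data to re-send as a fraction (0.2 = 20%).
    """
    if sustained_mbps <= 0:
        raise ValueError("sustained_mbps must be positive")
    bits = dataset_tb * 1e12 * 8
    seconds = bits / (sustained_mbps * 1e6)
    return seconds * (1 + churn_margin) / 86_400


# The 40 TB example at the 150 Mbps floor, with no churn:
print(round(replication_days(40, 150), 1))  # about 24.7 days

# Same estate with 20% churn at a hypothetical measured 600 Mbps aggregate:
print(round(replication_days(40, 600, churn_margin=0.2), 1))
```

The sustained rate fed into this should come from the pilot wave, not from the link's nominal speed, since source storage read performance and latency can cap throughput well below what the circuit allows.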

Group the migration into waves by application affinity, not by convenience. Everything that talks to everything else on a low-latency path should move in the same wave, or you will split a chatty application across the stretched network and pay the latency tax until both halves land. Keep the most fragile and most critical systems for a late wave, once the target and the process have proven themselves on lower-stakes workloads. For the detailed mechanics of running HCX and vMotion migrations themselves, see the workflow walkthrough in how to migrate workloads into VCF 9 with HCX and vMotion.
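
One way to turn "application affinity" into a first-draft wave plan is to treat observed VM-to-VM traffic as a graph and keep each connected group together. This is a sketch under the assumption that you already have a list of chatty VM pairs from flow data or a CMDB; the VM names and the `affinity_groups` helper are hypothetical.

```python
from collections import defaultdict


def affinity_groups(vms, flows):
    """Group VMs so that any two VMs linked by a flow land in the same group.

    vms:   iterable of VM names
    flows: iterable of (vm_a, vm_b) pairs that talk on a low-latency path
    """
    parent = {vm: vm for vm in vms}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in flows:
        # Tolerate flow endpoints missing from the VM list.
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        parent[find(a)] = find(b)

    groups = defaultdict(list)
    for vm in parent:
        groups[find(vm)].append(vm)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


vms = ["web1", "app1", "db1", "batch1", "report1"]
flows = [("web1", "app1"), ("app1", "db1")]
print(affinity_groups(vms, flows))
# [['app1', 'db1', 'web1'], ['batch1'], ['report1']]
```

Each group is a unit that should not be split across waves; several small groups can still share one wave. Ordering the waves by fragility and criticality remains a human judgment on top of this grouping.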


Cutover sequence and decommission

Disclaimer: this is a production-change procedure. Validate the target BOM against the VCF 9 compatibility guide, confirm HCX interoperability between the source and target versions, back up your source workloads, run HCX pre-checks, and prove the path with a non-critical pilot wave before committing production systems.
  1. Deploy and license-track the target VCF 9 instance, and join it to the Fleet so both instances share one VCF Operations view.
  2. Pair HCX, build the Compute Profiles, and deploy the Service Mesh. Confirm the Interconnect tunnels are up and the Network Extension and RAV services report healthy.
  3. Extend the source networks the first wave needs, so migrated VMs stay Layer 2 adjacent on the target.
  4. Run a pilot wave of low-risk VMs end to end. Measure the real replication rate and the switchover duration, then size the remaining waves from those numbers.
  5. Replicate each production wave ahead of its window, then switch over inside the maintenance window. Validate the application on the target before moving on.
  6. Once a stretched network has no VMs left on the source side, un-extend it and let it route natively on the target. This is the step teams forget, and a forgotten extension is a single point of failure that quietly outlives the migration.
  7. When the last wave is verified, decommission the source instance and reclaim its hardware. Do not pull it the same day you finish; keep it powered off but intact for a defined rollback window.
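
Step 6 is easy to forget partly because nothing flags it. A minimal sketch of a check, assuming you can export which VMs sit on which extended network and on which side; the record shape, network names, and function name are all hypothetical.

```python
def networks_ready_to_unextend(extended_networks, vm_locations):
    """Return extended networks that have no VMs left on the source side.

    extended_networks: iterable of network names currently extended
    vm_locations:      iterable of (vm_name, network_name, site) tuples,
                       where site is "source" or "target"
    """
    still_on_source = {net for _, net, site in vm_locations if site == "source"}
    return sorted(net for net in set(extended_networks) if net not in still_on_source)


extended = ["app-vlan-110", "db-vlan-120"]
inventory = [
    ("web1", "app-vlan-110", "target"),
    ("app1", "app-vlan-110", "target"),
    ("db1", "db-vlan-120", "source"),  # not yet migrated
]
print(networks_ready_to_unextend(extended, inventory))  # ['app-vlan-110']
```

Running a check like this after every wave keeps the list of live extensions honest, rather than leaving the cleanup to the end of the project.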

The decision of which path to take in the first place, converge, import, or build fresh, is its own design exercise. If you have not framed that yet, work through VCF 9 adoption paths and when to converge, import or start fresh before you commit hardware to a parallel build.

What actually breaks, and what to validate

RAV has a restriction list, and migrations stall when nobody read it. RAV will not migrate VMs on vVOL datastores, VMs using virtual NVMe (vNVMe) controllers, or VMs with Raw Device Mappings in physical compatibility mode. It will not move VMs with vSphere VM Encryption or Virtualization Based Security enabled, and it cannot migrate VMware software appliances like vCenter Server or NSX Manager (those are rebuilt on the target, not moved). The VM hardware version must be 9 or higher, and the architecture must be x86. Scan the source for these attributes before you build wave plans, because finding a vVOL-backed database in the middle of a switchover window is the kind of surprise that ends a maintenance weekend.
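
The scan itself can be a few lines once the inventory is exported. The sketch below encodes only the restrictions listed above; the `VmRecord` fields are hypothetical and would need mapping from whatever your inventory export actually provides.

```python
from dataclasses import dataclass


@dataclass
class VmRecord:
    # Hypothetical inventory record; field names are illustrative.
    name: str
    hardware_version: int
    datastore_type: str  # e.g. "vmfs", "vsan", "nfs", "vvol"
    has_vnvme: bool = False
    has_physical_rdm: bool = False
    encrypted: bool = False
    vbs_enabled: bool = False
    is_vmware_appliance: bool = False
    architecture: str = "x86"


def rav_blockers(vm: VmRecord) -> list[str]:
    """Return the reasons this VM cannot use RAV; an empty list means none found."""
    checks = [
        (vm.datastore_type.lower() == "vvol", "vVOL datastore"),
        (vm.has_vnvme, "vNVMe controller"),
        (vm.has_physical_rdm, "RDM in physical compatibility mode"),
        (vm.encrypted, "vSphere VM Encryption"),
        (vm.vbs_enabled, "Virtualization Based Security"),
        (vm.is_vmware_appliance, "VMware software appliance"),
        (vm.hardware_version < 9, "hardware version below 9"),
        (vm.architecture.lower() != "x86", "non-x86 architecture"),
    ]
    return [reason for failed, reason in checks if failed]


inventory = [
    VmRecord("web01", 13, "vsan"),
    VmRecord("db01", 13, "vvol", has_vnvme=True),
    VmRecord("legacy01", 8, "vmfs"),
]
for vm in inventory:
    problems = rav_blockers(vm)
    if problems:
        print(f"{vm.name}: {', '.join(problems)}")
```

VMs that show up in this report need a different migration method or remediation on the source before they are assigned to a wave.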

One behaviour that confuses backup teams: RAV creates two folders at the destination, one for the VM definition and one for the disks. That is normal and has no functional impact, but backup tools that expect a single VM folder can choke on it. If that affects your tooling, consolidate the folders with a Storage vMotion after the VM lands. And on the network side, the Network Extension migration interface does not display port groups that are VLAN trunks, so a trunked port group you assumed you could extend will simply not appear. Catch that in design, not in execution.

For the upgrade-specific pitfalls that overlap with any move onto VCF 9, the companion piece on migrating older VCF to VCF 9 covers the seven that bite most often.

[Figure: RAV will not migrate these: vVOL datastores; vNVMe controllers; RDM in physical compatibility mode; VM Encryption or VBS enabled; software appliances (vCenter, NSX); VM hardware version below 9 or non-x86. Also: RAV creates two folders (VM + disks), and Network Extension hides VLAN-trunk port groups.]
Find these before a switchover window, not during one.

What I’d Do

On a brownfield with real history, I build parallel almost every time. The clean target and the reversible cutover are worth the hardware, and the shared Fleet plane in VCF 9 makes running two instances side by side far less painful than it was on older releases. The whole project lives or dies on two numbers you should nail down before anything else: the sustained replication rate across your bridge, and the length of your switchover windows. Get a pilot wave through early, measure both for real, and let those numbers drive the plan rather than a slide that assumes the 150 Mbps floor is your ceiling. Which is the bigger constraint in your environment right now, the bandwidth between sites or the maintenance windows you can get signed off?


About the Author

Dr. Pranay Jha is a Cloud and AI Consultant with 18+ years of experience in hybrid cloud, virtualization, and enterprise infrastructure transformation. He specializes in VMware technologies, multi-cloud strategy, and Generative AI solutions. He holds a PhD in Computer Applications with research focused on Cloud and AI, has published multiple research papers, and has been a VMware vExpert since 2016 and a VMUG Community Leader.
