Dr. Pranay Jha

VMware • Cloud • AI • Enterprise Architecture


VCF 9 Hybrid Cloud Architecture: Extending the Private Cloud to the Hyperscalers (VCF 9 Series, Part 33)

A reference design for VCF 9 hybrid cloud: how the transit gateway, VPN, HCX and VCF Operations stitch your on-prem instance to Azure VMware Solution, GCVE and OCVS, and the version-skew and egress traps that catch teams in production.

VCF 9 Series · Part 33 of 36

TL;DR · Key Takeaways

  • Hybrid with VCF 9 is not one stretched control plane across clouds. It is three separate planes: consistent infrastructure, workload mobility, and unified operations. Design each one deliberately.
  • The connectivity spine is the NSX transit gateway. VCF 9.1 adds an isolated VPN service inside the transit gateway, multiple external connections, and multiple transit gateways per tenant, so you can route hybrid traffic without bolting on external edge appliances.
  • HCX (now packaged as VCF Operations Workload Mobility) is what actually moves and stretches workloads to Azure VMware Solution, Google Cloud VMware Engine, Oracle Cloud VMware Solution and VMware Cloud on AWS.
  • The hyperscaler VCF services lag your on-prem version. Plan for vSphere 8-era targets while your data center runs 9.x, and design around the gap rather than assuming parity.
  • The two costs that surprise people: HCX Network Extension tromboning, and cloud egress. Both are architecture decisions, not line items you can fix later.
Who this is for: architects and platform teams designing a VCF 9 estate that spans on-prem and a hyperscaler VMware service.

Prerequisites: a working VCF 9 instance, an NSX edge cluster, and at least one target cloud (AVS, GCVE, OCVS or VMware Cloud on AWS) provisioned.

The word “hybrid” does a lot of quiet damage in design workshops. Someone says it, everyone nods, and three people in the room are picturing three different architectures. One imagines a single cluster stretched across a data center and a cloud region. One imagines lift-and-shift migrations that never come back. One imagines a single pane of glass that bills both sides. None of them is wrong, and that is exactly the problem: VCF 9 hybrid cloud is not one capability. It is three planes that you wire up separately, and the failures I see in the field almost always trace back to a team that designed one plane and assumed the other two came free.

What “hybrid” actually buys you in VCF 9

Break the marketing word into the three planes it really means, because each has a different owner, a different failure mode, and a different design artifact.

Consistent infrastructure. The same SDDC stack runs on-prem and in the cloud: vSphere, vSAN, NSX. This is the genuine value of the hyperscaler VCF services. Your operational muscle memory, your NSX segments, your storage policies, all transfer. But “consistent” does not mean “identical version,” and that distinction is where most of the pain lives. More on that below.

Workload mobility. Moving VMs between the two estates, in both directions, with the network coming along for the ride. This is HCX. It is the plane people underestimate, because a one-time migration looks easy in a demo and behaves very differently when you stretch a production L2 segment and leave it stretched for nine months.

Unified operations. One place to see cost, capacity, health and lifecycle across both estates. In VCF 9 this is VCF Operations, which can register VMware Cloud on AWS, Azure VMware Solution, Google Cloud VMware Engine, Oracle Cloud VMware Solution and IBM Cloud as cloud providers and pull their consumption into the same capacity and cost views as your private cloud. This is the plane that justifies the word “hybrid” to your finance team, and the one teams forget to design until the first surprise invoice.

[Figure: What hybrid actually buys you. Three planes, three owners, three failure modes. (1) Consistent infrastructure: same vSphere, vSAN and NSX on both sides. (2) Workload mobility: HCX in both directions, the network follows the VM. (3) Unified operations: VCF Operations, one cost + capacity view.]
The honest value is one operating model on both sides plus a clean migration on-ramp, not a stretched control plane.

The reference topology

Here is the shape I draw on the whiteboard for almost every hybrid VCF 9 engagement. On-prem you have your VCF 9 instance with its management domain and one or more VI workload domains. In the cloud you have a provider-managed VCF service. Two data paths connect them: a routed path through the NSX transit gateway and its VPN for normal east-west and north-south traffic, and an HCX path for migration and stretched networking. Sitting above both, not in the data path, is VCF Operations as the single observability and cost plane.

[Figure: VCF 9 Hybrid Cloud Reference Topology. VCF Operations sits above both estates (unified health, capacity and cost; not in the data path). On-prem VCF 9 instance: management domain (SDDC Manager, vCenter, NSX Manager), VI workload domain (vSphere + vSAN + NSX segments), NSX Edge + transit gateway (VPN, static routes, NAT), HCX Connector. Hyperscaler VCF service: provider-managed SDDC (AVS / GCVE / OCVS / VMC on AWS), cloud workloads (same vSphere / NSX constructs), provider edge / gateway (ExpressRoute / Interconnect / VPN), HCX Cloud Manager. Solid = routed data path (VPN / interconnect); dashed = HCX mobility path (migration + L2 extend); dotted = telemetry to VCF Operations.]
VCF 9 hybrid reference topology: routed and HCX paths between estates, with VCF Operations as the unified plane above.

The thing to internalize from this picture: the routed path and the HCX path do different jobs and have different lifecycles. The routed path is permanent infrastructure. The HCX network extension should be temporary. Teams that treat the HCX extension as permanent infrastructure are the ones who call me eight months later asking why latency-sensitive apps are misbehaving.

The connectivity layer

VCF 9.1 reworked hybrid connectivity around the NSX transit gateway, and this is the part of the release that genuinely changes how you design. A transit gateway can now attach to multiple external connections, you can run multiple transit gateways per tenant, and there is a VPN service that lives inside the transit gateway itself. In plain terms, you can build isolated, multi-site routing with static routes and custom NAT without standing up external routing appliances to glue it together. The Remote Networks field on each external connection decides which prefixes route where, with a default route option when no remote network is specified.
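To make the Remote Networks behaviour concrete, here is a minimal sketch of how I reason about it on paper. It is a toy model, not the NSX API: the connection names, the prefixes, the `pick_connection` helper, and the choice of longest-prefix match are all my own assumptions for illustration. An empty prefix list stands in for an external connection with no remote network specified, which acts as the default route.

```python
import ipaddress

# Hypothetical model: connection name -> remote network prefixes.
# An empty list stands in for "no remote network specified" (default route).
EXTERNAL_CONNECTIONS = {
    "vpn-to-cloud": ["10.50.0.0/16", "10.51.0.0/16"],
    "dc-core": ["10.0.0.0/8"],
    "internet-default": [],
}


def pick_connection(destination, connections):
    """Return the connection whose remote network best matches destination.

    Assumes longest-prefix match; falls back to the first connection that
    has no remote networks, or None if there is no such default.
    """
    dest = ipaddress.ip_address(destination)
    best_name, best_len = None, -1
    default_name = None
    for name, prefixes in connections.items():
        if not prefixes:
            # Remember the first default-style connection only.
            default_name = default_name or name
            continue
        for prefix in prefixes:
            net = ipaddress.ip_network(prefix)
            if dest in net and net.prefixlen > best_len:
                best_name, best_len = name, net.prefixlen
    return best_name if best_name is not None else default_name


if __name__ == "__main__":
    for ip in ("10.50.3.4", "10.9.9.9", "8.8.8.8"):
        print(ip, "->", pick_connection(ip, EXTERNAL_CONNECTIONS))
```

Running a table like this against your real prefix plan before you touch the transit gateway is a cheap way to spot a destination that would silently fall through to the default route.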

For the mobility plane, HCX is the tool, now packaged and licensed as VCF Operations Workload Mobility. It pairs an on-prem HCX Connector with an HCX Cloud Manager at the target and gives you the migration types you actually care about: bulk migration for batches, Replication Assisted vMotion (RAV) for large fleets with minimal downtime, and Network Extension to stretch an L2 segment so a VM keeps its IP after it moves. HCX works across the full set of VMware cloud services, so the same playbook applies whether the target is Azure VMware Solution, Google Cloud VMware Engine, Oracle Cloud VMware Solution or VMware Cloud on AWS.

If you want the detailed migration mechanics, I covered them in how to migrate workloads into VCF 9 with HCX and vMotion. This post is about where those paths sit in the overall design, not the click-by-click.


Design matrix: where does each plane really live?

The single most useful artifact from a hybrid design workshop is a matrix that pins each plane to a concrete component, names the owner, and writes down the assumption that has to hold. Skip the matrix and you get hand-waving; build it and the gaps reveal themselves.

| Dimension | On-Prem VCF 9 | Hyperscaler VCF service | Assumption to validate |
|---|---|---|---|
| Control plane | You own SDDC Manager, vCenter, NSX Manager | Provider-managed; limited admin scope | Which operations are blocked on the cloud side |
| Version parity | VCF 9.0 / 9.1 on your schedule | Often a vSphere 8-era release; provider-driven cadence | HCX and feature compatibility across the version gap |
| Routing | NSX transit gateway + VPN (9.1) | Provider gateway (ExpressRoute, Interconnect, Direct Connect) | MTU, BGP/static design, overlapping CIDRs |
| Mobility | HCX Connector | HCX Cloud Manager (bulk, RAV, NE) | Whether L2 extension is temporary or permanent |
| Operations | VCF Operations (native) | Registered as a cloud provider in VCF Operations | Billing data granularity and refresh interval |
| Cost driver | Hardware + VCF licensing | Per-host subscription + egress | Egress volume from extended segments and DR replication |

The rightmost column is the one that earns its keep. Most hybrid designs fail not because a component was wrong but because an unstated assumption (“the cloud side will be on the same version,” “egress will be negligible,” “we can do that operation in the cloud SDDC”) turned out to be false after the contract was signed.
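If you want the matrix to outlive the workshop, one option is to keep it as data and let a script flag what is still open. The sketch below is only one way to do that. The `owner` and `validated` fields, the `PlaneRow` class and the `open_assumptions` helper are hypothetical additions of mine, and the owner names are placeholders; only the row contents come from the matrix itself.

```python
from dataclasses import dataclass


@dataclass
class PlaneRow:
    dimension: str
    on_prem: str
    cloud: str
    assumption: str
    owner: str = ""          # hypothetical field: who signs off on the row
    validated: bool = False  # hypothetical field: confirmed with the provider?


def open_assumptions(rows):
    """Return dimensions that lack an owner or a validated assumption."""
    return [r.dimension for r in rows if not r.validated or not r.owner]


# Three rows from the design matrix; owners and flags are placeholders.
MATRIX = [
    PlaneRow("Version parity", "VCF 9.0 / 9.1", "vSphere 8-era release",
             "HCX and feature compatibility across the version gap",
             owner="platform team", validated=True),
    PlaneRow("Mobility", "HCX Connector", "HCX Cloud Manager",
             "Whether L2 extension is temporary or permanent"),
    PlaneRow("Cost driver", "Hardware + VCF licensing",
             "Per-host subscription + egress",
             "Egress volume from extended segments and DR replication",
             owner="finops"),
]

if __name__ == "__main__":
    print("Still open:", open_assumptions(MATRIX))
```

A row counts as closed only when it has both an owner and a confirmed assumption, which matches the point that an unowned assumption is as dangerous as an unvalidated one.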

[Figure: What actually bites in production. All three are unstated assumptions until they are not. Version skew: cloud lags on the provider cadence; check the HCX interop matrix. HCX tromboning: stretched L2 hairpins to the on-prem gateway; enable MON, then retire the stretch. Egress: outbound dwarfs the compute subscription; model the data flow first. The routed path is permanent; every HCX network extension needs a named retirement date.]
Design for version skew, kill tromboning with MON, and model egress before you build.

What actually bites you in production

Version skew is the default, not the exception. Your on-prem estate will be on VCF 9.0 or 9.1 while the hyperscaler service runs a vSphere 8-era build on the provider’s cadence. That gap is normal and manageable, but only if you design for it: check the HCX interoperability matrix for your specific source and target builds before you commit, and do not assume a 9.1-only networking feature is available on the cloud side. In practice the provider sets the upgrade calendar, not you, so treat the cloud version as a fixed input to the design rather than something you can dial in.

HCX network extension tromboning. When you stretch an L2 segment and a migrated VM in the cloud talks to a gateway that still lives on-prem, traffic hairpins back across the link and out again. Two hops over a wide-area link for what should be a local conversation. Mobility Optimized Networking (MON) exists precisely to fix this by letting the cloud side route locally, and you should enable it for any extension that will carry real east-west traffic. The deeper point: a stretched segment is a migration aid with a shelf life. Plan the cutover that moves the gateway and retires the extension. If you cannot name the date you will collapse the stretch, you have not finished the design.
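The hairpin is easier to argue about with numbers. The following is a back-of-the-envelope sketch, not a measurement: it assumes two cloud VMs on different segments whose gateway still lives on-prem, a symmetric path, and that both the request and the reply cross the WAN twice. The function names and the sample figures are mine, purely for illustration.

```python
def round_trip_ms(wan_rtt_ms, local_rtt_ms, tromboned):
    """Estimate the round trip between two cloud VMs on different segments.

    Assumption: when tromboned, the request goes cloud -> on-prem gateway
    -> cloud and the reply retraces it, so the WAN is crossed four times,
    which equals two full WAN round trips.
    """
    if tromboned:
        return 2 * wan_rtt_ms + local_rtt_ms
    return local_rtt_ms


def transaction_ms(sequential_calls, wan_rtt_ms, local_rtt_ms, tromboned):
    """Total network wait for a transaction made of sequential calls."""
    return sequential_calls * round_trip_ms(wan_rtt_ms, local_rtt_ms, tromboned)


if __name__ == "__main__":
    # Placeholder inputs: 20 ms WAN RTT, 1 ms local RTT, 50 sequential calls.
    for tromboned in (True, False):
        total = transaction_ms(50, 20, 1, tromboned)
        print(f"tromboned={tromboned}: {total} ms of network wait")
```

The absolute numbers matter less than the multiplier: a chatty application that makes dozens of sequential calls pays the WAN penalty on every one of them until the gateway moves or the cloud side routes locally.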

Egress is an architecture decision. Inbound is cheap, outbound is not. A DR pattern that replicates continuously from cloud back to on-prem, or a chatty app split across the link, can generate egress charges that dwarf the compute subscription. Decide deliberately which direction your data flows and how often, and model it before you build, because you cannot refactor your way out of a bad data-gravity decision after the workloads have landed.
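A rough model is enough to have this conversation before the build. The sketch below is deliberately simple and every number in it is a placeholder, not a real provider rate: plug in the per-GB price from your own contract and the change rate from your own replication reports. The function names are hypothetical.

```python
def monthly_egress_cost(gb_per_day, price_per_gb, days=30):
    """Outbound data cost for a month, given a daily outbound volume."""
    return gb_per_day * days * price_per_gb


def egress_to_compute_ratio(egress_cost, compute_subscription):
    """How large egress is relative to the compute subscription."""
    if compute_subscription <= 0:
        raise ValueError("compute_subscription must be positive")
    return egress_cost / compute_subscription


if __name__ == "__main__":
    # Placeholder inputs only: replace with contract and telemetry values.
    cost = monthly_egress_cost(gb_per_day=500, price_per_gb=0.05)
    ratio = egress_to_compute_ratio(cost, compute_subscription=10_000)
    print(f"monthly egress: {cost:.2f}, ratio to compute: {ratio:.2%}")
```

Run it once per data flow (DR replication, backup copies, cross-link application traffic) rather than once for the whole estate, because the flow that hurts is usually the one nobody listed.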

My take

For most enterprises the honest reason to go hybrid with VCF 9 is operational consistency and a real migration on-ramp, not bursting. The single stretched control plane that the word “hybrid” conjures up is mostly a mirage; what you actually get, and what is genuinely valuable, is the same operating model on both sides plus a clean path to move workloads between them.

For the operations plane that ties this together, see VCF 9 multi-instance fleet management, and for the on-prem multi-site building block that the hybrid pattern extends, see VCF 9 stretched clusters and multi-site design.

Disclaimer: Hybrid connectivity changes touch production routing and live workloads. Validate the HCX interoperability matrix for your exact source and target builds, confirm there are no overlapping CIDRs, check MTU end to end, back up NSX and edge configuration, run the HCX and network prechecks, and test a non-critical migration wave before committing the fleet.
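The overlapping-CIDR check in that list is easy to automate with the Python standard library. This is a minimal sketch that assumes you can export the on-prem and cloud prefixes as plain strings; the `find_overlaps` helper and the sample prefixes are illustrative, not taken from any real environment.

```python
import ipaddress
from itertools import product


def find_overlaps(on_prem_cidrs, cloud_cidrs):
    """Return (on_prem, cloud) prefix pairs whose address ranges overlap."""
    overlaps = []
    for a, b in product(on_prem_cidrs, cloud_cidrs):
        net_a = ipaddress.ip_network(a)
        net_b = ipaddress.ip_network(b)
        # Only compare like with like: IPv4 against IPv4, IPv6 against IPv6.
        if net_a.version == net_b.version and net_a.overlaps(net_b):
            overlaps.append((a, b))
    return overlaps


if __name__ == "__main__":
    # Sample prefixes for illustration only.
    on_prem = ["10.0.0.0/16", "192.168.10.0/24"]
    cloud = ["10.0.128.0/17", "172.16.0.0/16"]
    for pair in find_overlaps(on_prem, cloud):
        print("overlap:", pair)
```

An empty result here is necessary, not sufficient: it says nothing about MTU or HCX interoperability, which still need their own checks.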

What I’d Do

Design the three planes separately and on purpose. Make the routed transit gateway path your permanent spine, treat every HCX network extension as a temporary structure with a named retirement date, and stand up VCF Operations as the cost and capacity plane on day one rather than after the first invoice. Write the assumption column of the design matrix in ink and get the provider to confirm each one, especially version cadence and egress. Do that, and “hybrid” stops being a word three people in the room each interpret differently and becomes an architecture you can actually operate. What is the assumption in your own hybrid design that nobody has written down yet?


About the Author

Dr. Pranay Jha is a Cloud and AI Consultant with 18+ years of experience in hybrid cloud, virtualization, and enterprise infrastructure transformation. He specializes in VMware technologies, multi-cloud strategy, and Generative AI solutions. He holds a PhD in Computer Applications with research focused on Cloud and AI, has published multiple research papers, and has been a VMware vExpert since 2016 and a VMUG Community Leader.
