Dr. Pranay Jha


vSphere Supervisor and VKS Architecture in VCF 9: The Reference Design (VCF 9 Series, Part 24)

How vSphere Supervisor and VKS fit together in VCF 9, from the control plane and vSphere Zones to networking, storage, and the design defaults you should override before you commit.

VCF 9 Series · Part 24 of 36

TL;DR · Key Takeaways

  • vSphere Supervisor is a Kubernetes control plane embedded in ESX. VKS clusters are tenant Kubernetes clusters it provisions through Cluster API. Both run as VMs on the same ESX substrate.
  • The single biggest design decision is one-zone versus three-zone. Three vSphere Zones give you cluster-level failure domains for the control plane, but they change how namespace resources and storage behave.
  • In VCF 9.1 the only supported networking is VDS (VLAN-backed) or NSX VPCs. NSX Classic Tier-0/Tier-1 segments are out. If your design still assumes Classic, it is already legacy.
  • Antrea is the default CNI; the Foundation Load Balancer covers Layer 4; reach for Avi when you need real Layer 7, WAF, or ingress at scale.
  • Back up the Supervisor and back up the workloads as two separate problems. Velero covers VKS workload clusters, not the Supervisor itself.

Who this is for: architects and platform teams designing a Kubernetes platform on VCF 9, not first-time Kubernetes users.

Prerequisites: a working VI workload domain, an IP plan, and a decision on VDS versus NSX VPC networking.

Most teams meet vSphere Supervisor through a wizard. You tick a workload domain, hand it a few IP ranges, wait, and a Kubernetes control plane appears. That hides a design problem that shows up six months later: the layout you accepted by default is now the thing constraining your availability story, your tenant isolation, and your upgrade window. This part of the series steps back from the enable button and lays out how Supervisor and VKS actually fit together in VCF 9, and which of those defaults you should override before you commit.

The layered picture: what runs where

Supervisor turns a vSphere cluster into a Kubernetes platform by running the control plane on the hosts themselves rather than bolting an external cluster on the side. vCenter still manages the lifecycle. The Supervisor sits on top of the SDDC layer: ESX for compute, NSX or vSphere Distributed Switch for networking, and vSAN or another shared datastore for persistent volumes. VKS clusters sit on top of the Supervisor. Everything you provision, from a stand-alone VM through the VM Service to a full VKS cluster, ends up as VMs scheduled by DRS across those same ESX hosts.

[Figure: vSphere Supervisor and VKS on VCF 9. One ESX substrate, three logical layers, managed by vCenter (lifecycle and DRS/HA placement). Top: vSphere Namespaces defining tenancy and quotas, holding VKS tenant clusters (control plane and worker node VMs, Antrea CNI by default) and VM Service VMs plus vSphere Pods. Middle: the vSphere Supervisor platform control plane (1 or 3 control plane VMs, VKS and Cluster API, VM Service, Spherelet, CRX). Bottom: the SDDC layer (ESX compute with a minimum of 3 hosts per zone, VDS or NSX VPC, vSAN or shared storage with PVs via CSI).]
vSphere Supervisor and VKS share one ESX substrate. The distinction is whether a VM belongs to the platform control plane or to a tenant cluster.

The components worth knowing by name, because they show up in logs and support cases: the Supervisor control plane VMs (one, or three for HA), the VKS and Cluster API modules that provision tenant clusters, the VM Service that deploys stand-alone and cluster node VMs, the Spherelet (a kubelet ported natively to ESX so each host joins the Kubernetes cluster), and CRX, a paravirtualized Linux kernel that lets vSphere Pods boot nearly as fast as plain containers while keeping a VM-grade isolation boundary. When you deploy three control plane VMs, each gets its own IP, a fourth, floating IP is held by one of them, and a fifth address is reserved for patching. Size your management subnet for five, not three. People miss that and run out of addresses on the first upgrade.
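
The five-address rule is easy to check before you hand a range to the wizard. The sketch below is a minimal illustration, not a VCF tool: it assumes the five addresses are taken as one consecutive block from a starting IP (confirm that against your version's documentation), and the function name and example subnets are hypothetical.

```python
import ipaddress

# Breakdown from the text, for a three-VM control plane.
NODE_IPS = 3       # one per control plane VM
FLOATING_IPS = 1   # held by one of the VMs
PATCHING_IPS = 1   # reserved for patching
REQUIRED = NODE_IPS + FLOATING_IPS + PATCHING_IPS


def control_plane_block(subnet: str, start: str, count: int = REQUIRED):
    """Return `count` consecutive addresses from `start`, or raise if any
    of them is not a usable host address inside `subnet`.

    Assumption: the block is consecutive. This is an illustration only.
    """
    net = ipaddress.ip_network(subnet)
    first = ipaddress.ip_address(start)
    block = [first + i for i in range(count)]
    for addr in block:
        unusable = addr in (net.network_address, net.broadcast_address)
        if addr not in net or unusable:
            raise ValueError(f"{addr} is not a usable host address in {net}")
    return block
```

Called with a hypothetical `10.0.10.0/24` and a start of `10.0.10.10`, it returns the five addresses ending at `10.0.10.14`. Point it at a `/29` with a start near the top of the range and it raises, which is the failure you would rather meet in a design review than on the first upgrade.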

The decision that shapes everything: one zone or three

A vSphere Zone maps to one vSphere cluster you treat as an independent failure domain. You can activate a Supervisor on a single zone or on three. On a single-zone Supervisor the control plane VMs all run in one cluster and you get host-level HA from vSphere HA restarting them on surviving hosts. On a three-zone Supervisor each control plane VM lands in a different zone, so a whole cluster can fail and the platform control plane survives. Three-zone is mandatory if you want that cluster-level protection, and VCF 9.1 pushes it as the modern standard for native Kubernetes HA, explicitly in preference to stretching a single metro cluster across two sites where split-brain is a real risk.

The catch that nobody mentions in the demo: on a three-zone Supervisor, the resources a vSphere Namespace consumes are drawn equally from all three underlying clusters. Dedicate 300 MHz of CPU to a namespace and 100 MHz comes from each zone. That sounds neat until you have asymmetric clusters or a zone running hot, because your headroom math is now a three-way constraint, not a single pool. A namespace needs at least one zone and can span up to three. Decide deliberately which namespaces are pinned to a single zone for performance and which span all three for availability. The default of letting everything span everything is rarely what a serious workload wants.
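
The three-way constraint is simple arithmetic, and it is worth seeing once with uneven numbers. This is a sketch of the headroom math only, assuming the equal split described above; the function names and zone figures are made up for illustration.

```python
def per_zone_share(reservation_mhz: float, zones: int = 3) -> float:
    """CPU each zone must supply when a namespace spans `zones` zones
    and the reservation is drawn equally from each."""
    return reservation_mhz / zones


def max_spanning_reservation(free_mhz_by_zone: dict[str, float]) -> float:
    """Largest reservation a namespace spanning every listed zone can get.

    Because each zone supplies an equal share, the tightest zone sets
    the ceiling, not the sum of free capacity.
    """
    return min(free_mhz_by_zone.values()) * len(free_mhz_by_zone)


# Hypothetical free capacity, with one zone running hot.
free = {"zone-1": 4000.0, "zone-2": 4000.0, "zone-3": 900.0}
ceiling = max_spanning_reservation(free)
```

With those figures the three zones hold 8,900 MHz free between them, but a spanning namespace can reserve only 2,700 MHz, because zone-3 caps every share at 900. That gap is the argument for pinning performance-sensitive namespaces to a single zone.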

| Design dimension | Single-Zone Supervisor | Three-Zone Supervisor |
| --- | --- | --- |
| Failure domain protected | Host-level (vSphere HA in one cluster) | Cluster-level (control plane survives a full cluster loss) |
| Control plane VM placement | All three in one cluster | One VM per zone, HA mandatory |
| Minimum hosts | 3 (vSAN + HA quorum) | 9 (3 per zone for quorum) |
| Namespace resource source | Single cluster pool | Split equally across all three clusters |
| Node pool HA | Limited to one failure domain | Node pools spread across zones (VCF 9.1) |
| Best fit | Single-site, dev/test, smaller footprints | Production platforms needing cluster-level resilience |

[Figure: One zone or three: where the control plane lands. Single-zone: three control plane VMs in one cluster (one failure domain), host-level HA, minimum 3 hosts. Three-zone: one control plane VM in each of Zone 1, Zone 2, and Zone 3, cluster-level HA, minimum 9 hosts (3 per zone), so the control plane survives a full cluster loss. On three-zone, a namespace draws resources equally from all three clusters.]
Pick the zone model up front; it is expensive to change after enablement.

Networking: VDS or NSX VPC, and nothing else in 9.1

This is the part of the design most likely to be wrong if you carried a vSphere 8 pattern forward. In VCF 9.1, the only supported networking modes for Supervisor and VKS are VDS (VLAN-backed) and NSX VPCs. NSX Classic with hand-built Tier-0 and Tier-1 segments was supported up to VCF 9.0.2 and is gone from 9.1 onward. If your reference design still draws T0/T1 topologies for the Supervisor, it is already legacy and you will be reworking it at upgrade time.

VPCs are the construct to understand. They behave like a public-cloud VPC: tenant isolation, self-service, and automatic handling of routing, NAT, and subnets, so a platform team is not hand-cutting segments for every new tenant. VDS networking is simpler and VLAN-backed, which suits smaller or lab footprints and teams that do not want NSX in the path. For the container network itself, Antrea is the default CNI and has the deepest integration through the Antrea-NSX adapter. VKS is CNCF-conformant, so you can run Calico or Cilium inside guest clusters, and bring-your-own-CNI has been allowed since VKS 3.6. Just know that you trade native NSX visibility for that flexibility. In practice I keep Antrea unless a workload has a hard, specific dependency on another CNI. Swapping it out to match a runbook from another platform is a cost you pay forever in troubleshooting.

Load balancing is its own decision. The Foundation Load Balancer is the default Layer 4 balancer for VDS networking and is fully supported, but it is Layer 4. For real Layer 7 ingress, WAF, or DNS integration, Avi Load Balancer is the answer, with the Avi Kubernetes Operator acting as the ingress controller. The NSX-native load balancer is also Layer 4 only. You can bypass all of it with in-guest kube-vip or MetalLB, but then you lose the centralized analytics and IPAM the VCF fabric gives you, and you own that lifecycle yourself. For a platform you intend to operate at scale, that is usually a false economy.
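
The networking, CNI, and load balancer rules in this section can be written down as a design-review checklist. What follows is a hedged sketch, not a Broadcom tool or API: the `PlatformDesign` class, the mode labels such as `"nsx-vpc"` and `"foundation"`, and the wording of the findings are all my own shorthand for the rules stated above.

```python
from dataclasses import dataclass

# Labels are illustrative shorthand, not product identifiers.
SUPPORTED_NETWORKING_9_1 = {"vds", "nsx-vpc"}
IN_GUEST_BALANCERS = {"kube-vip", "metallb"}


@dataclass
class PlatformDesign:
    networking: str
    cni: str = "antrea"
    load_balancer: str = "foundation"
    needs_layer7: bool = False  # Layer 7 ingress, WAF, or DNS integration


def review(design: PlatformDesign) -> list[str]:
    """Return findings for a VCF 9.1 target; an empty list means no flags."""
    findings = []
    if design.networking == "nsx-classic":
        findings.append("networking: NSX Classic T0/T1 is gone in 9.1; move to VDS or NSX VPC")
    elif design.networking not in SUPPORTED_NETWORKING_9_1:
        findings.append(f"networking: unrecognized mode {design.networking!r}")
    if design.cni != "antrea":
        findings.append("cni: non-default CNI trades away native NSX visibility")
    if design.needs_layer7 and design.load_balancer != "avi":
        findings.append("load balancer: Layer 7, WAF, or DNS needs call for Avi")
    if design.load_balancer in IN_GUEST_BALANCERS:
        findings.append("load balancer: in-guest option loses fabric analytics and IPAM")
    return findings
```

A design of `PlatformDesign("nsx-vpc")` with the defaults comes back clean. A carried-forward design with NSX Classic, Cilium, MetalLB, and a Layer 7 requirement returns four findings, one per rule.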

Storage and persistence, where three zones get awkward

Stateless workloads pull from standard vSAN or block storage policies, and in VCF 9.1 storage quotas are enforced directly at the vSphere Namespace level, which is the right place for tenant control. The vSphere CSI driver handles dynamic provisioning and reclaim: set a PersistentVolume reclaim policy of Delete and the underlying VMDK is removed from vSAN automatically when the claim is deleted. That part is clean.

Where it gets awkward is ReadWriteMany and three-zone resilience. Native RWX out of the box needs vSAN File Services. No vSAN, or no File Services, and you are deploying something like Portworx inside the guest clusters to provide RWX over your existing block storage. And the honest answer on three zones: standard ReadWriteOnce block volumes do not synchronously replicate across zones, because the latency penalty is too high. Storage-layer cross-zone HA is not how this works. You handle availability at the application or database layer, with database replication or a service like the vSAN Data Persistence Platform for stateful services. If your design assumes a block PVC will simply follow a pod to another zone intact, fix that assumption now, not during an incident.
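
Both storage assumptions can be caught per workload before deployment. This is a minimal sketch under the rules stated above, with hypothetical parameter names. Only the access mode strings (`ReadWriteOnce`, `ReadWriteMany`) are real Kubernetes values.

```python
def storage_findings(
    access_mode: str,
    vsan_file_services: bool = False,
    expects_cross_zone_failover: bool = False,
    app_level_replication: bool = False,
) -> list[str]:
    """Flag the two storage assumptions that break on a three-zone design."""
    findings = []
    if access_mode == "ReadWriteMany" and not vsan_file_services:
        findings.append(
            "RWX without vSAN File Services: plan an in-guest provider such as Portworx"
        )
    if (
        access_mode == "ReadWriteOnce"
        and expects_cross_zone_failover
        and not app_level_replication
    ):
        findings.append(
            "RWO block volumes are not synchronously replicated across zones: "
            "add application or database replication"
        )
    return findings
```

A plain RWO claim with no cross-zone expectation returns nothing. An RWO claim whose owner expects the pod to fail over to another zone returns one finding until `app_level_replication` is true, which puts the fix where the text says it belongs: in the application layer, not the volume.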

Scale, backup, and the lines you should not blur

The 9.1 re-engineered Supervisor lifts the ceilings hard: up to 500 clusters per Supervisor, host capacity doubled to 5,000, parallel upgrade throughput raised from 64 to 256 clusters at once, and provisioning roughly 70% faster through linked-clone technology. Most shops will not approach those numbers, but the parallel-upgrade jump is the one that matters operationally, because cluster fleets that took a weekend to roll now fit in a maintenance window.
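
To see why the parallel-upgrade number matters, count the waves. This is a back-of-envelope sketch: it assumes every wave takes about the same time and that you actually run at the stated parallelism, neither of which is guaranteed, and the 200-cluster fleet is a hypothetical figure.

```python
import math


def upgrade_waves(clusters: int, parallel: int) -> int:
    """Number of sequential waves needed to upgrade `clusters` clusters
    when at most `parallel` can be upgraded at once."""
    if parallel <= 0:
        raise ValueError("parallel must be positive")
    return math.ceil(clusters / parallel)


fleet = 200  # hypothetical fleet size
waves_before = upgrade_waves(fleet, 64)
waves_after = upgrade_waves(fleet, 256)
```

For that 200-cluster fleet the old limit of 64 means four waves and the new limit of 256 means one. At the 500-cluster ceiling, 256 at a time is still only two waves.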

Backup is where I see the cleanest designs and the messiest ones diverge. Keep a hard line between infrastructure and applications. The Supervisor is infrastructure and is not backed up with Velero. VKS workload clusters and their Kubernetes objects are applications and are backed up with Velero, which Broadcom recently contributed to the CNCF as a Sandbox project. Treat those as two separate runbooks with two separate owners. Teams that try to make one tool cover both end up with a backup that restores neither cleanly. For broader context on how the platform sits inside a workload domain, the earlier VI workload domain deployment walkthrough covers the layer this all rests on.

Design caveat: validate every number here against your target BOM and the VCF 9.1 configuration maximums before you commit a layout. Zone count, IP block sizing, and CNI choice are expensive to change after enablement, so confirm them in a design review and test the enablement in a non-production instance first.
[Figure: Draw a hard line: infra vs app backup. Infrastructure: the Supervisor (the platform control plane), not backed up with Velero, rebuilt rather than restored. Applications: VKS workloads (Kubernetes objects and data), backed up with Velero, a tenant responsibility. Two runbooks, two owners.]
Back up the Supervisor as infrastructure and VKS workloads as applications, never with one tool.

What I’d Do

For a production platform that has to survive more than a host failure, design for three vSphere Zones from day one, size the management subnet for the full five-address control plane, and standardize on NSX VPCs with Antrea so tenant onboarding is self-service rather than a ticket. Keep single-zone for genuinely single-site or dev/test footprints where the extra six hosts are not justified. Pick Avi the moment Layer 7 ingress is a real requirement instead of retrofitting it later. And write your Supervisor backup and your VKS workload backup as two documents, because they are two different problems. Get the zone model and the networking mode right up front and almost everything else is tunable later. Which of these defaults is your current environment quietly stuck with?

About the Author

Dr. Pranay Jha is a Cloud and AI Consultant with 18+ years of experience in hybrid cloud, virtualization, and enterprise infrastructure transformation. He specializes in VMware technologies, multi-cloud strategy, and Generative AI solutions. He holds a PhD in Computer Applications with research focused on Cloud and AI, has published multiple research papers, and has been a VMware vExpert since 2016 and a VMUG Community Leader.
