Dr. Pranay Jha

VKS Architecture: Supervisor, Namespaces and Workload Clusters (VKS Series, Part 2)

The Supervisor, vSphere Namespaces and workload clusters look alike and behave nothing alike. Here is how the three layers fit together, and where tenancy actually lives.

VKS Series · Part 2 of 17

TL;DR · Key Takeaways

  • Three layers, kept distinct: the Supervisor is the platform control plane, vSphere Namespaces are the tenancy boundary, and workload clusters are the conformant Kubernetes your developers use.
  • The Supervisor runs as control plane VMs plus a spherelet on every ESX host. It is a management plane, not a place to run your apps.
  • A vSphere Namespace carries the quota, storage policies, permissions and VM classes a team is allowed to consume. It is the unit of multi-tenancy.
  • Workload clusters are defined with ClusterClass and the Cluster API. Their control plane and workers are ordinary VMs, placed and protected by DRS and vSphere HA.
  • Size the management subnet for five control-plane addresses, not three. The floating IP and the patch IP are the two people forget.
Who this is for: anyone who will design, operate or troubleshoot VKS and needs the moving parts named correctly.
Prerequisites: Part 1, and a working mental model of vSphere clusters and vCenter.

Most architecture confusion with VKS comes from blurring three things that look similar and behave nothing alike: the Supervisor, the vSphere Namespace, and the workload cluster. Mix them up and you will put a quota in the wrong place, grant a developer rights they should not have, or, the classic, try to deploy a production app straight onto the management plane and wonder why it feels fragile. This part draws the three layers cleanly and names the components that show up in your logs and support cases.

The Supervisor: a Kubernetes control plane inside vSphere

When you enable the Supervisor on a vSphere cluster, vSphere embeds a Kubernetes control plane into the cluster itself rather than bolting an external one on the side. Three Supervisor control plane VMs run the Kubernetes API server and controllers, and a lightweight agent called the spherelet, a kubelet ported natively to ESX, runs on each host so the hosts themselves act as worker nodes for Supervisor-level workloads. A second component, CRX, is a paravirtualized Linux kernel that lets vSphere Pods boot almost as fast as plain containers while keeping a VM-grade isolation boundary. You do not have to operate these by hand, but you will see their names when something breaks.

The Supervisor is a management plane. Its job is to host vSphere Namespaces and to run VKS, which provisions the workload clusters that run your applications. Treating it as a place to deploy production workloads is the most common early mistake: it is infrastructure, not your app platform. One sizing detail that bites people on day one: a three-VM control plane needs five management addresses, one for each control plane VM, one floating IP, and a fifth reserved for patching. Size the management subnet for five, not three, or you run out on the first upgrade.
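The five-address rule is plain arithmetic, so it is easy to check before enablement. Here is a minimal Python sketch, assuming the addresses are reserved as one consecutive block from a starting address (an assumption of this example, not something stated above). The function names and sample addresses are illustrative only.

```python
import ipaddress

FLOATING_IPS = 1  # one floating IP for the control plane
PATCH_IPS = 1     # one address held back for patching and upgrades


def management_ips_needed(control_plane_vms: int = 3) -> int:
    """One address per control plane VM, plus the floating and patch IPs."""
    return control_plane_vms + FLOATING_IPS + PATCH_IPS


def reserve_block(start_ip: str, subnet: str, control_plane_vms: int = 3):
    """Return the consecutive addresses to reserve, or raise if they do not fit.

    Assumption: the block is consecutive and starts at start_ip.
    """
    network = ipaddress.ip_network(subnet)
    first = ipaddress.ip_address(start_ip)
    count = management_ips_needed(control_plane_vms)
    block = [first + offset for offset in range(count)]
    for ip in block:
        reserved = (network.network_address, network.broadcast_address)
        if ip not in network or ip in reserved:
            raise ValueError(f"{ip} is not a usable host address in {subnet}")
    return block


if __name__ == "__main__":
    print(management_ips_needed())
    print([str(ip) for ip in reserve_block("10.0.0.10", "10.0.0.0/24")])
```

A starting address too close to the end of the subnet raises an error rather than silently returning a short block. That is the same failure you would otherwise meet at the first upgrade.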

Figure: Supervisor → namespaces → clusters. One Supervisor hosts many tenant namespaces; each namespace holds workload clusters.
  • vSphere Supervisor: 3 control plane VMs (5 IPs), spherelet on every ESX host, VKS + Cluster API.
  • vSphere Namespace team-a (quota, storage policy, VM classes, RBAC): VKS cluster prod (3 control plane VMs for HA, worker node pool × N, Antrea CNI) and VKS cluster dev (1 control plane VM, 2 worker VMs, throwaway, low quota).
  • vSphere Namespace team-b (its own quota, policies and permissions): VKS cluster gpu-ml (3 control plane VMs, worker pool on a vGPU VM class, NVIDIA GPU Operator, see Part 14), fully isolated from team-a with separate quota, RBAC and network.
  • One tenant, one namespace.
The Supervisor hosts namespaces; namespaces hold workload clusters; cluster nodes are VMs on the same ESX hosts.

vSphere Namespaces: where tenancy and the responsibility line live

A vSphere Namespace is the boundary the vSphere administrator hands to a team. It carries the resource quotas, the storage policies, the permissions and the VM classes the team is allowed to consume. When someone asks for a VKS cluster, that cluster is created inside a namespace and inherits its limits. So the namespace is the natural unit of multi-tenancy: one team, one namespace, with its own ceiling on CPU, memory and storage, and its own RBAC.

It is also where the division of labour is drawn, and this two-tier model recurs through the entire series. The vSphere administrator sets up the namespace and decides what is permitted inside it: which VM classes, which storage policies, how much quota. The DevOps user with access to that namespace then provisions and manages clusters within those guardrails, and grants their own developers access to the resulting clusters via Kubernetes RBAC. If you remember one thing: namespace permissions decide who can run a cluster; cluster RBAC decides who can use it. We come back to exactly this in Part 10.
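The two-tier rule can be pictured as two separate checks that never consult each other. The following is a toy Python model, not how vSphere or Kubernetes implement permissions: the class names, the user names and the reduction of RBAC to a set of verbs are all hypothetical simplifications.

```python
from dataclasses import dataclass, field


@dataclass
class Namespace:
    name: str
    # Users the vSphere administrator allowed to provision clusters here.
    cluster_operators: set = field(default_factory=set)


@dataclass
class WorkloadCluster:
    name: str
    namespace: Namespace
    # Kubernetes RBAC inside the cluster, reduced to user -> allowed verbs.
    rbac: dict = field(default_factory=dict)


def can_run_cluster(user: str, namespace: Namespace) -> bool:
    """Tier 1: namespace permissions decide who can run a cluster."""
    return user in namespace.cluster_operators


def can_use_cluster(user: str, cluster: WorkloadCluster, verb: str) -> bool:
    """Tier 2: cluster RBAC decides who can use it."""
    return verb in cluster.rbac.get(user, set())


if __name__ == "__main__":
    team_a = Namespace("team-a", cluster_operators={"devops-alice"})
    prod = WorkloadCluster("prod", team_a, rbac={"dev-bob": {"get", "create"}})
    print(can_run_cluster("devops-alice", team_a))  # namespace-level question
    print(can_use_cluster("dev-bob", prod, "create"))  # cluster-level question
    print(can_run_cluster("dev-bob", team_a))  # developers do not own the namespace
```

The two functions share no state. A developer can have full rights inside a cluster and still have no say over whether that cluster exists.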


Workload clusters: ClusterClass, and nodes that are just VMs

A workload cluster is the conformant Kubernetes cluster your developers actually deploy to. In current VKS you define it with the standard Cluster API: a Cluster object references a ClusterClass, and VKS ships built-in, versioned classes (the builtin-generic line) that encode a tested topology. You declare what you want (the Kubernetes version, node counts, VM classes) and the service reconciles it into a running cluster. This is a real shift from the deprecated TanzuKubernetesCluster API: ClusterClass makes clusters templated and consistent, and upgrades flow through the class rather than through hand-edited manifests. Part 4 walks the provisioning workflow end to end.
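To make the shape of that declaration concrete, here is a hedged Python sketch that builds a Cluster object as a plain dictionary. The field names follow the upstream Cluster API v1beta1 topology layout as I understand it, so check them against your VKS release. The class name, version string and pool class are deliberate placeholders. The class-specific variables that carry the VM class and storage class are omitted because their names depend on the ClusterClass version.

```python
import json


def build_cluster(name, namespace, cluster_class, k8s_version,
                  control_plane_replicas, worker_class, node_pools):
    """Return a Cluster object (as a dict) that references a ClusterClass.

    node_pools maps a pool name to its replica count.
    """
    if control_plane_replicas not in (1, 3):
        # One node for throwaway clusters, three for HA.
        raise ValueError("control plane must have 1 or 3 replicas")
    return {
        "apiVersion": "cluster.x-k8s.io/v1beta1",  # assumption: upstream v1beta1
        "kind": "Cluster",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "topology": {
                "class": cluster_class,   # the ClusterClass to instantiate
                "version": k8s_version,   # the Kubernetes version you want
                "controlPlane": {"replicas": control_plane_replicas},
                "workers": {
                    "machineDeployments": [
                        {"class": worker_class, "name": pool, "replicas": count}
                        for pool, count in node_pools.items()
                    ]
                },
            }
        },
    }


if __name__ == "__main__":
    cluster = build_cluster(
        name="prod",
        namespace="team-a",
        cluster_class="<clusterclass-name>",    # placeholder, not a real class
        k8s_version="<kubernetes-version>",     # placeholder
        control_plane_replicas=3,
        worker_class="<worker-class-name>",     # placeholder
        node_pools={"pool-1": 3},
    )
    print(json.dumps(cluster, indent=2))
```

The object states intent only: which class, which version, how many nodes. Everything about how the VMs get built lives in the class, which is why upgrades can flow through it.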

Inside the cluster, the control plane and workers are themselves VMs. The control plane runs as a single node for throwaway clusters or three nodes for HA, and workers are grouped into node pools (machine deployments), each pool drawing its size from a VM class. Scale a pool and VKS creates or removes worker VMs; upgrade and it replaces them in a rolling fashion. The quiet strength here is that a VKS worker node is not a black box: it is a VM your existing tooling sees, DRS places, and vSphere HA protects. The trade-off is that cluster shape is bounded by what your VM classes and namespace quota allow, which is exactly why Part 5 treats sizing as a real decision.
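Because every node is a VM sized from a VM class, the footprint of a cluster shape is a sum of products, and you can compare it to the namespace ceiling before asking for anything. This is a rough Python sketch under stated assumptions: the VM class names and sizes are invented for illustration, and the quota is simplified to vCPUs and GiB, whereas a real namespace expresses its limits in its own units.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VMClass:
    name: str
    vcpus: int
    memory_gib: int


@dataclass(frozen=True)
class NodePool:
    name: str
    vm_class: VMClass
    replicas: int


def cluster_footprint(control_plane_class, control_plane_replicas, pools):
    """Total (vCPUs, memory GiB) across control plane and worker VMs."""
    groups = [(control_plane_class, control_plane_replicas)]
    groups += [(pool.vm_class, pool.replicas) for pool in pools]
    vcpus = sum(vm_class.vcpus * count for vm_class, count in groups)
    memory = sum(vm_class.memory_gib * count for vm_class, count in groups)
    return vcpus, memory


def fits_quota(footprint, quota_vcpus, quota_memory_gib):
    """True when the cluster shape stays within the namespace ceiling."""
    vcpus, memory = footprint
    return vcpus <= quota_vcpus and memory <= quota_memory_gib


if __name__ == "__main__":
    # Hypothetical VM classes; real names and sizes come from your namespace.
    medium = VMClass("example-medium", vcpus=2, memory_gib=8)
    large = VMClass("example-large", vcpus=4, memory_gib=16)

    shape = cluster_footprint(medium, 3, [NodePool("pool-1", large, 5)])
    print(shape, fits_quota(shape, quota_vcpus=32, quota_memory_gib=128))

    scaled = cluster_footprint(medium, 3, [NodePool("pool-1", large, 8)])
    print(scaled, fits_quota(scaled, quota_vcpus=32, quota_memory_gib=128))
```

In this made-up example the five-worker shape fits the ceiling and the eight-worker shape does not. Scaling one pool is enough to turn a valid cluster into a quota failure.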

Component                      | What it is                                        | Layer
-------------------------------|---------------------------------------------------|-----------------
Supervisor control plane VMs   | Run the K8s API server and controllers (1 or 3)   | Management plane
Spherelet                      | kubelet ported to each ESX host                   | Management plane
CRX                            | Paravirtual kernel for fast, isolated vSphere Pods | Management plane
vSphere Namespace              | Quota, policy, RBAC and VM-class boundary         | Tenancy
Cluster + ClusterClass         | Declarative definition of a workload cluster      | Workload cluster
Node pool (machine deployment) | A group of worker VMs sized from one VM class     | Workload cluster

Don’t run apps on the Supervisor: the Supervisor can technically run vSphere Pods, but it is your platform control plane. Production workloads belong in VKS workload clusters, which you can scale, upgrade and blow away without touching the management plane. Keeping that line clean is the difference between a platform you can operate and one you are afraid to touch.

How a cluster request flows through the three layers

The layers click into place once you trace a single request through them. A platform engineer asks for a cluster by applying a Cluster object into a vSphere Namespace. The Supervisor, acting as the management plane, accepts it, and VKS with Cluster API reconciles that intent: it checks the request against the namespace’s quota and permitted VM classes, then provisions control plane and worker VMs sized from those classes, placed by vSphere across the hosts. Storage comes from the namespace’s storage policies, networking from NSX or VDS. When the VMs are up and the cluster is healthy, the engineer pulls a kubeconfig and hands scoped access to developers. Every step touched a different layer: the namespace enforced the limits, the Supervisor and VKS did the reconciling, and the workload cluster is what the developers actually use.

Seeing it as a flow rather than a static diagram explains where things go wrong. A request that fails on quota is a namespace problem. A request that provisions but never goes ready is a Supervisor, storage or network problem. A cluster that is up but that a developer cannot deploy to is an RBAC problem in the workload cluster. The layer that owns the symptom is the layer to debug, and conflating them is what sends people hunting in the wrong place.
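The triage rule in that paragraph is an ordered set of questions, which a few lines of Python can make explicit. This is only a sketch of the reasoning: the three boolean inputs and the function name are hypothetical, and in practice you would answer each question from events, conditions and logs.

```python
from enum import Enum
from typing import Optional


class Layer(Enum):
    NAMESPACE = "vSphere Namespace (quota, permitted classes)"
    SUPERVISOR = "Supervisor, storage or network"
    WORKLOAD_CLUSTER = "RBAC in the workload cluster"


def layer_to_debug(passed_quota: bool, cluster_ready: bool,
                   developer_can_deploy: bool) -> Optional[Layer]:
    """Return the layer that owns the symptom, following the request flow."""
    if not passed_quota:
        # The request never got past the namespace limits.
        return Layer.NAMESPACE
    if not cluster_ready:
        # It was accepted and provisioned, but never became ready.
        return Layer.SUPERVISOR
    if not developer_can_deploy:
        # The cluster is up; access inside it is the problem.
        return Layer.WORKLOAD_CLUSTER
    return None  # nothing to debug


if __name__ == "__main__":
    owner = layer_to_debug(passed_quota=True, cluster_ready=False,
                           developer_can_deploy=False)
    print(owner.value if owner else "healthy")
```

The order of the checks mirrors the order of the flow. A failure early in the request hides everything after it, so the first failing question names the layer to look at.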

Where vSphere Pods fit, and why you rarely use them

One source of confusion deserves clearing up: the Supervisor can also run vSphere Pods directly. These are containers that run as lightweight VMs on the hosts via the CRX runtime, without a workload cluster at all. They exist, and they are clever (near-container start times with VM-grade isolation), but for most teams they are not the path you want for applications. Running on the Supervisor means running on the management plane, and it ties your workloads to the platform’s lifecycle rather than to a cluster you can scale, upgrade and delete independently. The mainstream pattern, and the one this series follows, is to run applications in VKS workload clusters and treat the Supervisor as infrastructure.

Knowing vSphere Pods exist matters mostly so you recognise them in the documentation and do not confuse them with VKS clusters. They are a niche tool for specific platform-level cases, not the general answer for shipping workloads. If you find yourself reaching for them for ordinary applications, step back: that is almost always a sign the work belongs in a workload cluster, where it gets its own governed, disposable lifecycle.

What I’d Do

Internalise the three layers before you touch anything else, because almost every later decision sits on one of them. Treat the Supervisor as sacred infrastructure and never schedule business workloads on it. Model one namespace per tenant or per team, and decide its quota, storage policies and VM classes deliberately rather than accepting defaults, because that namespace is your governance surface. Then keep workload clusters cheap and disposable: small for dev, three control plane nodes for anything that matters, and always remember the nodes are just VMs your existing operations already know how to handle. Get the layering right and the rest of this series is detail. So, looking at your own environment: is your tenancy actually modelled at the namespace level, or is everything quietly sharing one big namespace and one big cluster?

About the Author

Dr. Pranay Jha is a Cloud and AI Consultant with 18+ years of experience in hybrid cloud, virtualization, and enterprise infrastructure transformation. He specializes in VMware technologies, multi-cloud strategy, and Generative AI solutions. He holds a PhD in Computer Applications with research focused on Cloud and AI, has published multiple research papers, and has been a VMware vExpert since 2016 and a VMUG Community Leader.
