TL;DR · Key Takeaways
- Node size in VKS comes from a VM class, not a free-form number. The class fixes the CPU and memory of every node built from it.
- Guaranteed classes reserve their CPU and memory; best-effort classes do not. Use guaranteed for production nodes you cannot afford to have starved under contention.
- Workers live in node pools (machine deployments). Different pools can use different VM classes, which is how one cluster mixes general, memory-heavy and GPU nodes.
- Control plane is one node for throwaway clusters or three for HA. Three is the right default for anything you care about.
- Size for the workload’s real shape and the namespace quota, then let the autoscaler absorb peaks. Oversized best-effort fleets waste quota and still get starved.
The manifest in Part 4 quietly assumed you already knew how big to make things. You do not get to type arbitrary CPU and memory for a VKS node; you pick a VM class, and the class sets the shape. That constraint is a feature: it keeps clusters consistent and quotas enforceable. But it also means sizing is a decision you make deliberately, up front, because changing a pool's class later means rolling its nodes. This part gives you a way to reason about it that holds up in production.
VM classes: where node size is actually decided
A VM class is a named compute profile: a fixed amount of CPU, a fixed amount of memory, and a reservation policy. Every node built from that class gets exactly that shape. VKS ships default classes across the best-effort and guaranteed families, from small through extra-large, and a vSphere administrator can create custom classes, including the vGPU classes we cover in Part 14. The distinction that matters under load: guaranteed classes reserve their CPU and memory, so the node always has what it was promised; best-effort classes do not, so they can be starved when the hosts are busy. The classes available to a cluster are the ones the administrator has added to its vSphere Namespace (the two-tier model from Part 2 again). If the class you want is not there, that is a namespace task, not a manifest tweak.
| VM class family | Reservation | Use for |
|---|---|---|
| guaranteed-* | Reserves full CPU + memory | Production workers; anything latency-sensitive |
| best-effort-* | No reservation; shares spare capacity | Dev/test, bursty or tolerant workloads |
| custom vGPU class | Adds an NVIDIA vGPU / passthrough profile | GPU and AI worker pools (Part 14) |
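In the manifest, the class appears as a single named variable rather than as CPU and memory fields. A minimal fragment, assuming the v1beta1 Cluster API with the `tanzukubernetescluster` ClusterClass and its `vmClass` and `storageClass` variables; the storage policy name is hypothetical, and other ClusterClass versions may name these variables differently:

```yaml
# Fragment of a Cluster manifest (spec.topology.variables only).
# Assumes the tanzukubernetescluster ClusterClass; verify variable names
# against the ClusterClass your environment actually publishes.
spec:
  topology:
    variables:
      - name: vmClass
        value: guaranteed-large          # must already be added to the vSphere Namespace
      - name: storageClass
        value: example-storage-policy    # hypothetical policy name
```

If the named class has not been published to the namespace, no value here will make it appear; that is the administrator-side task described above.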
Node pools and the one-or-three control plane
Workers live in node pools, expressed as machine deployments in the manifest. Each pool has its own VM class and replica count, so a single cluster can hold a general-purpose pool on a balanced class, a memory-heavy pool for caches or databases, and a GPU pool for inference, all governed together. Node pools are also the unit the autoscaler operates on (Part 9), so designing them well early pays off: adding a differently shaped pool later is easy, but reshaping a pool in place means rolling its nodes.
The control plane is the other axis. A single control plane node is fine for ephemeral dev and test where you can tolerate losing the cluster if its one node fails. For anything with a real workload, run three so the cluster survives a node failure and an upgrade without an API outage. The cost is three control-plane VMs your namespace quota must accommodate. Treat three as the default and one as the deliberate exception.
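Put together, the two axes might look like the fragment below. This is a sketch, assuming the v1beta1 Cluster API, the `node-pool` worker class from the `tanzukubernetescluster` ClusterClass, and per-pool `variables.overrides` for `vmClass`; the pool names and the high-memory class name are hypothetical:

```yaml
# Fragment of a Cluster manifest (spec.topology only).
spec:
  topology:
    controlPlane:
      replicas: 3                  # use 1 only for throwaway dev/test clusters
    workers:
      machineDeployments:
        - class: node-pool
          name: general            # hypothetical pool name
          replicas: 3
          variables:
            overrides:
              - name: vmClass
                value: guaranteed-large
        - class: node-pool
          name: memory             # hypothetical pool name
          replicas: 2
          variables:
            overrides:
              - name: vmClass
                value: example-high-memory-class   # hypothetical custom class
```

Each entry under `machineDeployments` is one node pool, so adding a differently shaped pool later is just another list item, while changing `vmClass` on an existing entry rolls that pool's nodes.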
A sizing reference to start from
| Cluster shape | Control plane | Worker pool | Use for |
|---|---|---|---|
| Dev / sandbox | 1 node | 2 × best-effort-medium | Experiments, CI, throwaway clusters |
| Team prod (small) | 3 nodes | 3+ × guaranteed-large | Steady microservice workloads |
| Memory-heavy | 3 nodes | pool on a high-memory class | Caches, in-memory data, JVM-heavy apps |
| GPU / AI | 3 nodes | pool on a custom vGPU class | Inference and training (Part 14) |
Treat this as a starting point, not gospel. The exact class names and amounts depend on what your administrator published into the namespace, and the right replica counts depend on your actual workload. The principle that holds everywhere: size for the real CPU and memory profile and for the quota you have, then let the autoscaler absorb peaks rather than over-provisioning every pool for the worst case.
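One way to hand peaks to the autoscaler is to give a pool a minimum and maximum instead of a fixed replica count. The fragment below is a sketch, assuming the Cluster Autoscaler from Part 9 is installed and reads the upstream Cluster API node-group annotations; the pool name and the bounds are illustrative, not recommendations:

```yaml
# Fragment of a Cluster manifest: one autoscaled node pool.
# replicas is deliberately omitted so the autoscaler owns the count.
spec:
  topology:
    workers:
      machineDeployments:
        - class: node-pool
          name: general            # hypothetical pool name
          metadata:
            annotations:
              cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "3"
              cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "6"
```

The maximum is still bounded by the namespace quota: a max size the quota cannot accommodate produces nodes that never arrive and pods that stay pending.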
Reserved versus best-effort under real contention
The guaranteed-versus-best-effort choice sounds academic until the hosts get busy, and then it decides whether your workload runs or stalls. A guaranteed VM class reserves its CPU and memory in vSphere, so the node always has the resources it was promised regardless of what else is competing for the host. A best-effort class reserves nothing; it consumes spare capacity, and when there is no spare it is the first to be squeezed. On a quiet cluster the two behave identically, which is exactly why the difference is invisible in testing and painful in production. The first time a noisy neighbour saturates a host, your best-effort database node is the one that starts missing its scheduling deadlines.
My rule of thumb: production worker pools and anything stateful or latency-sensitive get guaranteed classes; dev, test, CI and genuinely elastic, tolerant workloads can use best-effort to pack more density onto the same hardware. The cost of guaranteed is that reserved capacity cannot be overcommitted, so you fit fewer nodes per host. That is the right trade for workloads that must not be starved, and the wrong one for a fleet of throwaway clusters where density matters more than guarantees. Decide per pool, not per cluster, because a single cluster can legitimately mix a guaranteed production pool with a best-effort batch pool.
Sizing the control plane, not just the workers
Worker sizing gets all the attention, but an undersized control plane is a slower, sneakier problem. The control plane runs the API server, the controllers and etcd, and etcd in particular is sensitive to two things: memory pressure and disk latency. A control plane node starved of memory will see the API server get sluggish; an etcd backed by slow storage will see write latency climb until leader elections start flapping under load. On a cluster with a lot of objects (many namespaces, large ConfigMaps, frequent deployments, controllers hammering the API), the control plane works harder than the idle demo suggests.
So do not reflexively give the control plane the smallest class. Match it to the cluster’s object count and churn, give etcd fast storage through the storage policy you assign, and for anything in production run three control plane nodes so a single failure or a rolling upgrade never costs you the API. The symptom of getting this wrong is subtle: not an outage, but a cluster that feels laggy under deployment storms and whose kubectl calls intermittently time out. By the time you notice, it is a live cluster and resizing the control plane means rolling it.
How node pools land on the underlying hosts
Worker nodes are VMs, so they are placed by vSphere DRS across the hosts in the zone, and that has two consequences worth designing for. First, DRS spreads and balances the node VMs, which is good, but Kubernetes does not automatically know that two of its nodes might be on the same physical host. If you need real fault isolation (two replicas of a critical pod genuinely landing on different hosts), you lean on Kubernetes anti-affinity at the pod level and, where it matters, on host-level placement so the node VMs themselves are spread. Second, a node pool drawn from one VM class will produce identically shaped VMs, which keeps DRS placement and capacity planning predictable.
The design point is that VKS gives you two layers of scheduling: Kubernetes scheduling pods onto nodes, and vSphere placing node VMs onto hosts. They do not coordinate by default. For most workloads that is fine. For the ones where a single host failure must not take out both replicas of something, you have to say so explicitly at the Kubernetes layer with anti-affinity and topology spread constraints, and verify the node VMs are not quietly stacked on one host. This is exactly the kind of thing that looks fine in steady state and bites during a host failure, so design for it before you need it.
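At the Kubernetes layer, saying so explicitly could look like the Deployment below. It is a sketch using standard Kubernetes fields; the workload name and image are hypothetical, and the zone constraint only has an effect if the cluster's nodes carry the `topology.kubernetes.io/zone` label:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: critical-api               # hypothetical workload
spec:
  replicas: 2
  selector:
    matchLabels:
      app: critical-api
  template:
    metadata:
      labels:
        app: critical-api
    spec:
      affinity:
        podAntiAffinity:
          # Hard rule: never place two replicas on the same Kubernetes node.
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: critical-api
              topologyKey: kubernetes.io/hostname
      topologySpreadConstraints:
        # Soft rule: prefer an even spread across zones when zones exist.
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: critical-api
      containers:
        - name: app
          image: registry.example.com/critical-api:1.0   # hypothetical image
```

Note what this does and does not guarantee: `kubernetes.io/hostname` identifies the node VM, not the physical host underneath it. Two replicas on two different node VMs can still share one host, which is why checking VM placement on the vSphere side remains a separate step.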
What I’d Do
For production, I default to a three-node control plane and guaranteed worker classes, then keep the worker count modest and hand the peaks to the Cluster Autoscaler rather than padding every pool. I separate workloads into node pools by shape (general, memory, GPU) early, because that is cheap to do up front and annoying to retrofit. And I size against the namespace quota as a hard constraint, not an afterthought, because the quota always wins: an ambitious cluster against a tight quota just produces pending pods. Right-size for the real workload, reserve what matters, and scale the rest. Look at your busiest cluster: are those nodes guaranteed and right-sized, or a generous-looking best-effort fleet that quietly starves when the hosts get busy?
References
- Broadcom TechDocs: ClusterClass Variables for Customizing a Cluster
- Broadcom TechDocs: Create a Custom VM Class for NVIDIA vGPU Devices
- Broadcom TechDocs: Workload Isolation for VKS Clusters on vSphere Zones