Dr. Pranay Jha

VMware • Cloud • AI • Enterprise Architecture


VKS Cluster Sizing: VM Classes, Node Pools and Control-Plane Topology (VKS Series, Part 5)

Node size in VKS comes from a VM class, not a free-form number. Here is how VM classes, node pools and the one-or-three control plane decision actually shape a cluster.

VKS Series · Part 5 of 17

TL;DR · Key Takeaways

  • Node size in VKS comes from a VM class, not a free-form number. The class fixes the CPU and memory of every node built from it.
  • Guaranteed classes reserve their CPU and memory; best-effort classes do not. Use guaranteed classes for production nodes you cannot afford to have starved under contention.
  • Workers live in node pools (machine deployments). Different pools can use different VM classes, which is how one cluster mixes general, memory-heavy and GPU nodes.
  • Control plane is one node for throwaway clusters or three for HA. Three is the right default for anything you care about.
  • Size for the workload’s real shape and the namespace quota, then let the autoscaler absorb peaks. Oversized best-effort fleets waste quota and still get starved.
Who this is for: whoever decides how big clusters should be, and whoever owns the namespace quota they draw from.
Prerequisites: Parts 2 and 4, plus a rough idea of your workloads’ CPU and memory profile.

The manifest in Part 4 quietly assumed you already knew how big to make things. You do not get to type arbitrary CPU and memory for a VKS node; you pick a VM class, and the class sets the shape. That constraint is a feature: it keeps clusters consistent and quotas enforceable. It also means sizing is a decision you make deliberately, up front, because changing a node’s shape later means rolling nodes. This part gives you a way to reason about it that holds up in production.

VM classes: where node size is actually decided

A VM class is a named compute profile: a fixed amount of CPU and memory, plus a reservation policy. Every node built from that class gets exactly that shape. VKS ships default classes across the best-effort and guaranteed families, from small through extra-large, and a vSphere administrator can create custom classes, including the vGPU classes we cover in Part 14. The distinction that matters under load: guaranteed classes reserve their CPU and memory, so the node always has what it was promised; best-effort classes do not, so they can be starved when the hosts are busy. The classes available to a cluster are the ones the administrator has added to its vSphere Namespace (the two-tier model from Part 2 again), so if the class you want is not there, that is a namespace task, not a manifest tweak.

VM class family | Reservation | Use for
guaranteed-* | Reserves full CPU + memory | Production workers; anything latency-sensitive
best-effort-* | No reservation; shares spare capacity | Dev/test, bursty or tolerant workloads
custom vGPU class | Adds an NVIDIA vGPU / passthrough profile | GPU and AI worker pools (Part 14)
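
To make the reservation difference concrete, here is a minimal Python sketch of how a pool’s reserved footprint differs between the two families. The `VMClass` type, the function name and the 4 vCPU / 16 GiB shapes are illustrative assumptions, not values read from any VKS API; check the classes your administrator actually published.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class VMClass:
    """Illustrative model of a VM class: a fixed shape plus a reservation policy."""
    name: str
    vcpu: int
    memory_gib: int
    guaranteed: bool  # True: CPU and memory are fully reserved


def reserved_footprint(vm_class: VMClass, replicas: int) -> tuple[int, int]:
    """Return (vCPU, GiB) reserved on the hosts for a pool of identical nodes."""
    if not vm_class.guaranteed:
        # Best-effort reserves nothing and competes for spare capacity.
        return (0, 0)
    return (vm_class.vcpu * replicas, vm_class.memory_gib * replicas)


# Hypothetical shapes; real class sizes come from the vSphere Namespace.
guaranteed_large = VMClass("guaranteed-large", vcpu=4, memory_gib=16, guaranteed=True)
best_effort_large = VMClass("best-effort-large", vcpu=4, memory_gib=16, guaranteed=False)

print(reserved_footprint(guaranteed_large, 3))   # (12, 48)
print(reserved_footprint(best_effort_large, 3))  # (0, 0)
```

Both pools look identical from inside Kubernetes; only the first one has capacity set aside for it on the hosts.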

Node pools and the one-or-three control plane

Workers live in node pools, expressed as machine deployments in the manifest. Each pool has its own VM class and replica count, so a single cluster can hold a general-purpose pool on a balanced class, a memory-heavy pool for caches or databases, and a GPU pool for inference, all governed together. Node pools are also the unit the autoscaler operates on (Part 9), so designing them well early pays off: adding a differently shaped pool later is easy, but reshaping a pool in place means rolling its nodes.

The control plane is the other axis. A single control plane node is fine for ephemeral dev and test where you can tolerate losing the cluster if its one node fails. For anything with a real workload, run three so the cluster survives a node failure and an upgrade without an API outage. The cost is three control-plane VMs your namespace quota must accommodate. Treat three as the default and one as the deliberate exception.
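
The one-or-three choice follows from majority quorum in etcd. A short sketch of that arithmetic; the function names are mine, while the majority rule itself is standard for Raft-based stores such as etcd.

```python
def quorum(members: int) -> int:
    """Smallest majority of an etcd cluster with the given member count."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be lost while a majority still remains."""
    return members - quorum(members)


for n in (1, 2, 3):
    print(n, quorum(n), tolerated_failures(n))
# 1 1 0
# 2 2 0
# 3 2 1
```

Two members tolerate no more failures than one, which is why the practical options are one node or three.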

[Figure: One cluster, mixed node pools. Each pool draws its shape from a different VM class.]
  • Control plane × 3 (HA): survives a node failure and rolling upgrades
  • Pool: general, guaranteed-large × 3-8: microservices, web tiers
  • Pool: memory, high-memory class × 2-4: caches, in-memory data
  • Pool: gpu, custom vGPU class × 1-2: inference (Part 14)
Node pools let one governed cluster carry general, memory and GPU shapes side by side.
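
As a rough illustration of what the figure’s cluster costs in quota, the sketch below sums its footprint at the low and high end of each pool’s replica range. The per-node shapes are hypothetical placeholders (in particular the high-memory and vGPU classes are custom, so their sizes are whatever your administrator defined), and the control plane is modelled as a fixed-size group purely for the arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeGroup:
    name: str
    vcpu: int          # per node, fixed by the VM class (assumed value)
    memory_gib: int    # per node, fixed by the VM class (assumed value)
    min_replicas: int
    max_replicas: int


def footprint(groups, replicas_of):
    """Sum (nodes, vCPU, GiB) over groups, using replicas_of(group) per group."""
    nodes = sum(replicas_of(g) for g in groups)
    vcpu = sum(g.vcpu * replicas_of(g) for g in groups)
    mem = sum(g.memory_gib * replicas_of(g) for g in groups)
    return nodes, vcpu, mem


groups = [
    NodeGroup("control-plane", vcpu=2, memory_gib=8, min_replicas=3, max_replicas=3),
    NodeGroup("general", vcpu=4, memory_gib=16, min_replicas=3, max_replicas=8),
    NodeGroup("memory", vcpu=4, memory_gib=64, min_replicas=2, max_replicas=4),
    NodeGroup("gpu", vcpu=8, memory_gib=64, min_replicas=1, max_replicas=2),
]

floor = footprint(groups, lambda g: g.min_replicas)
ceiling = footprint(groups, lambda g: g.max_replicas)
print(floor)    # (9, 34, 264)
print(ceiling)  # (17, 70, 536)
```

The ceiling is the number to compare against the namespace quota, since that is what the cluster can grow to when every pool scales out at once.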

A sizing reference to start from

Cluster shape | Control plane | Worker pool | Use for
Dev / sandbox | 1 node | 2 × best-effort-medium | Experiments, CI, throwaway clusters
Team prod (small) | 3 nodes | 3+ × guaranteed-large | Steady microservice workloads
Memory-heavy | 3 nodes | pool on a high-memory class | Caches, in-memory data, JVM-heavy apps
GPU / AI | 3 nodes | pool on a custom vGPU class | Inference and training (Part 14)

Treat this as a starting point, not gospel. The exact class names and amounts depend on what your administrator published into the namespace, and the right replica counts depend on your actual workload. The principle that holds everywhere: size for the real CPU and memory profile and for the quota you have, then let the autoscaler absorb peaks rather than over-provisioning every pool for the worst case.
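
One way to turn that principle into numbers is sketched below: derive a steady-state and a peak node count from the workload’s summed requests, then check the peak against the quota. The function names, the 80% usable fraction and the workload figures are all assumptions for illustration; the quota check also ignores control-plane nodes and storage for brevity.

```python
import math


def nodes_needed(cpu_request, mem_request_gib, node_vcpu, node_mem_gib, usable=0.8):
    """Nodes required so that both CPU and memory requests fit.

    `usable` is an assumed fraction of each node left for pods after
    system overhead; tune it to what you observe.
    """
    by_cpu = cpu_request / (node_vcpu * usable)
    by_mem = mem_request_gib / (node_mem_gib * usable)
    return max(1, math.ceil(max(by_cpu, by_mem)))


def fits_quota(replicas, node_vcpu, node_mem_gib, quota_vcpu, quota_mem_gib):
    """True if a pool of `replicas` identical nodes stays within the quota."""
    return (replicas * node_vcpu <= quota_vcpu
            and replicas * node_mem_gib <= quota_mem_gib)


# Hypothetical workload on a 4 vCPU / 16 GiB class:
# 20 vCPU and 40 GiB of requests in steady state, 36 vCPU and 72 GiB at peak.
steady = nodes_needed(20, 40, node_vcpu=4, node_mem_gib=16)
peak = nodes_needed(36, 72, node_vcpu=4, node_mem_gib=16)
print(steady, peak)  # 7 12

print(fits_quota(peak, 4, 16, quota_vcpu=64, quota_mem_gib=256))  # True
print(fits_quota(peak, 4, 16, quota_vcpu=40, quota_mem_gib=256))  # False
```

Here the steady count would be the pool’s baseline and the peak count the autoscaler’s upper bound, provided the quota can actually hold it.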

The oversizing trap: a fleet of large best-effort nodes looks generous on paper, but it burns namespace quota and still gets starved under contention because nothing is reserved. Fewer, right-sized guaranteed nodes plus an autoscaler beat a pile of hopeful best-effort ones almost every time.

Reserved versus best-effort under real contention

The guaranteed-versus-best-effort choice sounds academic until the hosts get busy, and then it decides whether your workload runs or stalls. A guaranteed VM class reserves its CPU and memory in vSphere, so the node always has the resources it was promised regardless of what else is competing for the host. A best-effort class reserves nothing; it consumes spare capacity, and when there is no spare it is the first to be squeezed. On a quiet cluster the two behave identically, which is exactly why the difference is invisible in testing and painful in production. The first time a noisy neighbour saturates a host, your best-effort database node is the one that starts missing its scheduling deadlines.

My rule of thumb: production worker pools and anything stateful or latency-sensitive get guaranteed classes; dev, test, CI and genuinely elastic, tolerant workloads can use best-effort to pack more density onto the same hardware. The cost of guaranteed is that reserved capacity cannot be overcommitted, so you fit fewer nodes per host. That is the right trade for workloads that must not be starved, and the wrong one for a fleet of throwaway clusters where density matters more than guarantees. Decide per pool, not per cluster, because a single cluster can legitimately mix a guaranteed production pool with a best-effort batch pool.
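
The density trade can be sketched with simple arithmetic. This is a deliberately crude model: the host size, node shape and the 3:1 CPU overcommit ratio are hypothetical, memory is treated as never overcommitted, and hypervisor overhead is ignored.

```python
def nodes_per_host(host_cores, host_mem_gib, node_vcpu, node_mem_gib, cpu_overcommit=1.0):
    """How many identical node VMs fit on one host under this simple model.

    cpu_overcommit=1.0 models guaranteed classes, whose reserved CPU cannot
    be overcommitted; a value above 1 models best-effort packing.
    """
    by_cpu = int(host_cores * cpu_overcommit) // node_vcpu
    by_mem = host_mem_gib // node_mem_gib
    return min(by_cpu, by_mem)


# Hypothetical host: 32 cores, 512 GiB. Node shape: 4 vCPU / 16 GiB.
print(nodes_per_host(32, 512, 4, 16))                      # 8
print(nodes_per_host(32, 512, 4, 16, cpu_overcommit=3.0))  # 24
```

Under these assumptions, best-effort packs three times as many nodes onto the host, and every one of them is competing for the same 32 cores when things get busy.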

Sizing the control plane, not just the workers

Worker sizing gets all the attention, but an undersized control plane is a slower, sneakier problem. The control plane runs the API server, the controllers and etcd, and etcd in particular is sensitive to two things: memory pressure and disk latency. A control plane node starved of memory will see the API server get sluggish; an etcd backed by slow storage will see write latency climb until leader elections start flapping under load. On a cluster with a lot of objects (many namespaces, large ConfigMaps, frequent deployments, controllers hammering the API), the control plane works harder than the idle demo suggests.

So do not reflexively give the control plane the smallest class. Match it to the cluster’s object count and churn, give etcd fast storage through the storage policy you assign, and for anything in production run three control plane nodes so a single failure or a rolling upgrade never costs you the API. The symptom of getting this wrong is subtle: not an outage, but a cluster that feels laggy under deployment storms and whose kubectl calls intermittently time out. By the time you notice, it is a live cluster and resizing the control plane means rolling it.

How node pools land on the underlying hosts

Worker nodes are VMs, so they are placed by vSphere DRS across the hosts in the zone, and that has two consequences worth designing for. First, DRS spreads and balances the node VMs, which is good, but Kubernetes does not automatically know that two of its nodes might be on the same physical host. If you need real fault isolation (two replicas of a critical pod genuinely landing on different hosts), you lean on Kubernetes anti-affinity at the pod level and, where it matters, on host-level placement so the node VMs themselves are spread. Second, a node pool drawn from one VM class will produce identically shaped VMs, which keeps DRS placement and capacity planning predictable.

The design point is that VKS gives you two layers of scheduling: Kubernetes scheduling pods onto nodes, and vSphere placing node VMs onto hosts. They do not coordinate by default. For most workloads that is fine. For the ones where a single host failure must not take out both replicas of something, you have to say so explicitly at the Kubernetes layer with anti-affinity and topology spread constraints, and verify the node VMs are not quietly stacked on one host. This is exactly the kind of thing that looks fine in steady state and bites during a host failure, so design for it before you need it.
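
The verification step can be as simple as joining two mappings. The sketch below assumes you have already collected which node each replica runs on and which host each node VM runs on (for example from your cluster and vCenter inventories); the names and placements are made up, and gathering the data is left out.

```python
from collections import defaultdict


def replicas_sharing_a_host(pod_to_node, node_to_host):
    """Return {host: [pods]} for hosts that carry more than one of the given pods."""
    by_host = defaultdict(list)
    for pod, node in pod_to_node.items():
        by_host[node_to_host[node]].append(pod)
    return {host: sorted(pods) for host, pods in by_host.items() if len(pods) > 1}


# Hypothetical placement: anti-affinity put the replicas on different nodes...
pod_to_node = {"db-0": "worker-a", "db-1": "worker-b"}
# ...but both of those node VMs ended up on the same physical host.
node_to_host = {"worker-a": "esx-01", "worker-b": "esx-01", "worker-c": "esx-02"}

print(replicas_sharing_a_host(pod_to_node, node_to_host))
# {'esx-01': ['db-0', 'db-1']}
```

An empty result means each replica sits on its own host; anything else is a pair that a single host failure would take out together, even though Kubernetes considers them spread.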

What I’d Do

For production, I default to a three-node control plane and guaranteed worker classes, then keep the worker count modest and hand the peaks to the Cluster Autoscaler rather than padding every pool. I separate workloads into node pools by shape early (general, memory, GPU), because that is cheap to do up front and annoying to retrofit. And I size against the namespace quota as a hard constraint, not an afterthought, because the quota always wins: an ambitious cluster against a tight quota just produces pending pods. Right-size for the real workload, reserve what matters, and scale the rest. Look at your busiest cluster: are those nodes guaranteed and right-sized, or a generous-looking best-effort fleet that quietly starves when the hosts get busy?

About the Author

Dr. Pranay Jha is a Cloud and AI Consultant with 18+ years of experience in hybrid cloud, virtualization, and enterprise infrastructure transformation. He specializes in VMware technologies, multi-cloud strategy, and Generative AI solutions. He holds a PhD in Computer Applications with research focused on Cloud and AI, has published multiple research papers, and has been a VMware vExpert since 2016 and a VMUG Community Leader.
