TL;DR · Key Takeaways
- VKS persistent storage runs on the vSphere CSI stack. Inside the cluster a paravirtual CSI (pvCSI) driver forwards volume requests down to CNS in vCenter.
- A Kubernetes StorageClass maps to a vSphere storage policy via storagePolicyName. The SPBM policy decides which datastores and data services back the volume.
- Volumes are CNS-managed First Class Disks, so they are visible and manageable from vCenter as well as Kubernetes. A reclaim policy of Delete removes the VMDK automatically.
- ReadWriteMany needs vSAN File Services (or something like Portworx in-guest). Block volumes do not replicate across zones, so you handle cross-zone resilience at the application or database layer.
- For multi-zone clusters, use late-binding (WaitForFirstConsumer) classes with zonal policies so a volume lands where its pod will run.
Stateless apps are easy; the moment you run a database, a queue or anything that must remember things, storage becomes the part that can quietly ruin your day. VKS gets this right by reusing the vSphere storage stack you already trust, but the indirection (Kubernetes StorageClass to vSphere policy to datastore) is worth understanding so you can reason about where your data actually lives and what happens to it when a pod moves.
The CSI chain: pvCSI down to CNS
Inside a VKS cluster, storage requests are handled by a paravirtual CSI driver, pvCSI, the version of the vSphere CNS-CSI driver that lives in the workload cluster. It does not talk to datastores directly; it forwards requests to the CNS-CSI on the Supervisor, which propagates them to Cloud Native Storage (CNS) in vCenter. CNS is what actually creates and tracks the volume as a First Class Disk. Dynamic provisioning and reclaim are clean: set a PersistentVolume reclaim policy of Delete and the underlying VMDK is removed from the datastore automatically when the claim is deleted.
StorageClass to SPBM policy
The bridge between Kubernetes and vSphere storage is the StorageClass. A VKS StorageClass uses the csi.vsphere.vmware.com provisioner and names a vSphere storage policy through storagePolicyName. Storage Policy Based Management (SPBM) ties that policy to real capabilities: which datastores qualify and which data services (encryption, replication, vSAN options) apply. So when a developer picks a StorageClass, they are really picking an SPBM policy, and the policy decides where and how the volume is provisioned. On VCF 9.1, storage quotas are enforced directly at the vSphere Namespace level, which is the right place for tenant control. As ever, the policies available to a cluster are the ones the administrator assigned to its namespace.
| Layer | What it controls | Owner |
|---|---|---|
| StorageClass | What developers request; binding mode | Cluster / platform team |
| SPBM policy | Eligible datastores + data services | vSphere / storage admin |
| CNS First Class Disk | The actual VMDK, visible in vCenter | Managed by CNS |
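To make the mapping concrete, here is a minimal Python sketch that builds the two objects involved as plain dictionaries: a StorageClass that names a policy, and a PVC that requests the class. The provisioner and the storagePolicyName parameter are taken from the text above; the class name, policy name, claim name and size are hypothetical. On a real cluster the classes may already be published for you from the policies assigned to the namespace, so treat this as an illustration of the shape rather than something to apply as-is.

```python
import json


def storage_class(name, policy_name, binding_mode="Immediate", reclaim_policy="Delete"):
    """Build a StorageClass manifest that points at a vSphere storage policy."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "csi.vsphere.vmware.com",
        # Parameter name as given in the article; the policy name is hypothetical.
        "parameters": {"storagePolicyName": policy_name},
        "reclaimPolicy": reclaim_policy,
        "volumeBindingMode": binding_mode,
    }


def claim(name, storage_class_name, size, access_mode="ReadWriteOnce"):
    """Build a PVC that requests a volume from the given StorageClass."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": [access_mode],
            "storageClassName": storage_class_name,
            "resources": {"requests": {"storage": size}},
        },
    }


# Hypothetical names: the developer only ever sees "gold-replicated".
gold = storage_class("gold-replicated", "vsan-gold-policy")
pvc = claim("orders-db-data", gold["metadata"]["name"], "50Gi")

if __name__ == "__main__":
    print(json.dumps([gold, pvc], indent=2))
```

The developer-facing object is the PVC, which only mentions the class name. Everything about datastores and data services stays behind the policy the class points at.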
RWX, zones and the assumptions that break
From the application’s side this is ordinary Kubernetes: a PVC binds to a PV, the pod mounts it, the reclaim policy decides what happens when the claim goes away. The difference is dual visibility: because the volume is a CNS-managed disk, it also shows up in vCenter, where you can see health, capacity and which VM it is attached to. That correlation is genuinely useful when a stateful workload misbehaves.
Two assumptions break designs, so name them now. First, ReadWriteMany out of the box needs vSAN File Services; without vSAN or without File Services, you deploy something like Portworx inside the guest clusters to provide RWX over your existing block storage. Second, across three zones, standard ReadWriteOnce block volumes do not replicate synchronously across zones because the latency penalty is too high, so storage-layer cross-zone HA is not how this works. You handle that availability at the application or database layer, with replication or a service like the vSAN Data Persistence Platform. For multi-zone clusters, also use late-binding storage classes (volumeBindingMode: WaitForFirstConsumer) backed by zonal policies, so a volume is provisioned in the same zone as the pod that will consume it. Skip that and you can strand a volume in a zone where its pod cannot run.
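The stranding problem is easier to see in a toy model. The Python sketch below is not the real scheduler or CSI driver; it only assumes that immediate binding picks a zone before the pod is known, while late binding picks one the pod can actually run in. The function names and zone names are hypothetical.

```python
def place_volume(binding_mode, policy_zones, pod_allowed_zones):
    """Return the zone a volume lands in under a deliberately simplified model.

    policy_zones: ordered list of zones the storage policy can provision in.
    pod_allowed_zones: set of zones the consuming pod is able to run in.
    """
    if binding_mode == "Immediate":
        # The volume is created as soon as the claim exists, before any pod
        # is scheduled, so the pod's constraints play no part in the choice.
        return policy_zones[0]
    if binding_mode == "WaitForFirstConsumer":
        # Provisioning waits for the pod, so only zones the pod can use qualify.
        candidates = [zone for zone in policy_zones if zone in pod_allowed_zones]
        if not candidates:
            raise ValueError("no zone satisfies both the policy and the pod")
        return candidates[0]
    raise ValueError(f"unknown binding mode: {binding_mode}")


def is_stranded(volume_zone, pod_allowed_zones):
    """A volume is stranded when its pod cannot run in the volume's zone."""
    return volume_zone not in pod_allowed_zones


zones = ["zone-a", "zone-b", "zone-c"]
pod_zones = {"zone-c"}  # e.g. the pod is pinned to one zone

early = place_volume("Immediate", zones, pod_zones)
late = place_volume("WaitForFirstConsumer", zones, pod_zones)
```

In this model the immediately bound volume lands in zone-a and is stranded, while the late-bound one lands in zone-c next to its pod.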
Data services through SPBM: encryption, snapshots, tiering
The real payoff of mapping StorageClasses onto SPBM policies is that everything your storage team already does to virtual machine disks applies to Kubernetes volumes for free. If a policy encrypts at rest, volumes provisioned through it are encrypted. If it replicates, or pins data to a performance tier, or applies a specific vSAN storage policy with a given failure tolerance, your persistent volumes inherit all of it without the developer knowing or caring. That is a genuinely different posture from bolt-on Kubernetes storage, where encryption and replication are separate systems your infrastructure team cannot see. Here, a security auditor can look at the SPBM policy behind a StorageClass and know exactly how that data is protected, in the same console they use for the rest of the estate.
The practical move is to design StorageClasses as named intents that map to clear policies: a fast-replicated class for databases, a cheaper local class for scratch and caches, an encrypted class where compliance demands it. Developers then choose intent (“I need durable, fast storage”) rather than implementation, and you retain control of what that intent actually means at the datastore. Snapshots work the same way: CSI snapshots of First Class Disks give you point-in-time copies you can use for backup or clone, and because CNS tracks them, they are visible and manageable from vCenter rather than being an opaque Kubernetes-only artifact.
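For the snapshot side, a rough sketch using the standard Kubernetes snapshot API (snapshot.storage.k8s.io/v1) looks like this: one manifest takes a point-in-time copy of a claim, the other creates a new claim from that copy. All object names, the snapshot class name and the size are hypothetical, and whether a suitable VolumeSnapshotClass exists on your cluster is an assumption to check.

```python
def volume_snapshot(name, pvc_name, snapshot_class):
    """Build a VolumeSnapshot manifest for an existing PVC."""
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshot",
        "metadata": {"name": name},
        "spec": {
            "volumeSnapshotClassName": snapshot_class,
            "source": {"persistentVolumeClaimName": pvc_name},
        },
    }


def claim_from_snapshot(name, snapshot_name, storage_class_name, size):
    """Build a PVC whose contents are cloned from a VolumeSnapshot."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "storageClassName": storage_class_name,
            "accessModes": ["ReadWriteOnce"],
            "dataSource": {
                "apiGroup": "snapshot.storage.k8s.io",
                "kind": "VolumeSnapshot",
                "name": snapshot_name,
            },
            # The restored claim should be at least as large as the source.
            "resources": {"requests": {"storage": size}},
        },
    }


# Hypothetical names throughout.
snap = volume_snapshot("orders-db-nightly", "orders-db-data", "example-snapshot-class")
clone = claim_from_snapshot("orders-db-restore", "orders-db-nightly", "gold-replicated", "50Gi")
```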
Volume expansion, topology and the limits to respect
A few operational realities are worth knowing before they surprise you. Online volume expansion works through the CSI driver, so you can grow a PVC that is filling up, but you cannot shrink one, so plan the growth direction accordingly. Topology matters on multi-zone clusters: a volume provisioned in one zone is bound to that zone, which is exactly why the late-binding storage class from earlier is not optional there. It is what stops a volume being created somewhere its pod can never reach. And the access modes are not interchangeable: ReadWriteOnce block volumes attach to a single node at a time, so a workload that assumes many pods can write the same volume needs ReadWriteMany, which means vSAN File Services or an in-guest layer like Portworx, not a default block class.
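The grow-only rule is simple enough to encode as a guard. The Python sketch below builds the patch body you would send to resize a PVC and refuses anything that is not an increase. It is a simplification: it only understands whole-number quantities with binary suffixes such as Gi, and it assumes the StorageClass has allowVolumeExpansion set to true, which Kubernetes requires before it will accept a resize.

```python
# Binary suffixes only; real Kubernetes quantities accept more forms than this.
_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def to_bytes(quantity):
    """Convert a quantity such as '20Gi' to bytes (plain integers pass through)."""
    for suffix, factor in _UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def expansion_patch(current, requested):
    """Return a PVC patch body that grows the claim, or raise if it would not grow."""
    if to_bytes(requested) <= to_bytes(current):
        raise ValueError(f"cannot resize {current} to {requested}: PVCs can only grow")
    return {"spec": {"resources": {"requests": {"storage": requested}}}}


patch = expansion_patch("20Gi", "40Gi")
```

Because there is no way back down, it is worth sizing claims modestly and growing them rather than over-provisioning up front.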
These are not exotic edge cases; they are the things that turn a working dev setup into a broken production one when the assumptions change. The pod that wrote to a local volume in dev cannot suddenly be scaled to three replicas sharing that volume in production without an RWX rethink. Catch these at design time by being explicit about access mode and zone behaviour for every stateful workload, rather than discovering them when the second replica fails to mount.
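One way to be explicit at design time is a small check over a description of each stateful workload. The sketch below is a hypothetical lint, not part of any VKS tooling: the field names (replicas, shared_volume, access_modes, multi_zone, binding_mode) are invented for illustration, and the rules are just the two from this section.

```python
def storage_design_problems(workloads):
    """Return human-readable warnings for risky access-mode or zone assumptions."""
    problems = []
    for w in workloads:
        # Several replicas writing one volume need RWX, not a block class.
        shares_volume = w["replicas"] > 1 and w["shared_volume"]
        if shares_volume and "ReadWriteMany" not in w["access_modes"]:
            problems.append(
                f"{w['name']}: {w['replicas']} replicas share one volume "
                "but the claim is not ReadWriteMany"
            )
        # Multi-zone clusters need late binding to avoid stranded volumes.
        if w["multi_zone"] and w["binding_mode"] != "WaitForFirstConsumer":
            problems.append(
                f"{w['name']}: multi-zone cluster without WaitForFirstConsumer"
            )
    return problems


# Hypothetical workload descriptions.
workloads = [
    {"name": "reports", "replicas": 3, "shared_volume": True,
     "access_modes": ["ReadWriteOnce"], "multi_zone": True,
     "binding_mode": "Immediate"},
    {"name": "orders-db", "replicas": 3, "shared_volume": False,
     "access_modes": ["ReadWriteOnce"], "multi_zone": True,
     "binding_mode": "WaitForFirstConsumer"},
]

warnings = storage_design_problems(workloads)
```

Here the first workload trips both rules and the second, with one volume per replica and late binding, passes cleanly.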
Stateful patterns that actually work on VKS
For databases and other stateful services, the pattern that holds up is to let the application own its own redundancy and use VKS storage for fast, policy-backed local persistence underneath each replica. A three-node database running its own replication across three ReadWriteOnce volumes, each on a fast replicated SPBM policy, gives you both application-level and storage-level resilience without pretending a block volume can follow a pod across a zone. That is more resilient than trying to make a single shared volume highly available, and it matches how mature stateful operators expect to run.
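In Kubernetes terms that pattern is usually a StatefulSet with a volume claim template, so each replica gets its own ReadWriteOnce volume. A minimal Python sketch of the manifest, assuming a hypothetical image, mount path and class name, and spreading replicas across zones with the standard topology.kubernetes.io/zone label:

```python
def replicated_database(name, image, storage_class_name, size, replicas=3):
    """Build a StatefulSet where every replica owns its own RWO volume."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Keep replicas evenly spread over zones.
                    "topologySpreadConstraints": [{
                        "maxSkew": 1,
                        "topologyKey": "topology.kubernetes.io/zone",
                        "whenUnsatisfiable": "DoNotSchedule",
                        "labelSelector": {"matchLabels": labels},
                    }],
                    "containers": [{
                        "name": name,
                        "image": image,
                        # Mount path is a placeholder for the database's data dir.
                        "volumeMounts": [{"name": "data", "mountPath": "/var/lib/data"}],
                    }],
                },
            },
            # One PVC per replica, created from this template.
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "storageClassName": storage_class_name,
                    "resources": {"requests": {"storage": size}},
                },
            }],
        },
    }


# Hypothetical image and class; the class should be a late-binding one.
db = replicated_database("orders-db", "example.com/orders-db:1.0", "gold-replicated", "50Gi")
```

The database's own replication is what carries data between zones. The storage layer only has to keep each replica's volume healthy in the zone where that replica runs.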
This is also where Data Services Manager and the vSAN Data Persistence Platform earn their place for the services they support, because they handle the operational mechanics of stateful services on the platform rather than leaving you to assemble them. Whichever route you take, the principle is the same: design stateful resilience at the application and operator layer, use SPBM-backed volumes for the persistence each instance needs, and never assume the storage layer will silently make a single-instance design highly available. It will not, and finding that out during an incident is the expensive way to learn it.
What I’d Do
I lean into the indirection rather than fighting it, because it is the good kind: SPBM-driven, CNS-tracked volumes mean your storage team’s existing encryption, replication and tiering policies apply to Kubernetes volumes for free, and you can audit them from vCenter, something bolt-on Kubernetes storage cannot offer. I publish a small set of named StorageClasses that map to clear SPBM policies (gold/silver, replicated/local) so developers choose intent, not implementation. I use late-binding everywhere multi-zone is in play, and I design stateful resilience at the database layer from the start rather than hoping the storage will replicate. For your most important stateful workload: is its cross-zone availability handled by the application, or are you quietly assuming the volume will save you?
References
- Broadcom TechDocs: Storage for VKS Clusters (VCF 9.0)
- Broadcom TechDocs: Using Storage Classes for Persistent Volumes
- CormacHogan.com: VCF 9.0 Volume Service, consuming static RWX volumes via VKS









