Dr. Pranay Jha

VMware • Cloud • AI • Enterprise Architecture


Securing VKS Clusters: RBAC, Pod Security and Namespace Isolation (VKS Series, Part 10)

VKS security is two layers people constantly blur. Here is who runs a cluster versus who uses it, and the four controls that real tenant isolation actually needs.

VKS Series · Part 10 of 17

TL;DR · Key Takeaways

  • Access control is two layers: vSphere Namespace permissions govern who can manage clusters, Kubernetes RBAC governs what users do inside them.
  • Authenticate with kubectl using a downloaded kubeconfig, or with the VCF CLI using a revocable token. Token auth is the stronger default for shared clusters: a leaked static kubeconfig keeps working until its credential is rotated or the cluster is rebuilt.
  • Pod Security Admission replaces PodSecurityPolicy. Default namespaces to restricted or baseline and make privileged an explicit, scoped exception.
  • Real isolation is RBAC + PSA + network policy + registry control. Any three without the fourth still leave a hole; the most common gap is missing network policy on a shared cluster.
Who this is for: platform owners and security teams hardening a shared VKS platform.
Prerequisites: Part 2’s two-tier model (namespace permissions vs cluster RBAC).

Security on VKS is where the vSphere world and the Kubernetes world meet, and the seam is exactly where people get confused. Who can create a cluster is a vSphere question. Who can deploy into it is a Kubernetes question. Mix those up and you either lock developers out or hand them far too much. This part draws the line clearly and then covers the controls that actually contain a workload, because RBAC alone does not.

Two layers of access control

The first layer is the vSphere Namespace. The permissions an administrator sets there decide which DevOps users can manage the lifecycle of VKS clusters in that namespace: create, scale, upgrade and delete them. The second layer is Kubernetes RBAC inside each workload cluster, which decides what an authenticated user can do once inside: which namespaces, which verbs, which resources. The two work together. The namespace permission gives you the right to provision and own a cluster; cluster RBAC is how you grant your developers scoped access to it. Put it as a slogan and it sticks: namespace permissions decide who runs the cluster; RBAC decides who uses it.

Figure: Who runs the cluster vs who uses it. vSphere admin sets namespace permissions (quota, VM classes, who may manage clusters) → platform owner (DevOps) provisions clusters, then grants Kubernetes RBAC to developers → developers deploy workloads within the RBAC they were given. Permissions flow left to right; each layer hands a scoped slice to the next.
The two-tier model: the namespace grants the right to run clusters; RBAC grants the right to use them.
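To make the second tier concrete, here is a minimal sketch that emits a namespaced Role and a RoleBinding as JSON, which kubectl apply -f can read. The namespace team-a, the group team-a-devs and the role name app-developer are hypothetical, and the verb and resource lists are an illustrative starting point rather than a recommended policy. It assumes your identity source places users into groups that the cluster can see.

```python
import json

# Hypothetical names for illustration; substitute your own namespace, group and role.
NAMESPACE = "team-a"
DEV_GROUP = "team-a-devs"
ROLE_NAME = "app-developer"

# Verbs a developer needs to deploy and operate workloads; deliberately no "*" wildcards.
APP_VERBS = ["get", "list", "watch", "create", "update", "patch", "delete"]


def developer_role(namespace: str, role_name: str = ROLE_NAME) -> dict:
    """A namespaced Role: its permissions apply only inside `namespace`."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": role_name, "namespace": namespace},
        "rules": [
            # Core API group ("") objects developers typically deploy or inspect.
            {"apiGroups": [""],
             "resources": ["pods", "services", "configmaps"],
             "verbs": APP_VERBS},
            # Reading pod logs is a separate subresource that only supports "get".
            {"apiGroups": [""], "resources": ["pods/log"], "verbs": ["get"]},
            # Workload controllers from the "apps" group.
            {"apiGroups": ["apps"],
             "resources": ["deployments", "replicasets"],
             "verbs": APP_VERBS},
        ],
    }


def bind_group(namespace: str, group: str, role_name: str = ROLE_NAME) -> dict:
    """A RoleBinding that grants the Role to an identity-provider group."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{role_name}-binding", "namespace": namespace},
        "roleRef": {"apiGroup": "rbac.authorization.k8s.io",
                    "kind": "Role", "name": role_name},
        "subjects": [{"apiGroup": "rbac.authorization.k8s.io",
                      "kind": "Group", "name": group}],
    }


if __name__ == "__main__":
    items = [developer_role(NAMESPACE), bind_group(NAMESPACE, DEV_GROUP)]
    # A v1 List that `kubectl apply -f -` accepts on stdin.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

Binding a group rather than individual users means that granting or removing a developer is a change in the identity source, not in the cluster.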

Authentication and Pod Security Admission

Once a cluster exists, you can download its kubeconfig and connect with kubectl, which remains fully supported in VCF 9; there is also a graphical Local Consumption Interface (LCI) for browsing and operating clusters. For shared clusters the stronger pattern is token-based access through the VCF CLI: you authenticate with a valid token, the token can be revoked, and periodic re-authentication is enforced. That revocability is the security advantage: a leaked static kubeconfig keeps working until its credential is rotated, while a revoked token stops working.

The old PodSecurityPolicy has been removed from upstream Kubernetes, and VKS uses Pod Security Admission (PSA) with the Pod Security Standards instead. PSA applies one of three standards per namespace (restricted, baseline or privileged), so a workload asking for host access, privileged containers or dangerous capabilities is rejected unless its namespace’s standard permits it. Default namespaces to restricted or baseline and grant privileged only where a specific workload genuinely needs it: label that one namespace privileged, and use RBAC to restrict who may change namespace labels, rather than relaxing the standard everywhere.
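Because PSA is driven by namespace labels, both the default and the exception can be expressed as data and checked. The sketch below builds a Namespace manifest carrying the standard pod-security.kubernetes.io enforce, audit and warn labels, plus a small audit helper that flags any namespace enforcing privileged without being on an explicit allowlist. The helper, the allowlist and the namespace names are hypothetical; which PSA policy versions apply depends on the Kubernetes release your VKS cluster runs.

```python
import json

PSA_PREFIX = "pod-security.kubernetes.io"
PSA_LEVELS = ("privileged", "baseline", "restricted")


def namespace_with_psa(name: str, level: str = "restricted") -> dict:
    """Namespace manifest whose PSA labels enforce, audit and warn at `level`."""
    if level not in PSA_LEVELS:
        raise ValueError(f"unknown Pod Security Standard: {level!r}")
    labels = {}
    for mode in ("enforce", "audit", "warn"):
        labels[f"{PSA_PREFIX}/{mode}"] = level
        # "latest" tracks the cluster's version; a pinned form such as "v1.30" is also valid.
        labels[f"{PSA_PREFIX}/{mode}-version"] = "latest"
    return {"apiVersion": "v1", "kind": "Namespace",
            "metadata": {"name": name, "labels": labels}}


def unapproved_privileged(namespaces: list, allowlist: set) -> list:
    """Names of namespaces enforcing 'privileged' that are not on the allowlist.

    Namespaces without an enforce label are not flagged here: they get the
    cluster-wide default, which depends on the admission configuration.
    """
    flagged = []
    for ns in namespaces:
        meta = ns["metadata"]
        enforce = meta.get("labels", {}).get(f"{PSA_PREFIX}/enforce")
        if enforce == "privileged" and meta["name"] not in allowlist:
            flagged.append(meta["name"])
    return flagged


if __name__ == "__main__":
    # Hypothetical namespaces: one default, one approved exception, one drift.
    namespaces = [
        namespace_with_psa("team-a"),
        namespace_with_psa("storage-agent", "privileged"),
        namespace_with_psa("team-b-debug", "privileged"),
    ]
    print(json.dumps(namespaces[0], indent=2))
    print(unapproved_privileged(namespaces, allowlist={"storage-agent"}))  # ['team-b-debug']
```

Feeding the helper the items list from `kubectl get namespaces -o json` turns "privileged is a scoped exception with a name attached" into a check you can run in CI.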


Real isolation takes four controls, not one

RBAC controls actions, not blast radius. Genuine tenant isolation is the sum of four controls, and leaving any one out is where breaches hide:

Control | Stops | If you skip it
RBAC | Unauthorized actions on objects | Users do things they shouldn’t
Pod Security Admission | Privileged / host-level pods | A pod escapes its boundary
Network policy (Antrea) | Cross-namespace pod traffic | Pods talk freely despite tidy RBAC
Registry control | Arbitrary / untrusted images | Supply-chain risk walks in
The shared-cluster trap: the most common gap I see is a shared cluster with tidy RBAC and no network policy. Users cannot see each other’s objects, so it feels isolated, but the pods can still talk freely across namespaces. RBAC plus PSA plus network policy plus registry control is the real boundary.
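One way to keep yourself honest about the four-control rule is to record, per tenant, which controls are actually in place and refuse to call the tenant isolated unless all four are. The sketch below is a hypothetical checklist model, not a VKS feature; how you populate each flag (querying RoleBindings, namespace labels, NetworkPolicies and registry settings) is left as an assumption.

```python
from dataclasses import dataclass


@dataclass
class TenantControls:
    """Hypothetical per-tenant record of which isolation controls are in place."""
    name: str
    rbac_scoped: bool          # developers bound to namespaced Roles only
    psa_enforced: bool         # namespace enforces baseline or restricted
    default_deny_netpol: bool  # default-deny NetworkPolicy present
    registry_restricted: bool  # image pulls limited to a curated registry


def missing_controls(t: TenantControls) -> list:
    """Controls the tenant lacks; an empty list means all four are present."""
    checks = {
        "RBAC": t.rbac_scoped,
        "Pod Security Admission": t.psa_enforced,
        "Network policy": t.default_deny_netpol,
        "Registry control": t.registry_restricted,
    }
    return [control for control, present in checks.items() if not present]


def is_isolated(t: TenantControls) -> bool:
    # Isolation is the conjunction of all four controls, never a subset.
    return not missing_controls(t)


if __name__ == "__main__":
    # The shared-cluster trap: tidy RBAC and PSA, but no network policy.
    trap = TenantControls("team-b", True, True, False, True)
    print(trap.name, is_isolated(trap), missing_controls(trap))
    # prints: team-b False ['Network policy']
```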

How authentication actually flows

It helps to trace what happens when a user runs a command against a cluster, because the answer determines how you revoke access when someone leaves. With a plain downloaded kubeconfig, the cluster trusts a credential embedded in that file, and that credential is effectively permanent until the cluster is rebuilt or the certificate rotated. With the token-based path through the VCF CLI, the user authenticates against an identity source, receives a short-lived token, and presents that token to the cluster, which validates it. The difference is everything for offboarding: revoke the identity or the token and access is gone immediately, whereas a leaked static kubeconfig keeps working until you notice and rebuild. Where VCF Automation deploys a shared cluster, a JWT authenticator can be registered so the cluster trusts VCF Automation identities directly, which is the cleanest model for shared platforms because identity lives in one place.

The design rule that follows is simple: static kubeconfigs are for automation accounts and break-glass, not for humans. Humans authenticate through the token path so their access is tied to a revocable identity, and you keep a short, audited list of who holds a static credential and why. The first time you have to offboard someone in a hurry, you will be glad their access was a token and not a file three people copied to their laptops.
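The offboarding difference can be shown with a toy model of the two paths described above. This is not how VKS, the VCF CLI or the Kubernetes API server implement authentication; the class, its fields and the timestamps are hypothetical. It encodes only the property the text relies on: revoking an identity ends token access, while a static credential keeps working until it is rotated.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Credential:
    subject: str                 # the identity the credential belongs to
    kind: str                    # "token" (short-lived) or "static" (embedded in a kubeconfig)
    expires_at: Optional[float]  # epoch seconds; None for a static credential


@dataclass
class ToyClusterAuth:
    """Toy model of the offboarding difference; not a VKS or VCF CLI API."""
    revoked_subjects: set = field(default_factory=set)
    rotated_static: set = field(default_factory=set)  # subjects whose static credential was rotated

    def revoke(self, subject: str) -> None:
        # Offboarding step: disable the identity at the identity source.
        self.revoked_subjects.add(subject)

    def allows(self, cred: Credential, now: float) -> bool:
        if cred.kind == "token":
            # Token path: access requires an unrevoked identity and an unexpired token.
            return (cred.subject not in self.revoked_subjects
                    and cred.expires_at is not None and now < cred.expires_at)
        # Static path: revoking the identity changes nothing; only rotation does.
        return cred.subject not in self.rotated_static


if __name__ == "__main__":
    auth = ToyClusterAuth()
    token = Credential("alice", "token", expires_at=1_000.0)
    static = Credential("alice", "static", expires_at=None)
    auth.revoke("alice")
    print(auth.allows(token, now=500.0), auth.allows(static, now=500.0))  # False True
```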

Network policy patterns that hold up

Network policy is the control most often missing, so it is worth being concrete about what good looks like. The pattern that holds up under audit is default-deny per namespace: drop all ingress and egress, then explicitly allow only the flows each workload needs. A frontend may talk to its backend, the backend to its database, and nothing else moves. With Antrea you also get cluster-wide policies and tiering, so a platform team can set guardrails that individual namespace owners cannot override, which is what lets you delegate namespaces to teams without surrendering the security boundary. The mistake to avoid is the allow-all-then-tighten-later approach, because later never comes and you ship a flat cluster where any compromised pod can reach everything.

Start strict. A default-deny baseline that you open deliberately is a posture you can defend; an open cluster you intend to lock down someday is a finding waiting to happen. Because Antrea integrates with NSX, those policies are also visible to the network team in the tooling they already use, so segmentation is not a Kubernetes-only secret that the rest of security cannot see.
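The default-deny pattern maps directly onto the standard Kubernetes NetworkPolicy API, which Antrea enforces. The sketch below emits a namespace-wide deny for both directions, a DNS egress exception, and then opens frontend-to-backend and backend-to-database flows. Because egress is denied too, each allowed flow needs an egress rule on the source pods and an ingress rule on the destination pods. The namespace shop, the app labels, the ports and the assumption that cluster DNS runs in kube-system are all illustrative; Antrea’s own tiered, cluster-wide policies use separate Antrea resources that are not shown here.

```python
import json


def _app(app: str) -> dict:
    """Label selector for pods carrying app=<app> (a hypothetical labeling scheme)."""
    return {"matchLabels": {"app": app}}


def _policy(namespace: str, name: str, spec: dict) -> dict:
    return {"apiVersion": "networking.k8s.io/v1", "kind": "NetworkPolicy",
            "metadata": {"name": name, "namespace": namespace}, "spec": spec}


def default_deny(namespace: str) -> dict:
    # An empty podSelector selects every pod in the namespace; listing both
    # policy types with no rules denies all ingress and all egress.
    return _policy(namespace, "default-deny-all",
                   {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]})


def allow_dns_egress(namespace: str) -> dict:
    # Assumes cluster DNS runs in kube-system; Kubernetes sets the
    # kubernetes.io/metadata.name label on every namespace automatically.
    kube_system = {"matchLabels": {"kubernetes.io/metadata.name": "kube-system"}}
    return _policy(namespace, "allow-dns-egress", {
        "podSelector": {},
        "policyTypes": ["Egress"],
        "egress": [{
            "to": [{"namespaceSelector": kube_system}],
            "ports": [{"protocol": "UDP", "port": 53}, {"protocol": "TCP", "port": 53}],
        }],
    })


def allow_flow(namespace: str, src_app: str, dst_app: str, port: int) -> list:
    """Open one TCP flow src -> dst: egress on the source, ingress on the destination."""
    ports = [{"protocol": "TCP", "port": port}]
    name = f"allow-{src_app}-to-{dst_app}"
    egress = _policy(namespace, f"{name}-egress", {
        "podSelector": _app(src_app), "policyTypes": ["Egress"],
        "egress": [{"to": [{"podSelector": _app(dst_app)}], "ports": ports}],
    })
    ingress = _policy(namespace, f"{name}-ingress", {
        "podSelector": _app(dst_app), "policyTypes": ["Ingress"],
        "ingress": [{"from": [{"podSelector": _app(src_app)}], "ports": ports}],
    })
    return [egress, ingress]


if __name__ == "__main__":
    ns = "shop"  # hypothetical namespace
    items = [default_deny(ns), allow_dns_egress(ns),
             *allow_flow(ns, "frontend", "backend", 8080),
             *allow_flow(ns, "backend", "database", 5432)]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

Nothing outside these rules moves: the frontend cannot reach the database directly, and a compromised pod in the namespace has no egress beyond DNS.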

The supply chain: registries, scanning, signing

Most Kubernetes compromises do not break the platform; they walk in through a container image. Controlling the supply chain means three things on VKS. First, constrain which registries clusters may pull from, so a workload cannot quietly pull a latest tag from an arbitrary public registry; point everything at a private registry you curate. Second, scan those images for known vulnerabilities before they are allowed to run, and make a failing scan block the deployment rather than file a ticket nobody reads. Third, verify image provenance with signed images and an admission policy that rejects unsigned ones, so what runs is what you built and approved, not something substituted along the way.

These controls are enforced by admission control in the cluster, the same stage of every API request where Pod Security Admission already runs, so the enforcement stage is one you already operate, even if registry and signature rules need a policy engine added on top. The point is that RBAC, Pod Security and network policy protect the cluster from its users, while registry control, scanning and signing protect it from its workloads. A serious platform needs both halves; teams that harden access and ignore the supply chain have locked the front door and left the loading dock open.
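The first supply-chain control, constraining registries, reduces to a check an admission policy can run on every pod. The sketch below shows that logic as a standalone function: it resolves each container image to its registry host using Docker’s reference rules, rejects hosts not on an allowlist, and optionally requires digest pinning. The function names and harbor.example.com are hypothetical; in a real cluster this logic would live in a policy engine or admission webhook, and vulnerability scanning and signature verification need dedicated tooling that is not shown.

```python
DIGEST_MARKER = "@sha256:"


def registry_of(image: str) -> str:
    """Registry host of an image reference, using Docker's reference rules:
    the first path component counts as a host only if it contains '.' or ':'
    or is 'localhost'; otherwise the image comes from Docker Hub."""
    first, sep, _ = image.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"


def image_violations(pod_spec: dict, allowed_registries: set,
                     require_digest: bool = True) -> list:
    """Reasons a pod spec should be rejected; an empty list means admit."""
    problems = []
    # ephemeralContainers are omitted for brevity; a real policy should cover them.
    for c in pod_spec.get("initContainers", []) + pod_spec.get("containers", []):
        image, name = c["image"], c["name"]
        registry = registry_of(image)
        if registry not in allowed_registries:
            problems.append(f"{name}: registry {registry} is not on the allowlist")
        if require_digest and DIGEST_MARKER not in image:
            # Digest pinning fixes which bytes run; it does not prove who built them.
            problems.append(f"{name}: image is not pinned by digest")
    return problems


if __name__ == "__main__":
    allowed = {"harbor.example.com"}  # hypothetical private registry
    pod = {"containers": [
        {"name": "app", "image": "harbor.example.com/team/app@sha256:" + "0" * 64},
        {"name": "sidecar", "image": "nginx:latest"},
    ]}
    for problem in image_violations(pod, allowed):
        print(problem)
```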

What I’d Do

I keep the two tiers crisp: namespace permissions for who can run clusters, RBAC for who can use them, and I never blur them. For shared clusters I prefer revocable token auth over handing out static kubeconfigs, because revocation is the difference between rotating a credential and rebuilding a cluster. I default every namespace to a restricted or baseline pod security standard and treat privileged as a scoped exception with a name attached. And I never call a tenant “isolated” until all four controls are in place, with default-deny network policy doing the work RBAC cannot. Decide between separate clusters per tenant (strongest, more overhead) and shared clusters with disciplined namespace isolation based on how much you trust your tenants. On your busiest shared cluster right now: is there a default-deny network policy, or just RBAC that hides objects while the pods chat freely?




About the Author

Dr. Pranay Jha is a Cloud and AI Consultant with 18+ years of experience in hybrid cloud, virtualization, and enterprise infrastructure transformation. He specializes in VMware technologies, multi-cloud strategy, and Generative AI solutions. He holds a PhD in Computer Applications with research focused on Cloud and AI, has published multiple research papers, and has been a VMware vExpert since 2016 and a VMUG Community Leader.
