Dr. Pranay Jha

Kubernetes and VKS Self-Service in VCF Automation: The All Apps Organization (VCF Automation 9 Series, Part 26)

How the All Apps organization in VCF Automation turns VKS clusters and VMs into self-service catalog items, with a real Cluster blueprint, the namespace hierarchy, and the guardrails that keep tenants honest.

VCF Automation 9 Series · Part 26 of 41
TL;DR · Key Takeaways
  • The All Apps organization is the Kubernetes-API-based model in VCF Automation. It is not a tab bolted onto the VM Apps world; the whole consumption plane runs on a vSphere Supervisor.
  • The hierarchy is fixed: organization maps to an NSX project, a region groups one or more Supervisors, a project carries vSphere Namespaces, and namespaces are where VMs and VKS clusters actually land.
  • A VKS cluster is provisioned from a VMware Cloud Template that wraps a Cluster API Cluster manifest in a CCI.Supervisor.Resource. The same object can be applied with kubectl.
  • All Apps requires an external Orchestrator, a Supervisor, VKS, and NSX VPCs. Skip any one and the org will not function. Plan that before you promise a tenant a catalog.
  • Guardrails live in region quotas, namespace classes and VM classes. Get those wrong and a single tenant can starve a region.

Who this is for: Cloud and platform admins, automation engineers and architects who run VCF Automation and now have to deliver Kubernetes as a self-service product, not a ticket queue.

Prerequisites: A working VCF Automation instance on VCF 9.0.1 or 9.1, a vSphere Supervisor with VKS enabled, NSX with VPCs, and an external VCF Operations Orchestrator. Familiarity with VMware Cloud Templates and the provider / tenant model from earlier Parts.

A platform lead told me his team had ‘enabled Kubernetes self-service’ in VCF Automation by adding a blueprint with a Kubernetes cluster in it. Then nothing deployed, because the org was a VM Apps organization and there was no Supervisor underneath it. That is the single most common misunderstanding I see. Kubernetes self-service in VCF Automation does not live in VM Apps at all. It lives in the All Apps organization, and All Apps is a different architecture that consumes a vSphere Supervisor through the Kubernetes API. If you have read Part 3 on VM Apps vs All Apps, this Part is where that choice cashes out into real Kubernetes and VKS provisioning.

What All Apps actually changes

VM Apps is the VM-centric model, close to Aria Automation 8.x. You author a VMware Cloud Template, the provisioning engine talks to a cloud account, and a VM lands on a cloud zone. All Apps is application-centric and Kubernetes-native. A VM, a container, a network and a disk are all just consumable resources expressed as Kubernetes objects against a Supervisor. That is why VKS and NSX with VPCs are mandatory components of the automation solution in VCF 9, not optional add-ons. The consumption plane is the Supervisor; VCF Automation is the multi-tenant front door and policy layer on top of it.

The practical consequence: there is no quiet migration from a VM Apps org to an All Apps org. Different backing architecture, different resource model, different APIs. You stand All Apps up deliberately, and you decide which tenants belong there. The reward is real multi-tenancy and a single catalog that can hand out VMs, VKS clusters and data services through the same request flow.

Figure: VM Apps vs All Apps. Two organization models, two backing architectures.
  • VM Apps: VMware Cloud Template → cloud account; VM lands on a cloud zone; embedded or external Orchestrator; VM-centric, close to Aria 8.x.
  • All Apps: template → Kubernetes object on Supervisor; VM or VKS cluster lands in a namespace; external Orchestrator required; app-centric, Kubernetes-native.
The two org models do not share a provisioning path. All Apps consumes a Supervisor through the Kubernetes API.

The organization is an NSX project

An All Apps organization is implemented as an NSX project, which is what gives you isolation at the network level rather than a soft logical boundary. Resources, cost control and policies apply across the whole organization. That detail matters when you design tenancy: the network blast radius and the policy scope are the organization, so you size and name organizations like you would size NSX projects, not like you would name a folder. Network isolation between tenants is structural, not a setting you can forget to turn on.

The consumption hierarchy you have to internalize

Almost every All Apps mistake I troubleshoot comes back to someone not holding the hierarchy in their head. There are four levels and each maps to something concrete in the Supervisor and NSX. Get the mapping wrong and you will spend an afternoon wondering why a deployment has nowhere to go.

Figure: All Apps consumption hierarchy (Organization → Region → Project → vSphere Namespace).
  1. Organization: NSX project, isolation boundary
  2. Region: one or more Supervisors + quota
  3. Project: team, members, roles, limits
  4. vSphere Namespace: where VMs and VKS clusters land
On the Supervisor:
  • Namespace = vSphere Namespace, sized by a namespace class
  • VKS cluster = Cluster API object
  • VM = VM Service object via LCI
  • Network = NSX VPC subnet set
Four levels, each backed by a real construct. A deployment with no namespace has nowhere to land.

Level | What it is | Backed by
Organization | Tenant boundary for resources, cost and policy | NSX project
Region | Pool of compute, memory, storage, network | One or more vSphere Supervisors
Project | Team, members, roles, catalog access, limits | Logical grouping inside the org
Namespace | Where workloads actually deploy | vSphere Namespace (sized by namespace class)

In practice: A region can span multiple vCenter instances and a Supervisor can be reused across more than one organization, but quota is enforced at the region. The first thing I check on a ‘tenant cannot deploy’ ticket is whether the project has a namespace at all, then whether the region quota still has headroom. Nine times out of ten it is one of those two.
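
That two-step triage can be written down as a small model. The sketch below is a toy, not a VCF Automation API: the `Region`, `Project` and `triage` names, the memory-only quota and all the numbers are assumptions made up for illustration. It only encodes the order of the checks described above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Region:
    """A region pools one or more Supervisors and carries the quota (hypothetical model)."""
    name: str
    supervisors: List[str]
    memory_quota_gb: float
    memory_used_gb: float = 0.0

    def headroom_gb(self) -> float:
        return self.memory_quota_gb - self.memory_used_gb


@dataclass
class Project:
    """A project groups a team and owns zero or more vSphere Namespaces."""
    name: str
    region: Region
    namespaces: List[str] = field(default_factory=list)


def triage(project: Project, requested_memory_gb: float) -> str:
    """Return the first reason a deployment cannot land, or 'ok'."""
    # Check 1: a deployment needs a namespace to land in.
    if not project.namespaces:
        return "project has no namespace"
    # Check 2: quota is enforced at the region.
    if requested_memory_gb > project.region.headroom_gb():
        return "region quota exhausted"
    return "ok"


# Illustrative numbers only: 64 GB quota with 60 GB already consumed.
region = Region("region-a", ["supervisor-01"], memory_quota_gb=64, memory_used_gb=60)
print(triage(Project("a-team", region), 12))
print(triage(Project("a-team", region, ["a-team-space"]), 12))
print(triage(Project("a-team", region, ["a-team-space"]), 2))
```

The three calls walk through a project with no namespace, a project whose request exceeds the remaining 4 GB of region headroom, and a request that fits.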

Prerequisites you cannot skip

All Apps has hard dependencies, and the platform will not warn you politely if one is missing. You need a Supervisor with VKS enabled; NSX configured with VPCs (in 9.0.1 a T0 in active/standby is still required where stateful VPC services such as auto-SNAT are in play); an external VCF Operations Orchestrator (the embedded Orchestrator is not an option for All Apps); and the Local Consumption Interface available on the Supervisor. In VCF 9.1 the Local Consumption Interface became a core Supervisor service that deploys by default, which removes one manual step that used to trip people up in 9.0.x. VCF Automation uses that interface to deploy VMs into the namespace, so if VKS cluster deployments succeed but plain VM deployments fail, the consumption interface is the first place I look.

If you have not stood up the Supervisor and VKS runtime yet, the mechanics are covered in Enabling vSphere Supervisor and the VKS Runtime in VCF 9, and the cluster provisioning workflow itself in Provisioning VKS Clusters with ClusterClass and Cluster API. This Part assumes that layer exists and focuses on wrapping it in a tenant-facing catalog.

Two ways to consume: catalog or kubectl

Because every resource in All Apps is a Kubernetes object against the Supervisor, a tenant has two genuine paths to the same outcome. They can request a published catalog item through Automation Service Broker, which is the governed, form-driven path most admins want their users on. Or, because the namespace is a real Kubernetes context, a developer with the right role can apply the same object with kubectl. Both create identical resources. The difference is governance, not capability.

Figure: Request to deployment. From catalog item to a running VKS cluster.
  1. Service Broker: tenant requests item
  2. Blueprint: resolves inputs + policy
  3. Supervisor: applies K8s objects
  4. VKS cluster: running in namespace
kubectl path: a developer applies the same object directly to the namespace, skipping steps 1 and 2.
The catalog path and the kubectl path converge on the same Supervisor objects.

The Service Broker catalog path

This is the path you publish to most tenants. An administrator authors a VMware Cloud Template, versions it, and releases it as a catalog item in Automation Service Broker. The tenant fills in a form, governance and lease policy apply, and the deployment lands in the project namespace. The Private AI series shows the same pattern for a GPU-accelerated cluster: the AI Kubernetes Cluster catalog item is just a blueprint published this way, which I covered from the consumer angle in Self-Service AI Catalog Items with VCF Automation.

The Kubernetes-native path

Because the namespace is a real Kubernetes context, a developer can target it with kubectl and apply the same Cluster object the blueprint would have created. This is useful for GitOps pipelines and for developers who never want to see a portal. The trade-off is that you lose the request form, the approval gate and the lease unless you wire those in another way. My recommendation: publish the catalog item for everyone, and open the kubectl path only to projects that have a pipeline and a reason. Do not leave both wide open by default, or your governance lives in two places and agrees in neither.
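
For a pipeline, that Cluster object can be generated rather than hand-edited. Here is a minimal sketch that builds it as a plain dictionary and prints JSON, which kubectl apply -f accepts as well as YAML. The field values are copied from the blueprint shown later in this Part. The `cluster_manifest` function, its parameters and the `metadata.namespace` field are my own assumptions for the kubectl path, not something VCF Automation provides.

```python
import json


def cluster_manifest(name: str, namespace: str, version: str,
                     vm_class: str, storage_class: str, workers: int = 2) -> dict:
    """Build a Cluster API Cluster object as a plain dict (hypothetical helper)."""
    return {
        "apiVersion": "cluster.x-k8s.io/v1beta1",
        "kind": "Cluster",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "clusterNetwork": {
                "pods": {"cidrBlocks": ["192.168.156.0/20"]},
                "services": {"cidrBlocks": ["10.96.0.0/12"]},
                "serviceDomain": "cluster.local",
            },
            "topology": {
                "class": "builtin-generic-v3.4.0",
                "classNamespace": "vmware-system-vks-public",
                "version": version,
                "variables": [
                    {"name": "vmClass", "value": vm_class},
                    {"name": "storageClass", "value": storage_class},
                ],
                "controlPlane": {"replicas": 1},
                "workers": {"machineDeployments": [
                    {"class": "node-pool", "name": "np-1", "replicas": workers},
                ]},
            },
        },
    }


manifest = cluster_manifest(
    name="analytics-7x2k",
    namespace="a-team-space",
    version="v1.33.3---vmware.1-fips-vkr.1",
    vm_class="best-effort-small",
    storage_class="vks-storage-policy",
)
# Redirect this output to a file and apply it from the pipeline.
print(json.dumps(manifest, indent=2))
```

Keeping the manifest in code means the version, VM class and storage class become reviewable parameters in Git, which is one way to claw back some of the governance the request form would otherwise have provided.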

A real VKS cluster blueprint

Here is the part most posts skip. A VKS cluster blueprint in All Apps is a VMware Cloud Template with two resources: a CCI.Supervisor.Namespace that points at the target vSphere Namespace, and a CCI.Supervisor.Resource whose manifest is a Cluster API Cluster object. The manifest is standard upstream Cluster API with VMware-specific class references, so if you have provisioned a VKS cluster declaratively before, this will look familiar.

formatVersion: 1
inputs:
  clusterName:
    type: string
    title: Cluster name
    description: Lowercase RFC 1123 DNS label (a-z, 0-9, hyphens; max 63 characters)
    pattern: ^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$
resources:
  CCI_Supervisor_Namespace_1:
    type: CCI.Supervisor.Namespace
    properties:
      name: a-team-space
      existing: true
  Kubernetes_Cluster_1:
    type: CCI.Supervisor.Resource
    properties:
      context: ${resource.CCI_Supervisor_Namespace_1.id}
      manifest:
        apiVersion: cluster.x-k8s.io/v1beta1
        kind: Cluster
        metadata:
          name: ${input.clusterName}-${env.shortDeploymentId}
        spec:
          clusterNetwork:
            pods:
              cidrBlocks:
                - 192.168.156.0/20
            services:
              cidrBlocks:
                - 10.96.0.0/12
            serviceDomain: cluster.local
          topology:
            class: builtin-generic-v3.4.0
            classNamespace: vmware-system-vks-public
            version: v1.33.3---vmware.1-fips-vkr.1
            variables:
              - name: vmClass
                value: best-effort-small
              - name: storageClass
                value: vks-storage-policy
            controlPlane:
              replicas: 1
            workers:
              machineDeployments:
                - class: node-pool
                  name: np-1
                  replicas: 2

What to change for your environment: the namespace name must match a vSphere Namespace that exists under the project (find it under Build & Deploy → Services → Overview, or under Manage & Govern → Namespaces). The vmClass must be one that is allowed in the region quota; best-effort-small is a standard class. The storageClass must match a storage class published to the namespace. The version string must be a VKS Kubernetes release that the Supervisor actually serves; do not invent it. List the available releases first.
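
Those four lookups, plus the name pattern from the template, are easy to turn into a pre-publish check. The sketch below is an assumption-heavy illustration: the `preflight` function and the `served` inventory are hypothetical, and in practice you would fill the sets from whatever the live Supervisor and namespace report rather than hard-coding them.

```python
import re
from typing import Dict, List, Set

# Pattern copied from the clusterName input in the template.
NAME_PATTERN = re.compile(r"^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$")


def preflight(values: Dict[str, str], served: Dict[str, Set[str]]) -> List[str]:
    """Return one message per blueprint value that would not be recognized."""
    problems = []
    if not NAME_PATTERN.fullmatch(values["clusterName"]):
        problems.append("clusterName is not a lowercase DNS label")
    for key in ("namespace", "version", "vmClass", "storageClass"):
        if values[key] not in served[key]:
            problems.append(f"{key} '{values[key]}' not found on the Supervisor")
    return problems


# Hypothetical inventory; replace with values listed from the live Supervisor.
served = {
    "namespace": {"a-team-space"},
    "version": {"v1.33.3---vmware.1-fips-vkr.1"},
    "vmClass": {"best-effort-small", "best-effort-medium"},
    "storageClass": {"vks-storage-policy"},
}
values = {
    "clusterName": "analytics",
    "namespace": "a-team-space",
    "version": "v1.33.3---vmware.1-fips-vkr.1",
    "vmClass": "guaranteed-large",   # deliberately absent from this inventory
    "storageClass": "vks-storage-policy",
}
print(preflight(values, served))
```

Note that the rendered cluster name is the input plus a deployment suffix, so a name that passes the 63-character pattern on its own can still end up longer once the suffix is appended. Leave headroom.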

Expected result: the Test button validates the YAML formatting, Deploy runs a live provisioning test into the namespace, and Version saves a version that you can then release to the catalog. Once deployed, the cluster moves from Provisioning to Running and appears as a workload in the namespace, in vCenter and in the consumption interface. The same object, applied with kubectl, produces the identical resource:

# target the project namespace context, then apply the Cluster object
kubectl config use-context a-team-space
kubectl apply -f vks-cluster.yaml
kubectl get clusters -n a-team-space

# NAME             PHASE          AGE
# analytics-7x2k   Provisioning   40s
# ... a few minutes later ...
# analytics-7x2k   Provisioned    6m

The common failure mode: a deployment that sits in error with no obvious message. The usual cause is a version, vmClass or storageClass value that the Supervisor does not recognize, or a namespace name that does not exist. The platform does not always pre-check these, so verify each value against the live Supervisor before you publish. The second failure mode is a cluster that provisions but whose nodes never become Ready, which is almost always an undersized Supervisor control plane or an NSX VPC networking gap rather than anything in the blueprint.
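
Before chasing the network, the two CIDR blocks in the blueprint are cheap to rule out offline. The sketch below uses Python's standard ipaddress module; the `check_cidr` helper is my own. One thing it surfaces: the pod block as written, 192.168.156.0/20, has host bits set, so a strict parser rejects it and a lenient one resolves it to 192.168.144.0/20. Many CIDR parsers mask host bits silently, and this sketch says nothing about how the Supervisor treats the value; it only shows which range you are actually asking for.

```python
import ipaddress


def check_cidr(cidr: str) -> dict:
    """Report the masked network, whether strict parsing accepts it, and its size."""
    strict_ok = True
    try:
        ipaddress.ip_network(cidr)  # strict=True by default: rejects set host bits
    except ValueError:
        strict_ok = False
    network = ipaddress.ip_network(cidr, strict=False)
    return {
        "canonical": str(network),
        "strict_ok": strict_ok,
        "addresses": network.num_addresses,
    }


pods = check_cidr("192.168.156.0/20")
services = check_cidr("10.96.0.0/12")
overlap = ipaddress.ip_network(pods["canonical"]).overlaps(
    ipaddress.ip_network(services["canonical"])
)

print(pods)
print(services)
print("pods and services overlap:", overlap)
```

The same overlap test is worth running against the subnets your NSX VPC hands out, since a pod or service range that collides with a routable network is a classic reason for nodes that never report Ready.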

Worked example

A project gets a namespace class of small best-effort: a ceiling of roughly 10 GHz CPU and 10 GB RAM with no reservations. A two-worker cluster on best-effort-small nodes plus a control plane will exceed that fast. If you intend tenants to run real VKS clusters, set the namespace class to medium or larger and size the region quota for the number of clusters you promised, not the number you demoed. Three teams each running one three-node cluster on small nodes is a different quota than one team running a demo.
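
The arithmetic behind that is short. The sketch below is memory-only and rests on one assumption that is not stated above: that best-effort-small means 4 GB of RAM per node, which is the usual default for that class. Check the class definition on your own Supervisor before trusting the numbers.

```python
# ASSUMPTION: best-effort-small is sized at 4 GB RAM per node.
NODE_MEMORY_GB = 4
# From the worked example: small best-effort namespace class, roughly 10 GB RAM.
NAMESPACE_MEMORY_CEILING_GB = 10


def cluster_memory_gb(control_plane: int, workers: int,
                      node_gb: int = NODE_MEMORY_GB) -> int:
    """Memory consumed by every node VM in one cluster."""
    return (control_plane + workers) * node_gb


demo_cluster = cluster_memory_gb(control_plane=1, workers=2)
fits_namespace = demo_cluster <= NAMESPACE_MEMORY_CEILING_GB
# Three teams, each running one three-node cluster on small nodes.
region_needed = 3 * cluster_memory_gb(control_plane=1, workers=2)

print("one cluster needs", demo_cluster, "GB")
print("fits the small namespace class:", fits_namespace)
print("region memory for three teams:", region_needed, "GB")
```

Under that assumption a single demo cluster already needs 12 GB against a 10 GB ceiling, and the three-team promise needs 36 GB of region memory before anyone deploys a single workload VM.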

Namespace classes, quota and guardrails

Guardrails in All Apps are not a single policy screen. They are layered. The region quota caps what an organization can draw and which VM classes are permitted. The namespace class caps an individual namespace, acting as the T-shirt size for a project workspace. VM classes are the per-VM sizing the Supervisor exposes, and they double as the node sizes for VKS clusters because cluster nodes are VMs. Approval, lease and day-2 policy from Part 16 on governance policies sit on top of the catalog request, but they do not replace quota; a request that passes approval still fails if the namespace has no room.
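
The layering is easier to see as a chain of checks. This is a toy model, not how the platform is implemented: the `Request` type, the `evaluate` function, the memory-only accounting and the order of the checks are all assumptions for illustration. The point it makes is the one above: approval is one gate among several, and passing it does not create capacity.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class Request:
    """A catalog request reduced to the fields the guardrails look at (hypothetical)."""
    vm_class: str
    memory_gb: int
    approved: bool


def evaluate(req: Request, permitted_vm_classes: Set[str],
             namespace_free_gb: int, region_free_gb: int) -> str:
    """Walk the guardrail layers and report the first one that stops the request."""
    if not req.approved:
        return "rejected: approval policy"
    if req.vm_class not in permitted_vm_classes:
        return "failed: VM class not permitted in region quota"
    if req.memory_gb > namespace_free_gb:
        return "failed: namespace class ceiling"
    if req.memory_gb > region_free_gb:
        return "failed: region quota"
    return "deployed"


permitted = {"best-effort-small"}
# Approved request, but the namespace only has 10 GB free.
print(evaluate(Request("best-effort-small", 12, True), permitted, 10, 64))
# Same request in a larger namespace.
print(evaluate(Request("best-effort-small", 12, True), permitted, 32, 64))
# Approved request for a class the region does not permit.
print(evaluate(Request("guaranteed-large", 12, True), permitted, 32, 64))
```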

Guardrail | Scope | What it controls
Region quota | Organization within a region | Total CPU/RAM/storage and permitted VM classes
Namespace class | Single namespace | Ceiling for one project workspace
VM class | Per VM / per cluster node | vCPU and memory of each VM or node
Approval / lease policy | Catalog request | Who can request, for how long, day-2 rights

Gotcha

Newly created custom VM classes do not appear in the region quota immediately. There is a sync delay. If a tenant reports that a VM class you just defined is not selectable, give the inventory sync time before you start debugging the blueprint. The same applies to a freshly added Supervisor; an inventory sync in the provider often clears an All Apps start screen that looks empty.

What VCF 9.1 adds for All Apps

Almost every meaningful improvement in VCF 9.1 lands on the All Apps side, which tells you where Broadcom is investing. A few are worth knowing before you design a catalog:
  • Blueprinting matured: the consumption view now groups resources into domain-aware categories such as VPC and Workload instead of one flat list, which scales better once a tenant has dozens of objects.
  • Container Service arrived: a simplified container runtime delivered through VCF Automation that runs directly on ESX without a full cluster, with the UI able to generate consistent YAML for a later move to a VKS cluster.
  • App Stack Formation lets you capture a running multi-VM topology, including network and disks, into a single reusable blueprint.
  • Infrastructure Policies map workload placement to vCenter compute policies and attach to region quotas and namespaces.
  • VKS 3.6 widened node OS options (RHEL 9 alongside Photon, Ubuntu and Windows Server 2022), added flexible CNI selection including an official Cilium add-on, and introduced multi-cluster Supervisor zones for non-disruptive hardware maintenance.

The one I would flag to a client planning a migration is non-disruptive VM import into a Supervisor namespace. It brings existing vSphere VMs under VCF Automation control without a re-IP or downtime, and it is widely read as the groundwork for a future VM Apps to All Apps migration tool. Useful caveat from the field: an imported VM is manageable by VCF Automation only when the namespace it is imported into was itself created by VCF Automation. That ownership rule is easy to miss until brownfield onboarding forces it, when a team finds an imported VM visible in the catalog but unmanageable because its namespace was created outside VCF Automation. Settling the organization and namespace ownership up front is what avoids that dead end.

Figure: Catalog item or kubectl? A decision tree for All Apps consumption.
Need a governed request + lease?
  • Yes → Service Broker: publish a blueprint as a catalog item. Form, approval, lease and day-2 all apply.
  • No → kubectl path: apply the Cluster object to the namespace. Best for GitOps pipelines with a reason.
Both paths create the same Supervisor objects. Default everyone to the catalog.
Capability is identical on both paths; choose by how much governance the workload needs.

Disclaimer

Provisioning clusters, changing region quotas and editing namespace classes are production changes that affect live tenants. Validate version, vmClass and storageClass values against the running Supervisor, test in a non-production organization first, and confirm region quota headroom before you publish a catalog item that tenants can fan out.

The Bottom Line

If you want real Kubernetes and VKS self-service in VCF Automation, the answer is the All Apps organization, and there is no shortcut around its prerequisites. Stand up the Supervisor, VKS, NSX VPCs and an external Orchestrator first, internalize the organization-region-project-namespace hierarchy, and treat the region quota and namespace class as your real governance, not an afterthought. Publish VKS clusters as Service Broker catalog items so tenants get a form and a lease, and open the kubectl path only where a pipeline justifies it. My recommendation is to start a tenant on All Apps only when they will actually consume containers or clusters; if a team only ever needs VMs and you do not need true network-isolated multi-tenancy, a VM Apps org is simpler and closer to what they already know. Where I would pick differently: anyone betting on the VCF platform direction, or needing structural tenant isolation, should commit to All Apps now and absorb the learning curve, because that is where the product is going. If you take one action from this Part, list the Kubernetes releases your Supervisor actually serves and pin your blueprint to one of them before you publish it.

About the Author

Dr. Pranay Jha is a Cloud and AI Consultant with 18+ years of experience in hybrid cloud, virtualization, and enterprise infrastructure transformation. He specializes in VMware technologies, multi-cloud strategy, and Generative AI solutions. He holds a PhD in Computer Applications with research focused on Cloud and AI, has published multiple research papers, and has been a VMware vExpert since 2016 and a VMUG Community Leader.
