TL;DR · Key Takeaways
- Self-service is the whole point of Private AI Foundation. If a data scientist still files a ticket to get a GPU VM, you built an expensive lab, not a platform.
- The fast path is the PAIF Quickstart in VCF Automation, which generates the AI Workstation, AI Kubernetes Cluster and Triton catalog items for you. Get the org, project and namespace plumbing right first, or the Quickstart has nothing to publish.
- The catalog item is only a request form. What actually governs cost and blast radius is the VM class (vGPU profile), the project, and the namespace it lands in.
- Field gotcha: the NVIDIA RAG catalog items are removed in PAIF 9.1 because NVIDIA stopped supporting the backing blueprints. If your design depended on the turnkey RAG Workstation, plan for it now.
Every Private AI project I see starts the same way. The platform team builds a beautiful GPU stack, the drivers load, a test inference job runs, and everyone declares victory. Then the first real data scientist shows up and asks for a GPU workstation, and the answer is a Jira ticket with a three-day SLA. That is the moment the platform quietly fails. The infrastructure works, but the consumption model is still 2015.
Part 16 is about closing that gap. The goal is a VCF Automation catalog where a data scientist picks AI Workstation, sets a vGPU size and a framework, clicks request, and gets a running deep learning VM in minutes with no admin in the loop. This is the runbook to get there, plus the design decisions that decide whether your catalog is safe to hand out.
What you are actually publishing
VCF Automation (the product formerly sold as Aria Automation) ships a Quickstart for Private AI Foundation that generates the catalog items for you. You do not hand-author blueprints. The Quickstart produces three core items: an AI Workstation backed by a deep learning VM with PyTorch and TensorFlow pre-installed, an AI Kubernetes Cluster with GPU-capable worker nodes and the NVIDIA GPU Operator already deployed, and a GPU-enabled Triton Inference Server. Two RAG-flavoured items (AI RAG Workstation and AI Kubernetes RAG Cluster, both wired to a pgvector database on Data Services Manager) also existed in 9.0.
Read the matrix below before you promise anyone a RAG button. For the design behind the deep learning VM itself, see Part 10 on Deep Learning VMs, and for what the cluster items feed, Part 12 on the Model Store and Model Runtime.
| Catalog item | Backing workload | GPU | DSM | 9.1 status |
|---|---|---|---|---|
| AI Workstation | Deep learning VM (PyTorch, TensorFlow) | vGPU or none | No | Supported |
| AI Kubernetes Cluster | VKS cluster, GPU Operator pre-installed | vGPU worker nodes | No | Supported |
| Triton Inference Server | DLVM running Triton | vGPU | No | Supported |
| AI RAG Workstation | DLVM with NVIDIA RAG + pgvector | vGPU | Yes | Removed |
| AI Kubernetes RAG Cluster | VKS cluster with pgvector on DSM | vGPU worker nodes | Yes | Removed |
My take on the RAG removal. The 9.1 release notes are blunt: the NVIDIA RAG catalog items are removed because NVIDIA is no longer supporting the blueprints behind them. This is the kind of dependency that bites a platform team six months in, when a turnkey item silently disappears on upgrade. Do not architect a customer demo around the RAG Workstation if you are landing on 9.1. Build the RAG path yourself on top of the plain AI Workstation plus a pgvector instance, which is exactly the assembly we walked through in Part 15. You trade a button for control, and on a platform you support for years, control wins.
The setup flow, end to end
Step by step
- Install Private AI Services on the Supervisor. Private AI Services runs on the vSphere Supervisor for the organization. Without it activated, the namespaces you create have nowhere to host model endpoints. If your Supervisor was deployed without NSX, validate the networking path early, because the supported topologies differ.
- Build the organization and project. In VCF Automation, the project is the governance boundary. It decides which users can request items and which Cloud Zones or Kubernetes Zones their requests can land in. Add the vCenter cloud account, define the zones that map to your GPU workload domain, and set capacity limits here, not later.
- Import the Private AI namespaces and activate Private AI Services. Import the Supervisor namespaces into VCF Automation, then have the org administrator activate Private AI Services in each namespace that should host endpoints. Bind the vGPU-backed VM class and the namespace quota at this point.
- Run the PAIF Quickstart. The Quickstart creates the content source and generates the catalog items. This is also where it wires the items to your project and zones, so the earlier steps have to be right or the Quickstart produces items that fail at request time.
- Entitle and publish. Catalog items are invisible until you entitle them to the project. Share only the items each audience needs. Data scientists rarely need the Kubernetes cluster item, and DevOps teams rarely need the single-VM workstation.
If you want the underlying VCF Automation constructs (provider, organizations, projects, Service Broker) explained from the ground up, that is covered in VCF Automation in VCF 9 Explained. Here we assume you know those pieces and are wiring them to GPUs.
```shell
# Confirm the vGPU VM class the catalog item will request exists in the namespace
kubectl get virtualmachineclass -n <org-namespace>

# Confirm Private AI Services is activated for the namespace
kubectl get pods -n <org-namespace> | grep -i private-ai

# After a request deploys a DLVM, confirm the vGPU actually attached
kubectl get virtualmachine -n <org-namespace> -o wide

# Run this one inside the deployed deep learning VM, not from your kubectl session
nvidia-smi
```
The catalog item is just a request form
The single most common design mistake is treating the catalog item as the thing that matters. It is not. The item is a form. What it resolves to (the VM class, the framework image, the network, the namespace) is where all the governance and cost live. A user choosing a 48 GB vGPU profile instead of a shared 8 GB profile is the difference between four workstations and twenty-four on the same host (for example, a host with 192 GB of GPU memory in total).
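If you want to sanity-check that density math for your own hardware, here is a minimal sketch. The host layout (4 physical GPUs with 48 GB each) is my assumption, picked so the numbers line up with the example above, and `vgpu_slots_per_host` is a hypothetical helper name, not anything PAIF ships. It ignores scheduling overhead and any per-GPU placement rules your vGPU release enforces.

```shell
#!/usr/bin/env bash
# Illustrative only: the hardware figures are assumptions, not PAIF values.

# vgpu_slots_per_host <gpus_per_host> <gpu_mem_gb> <profile_gb>
# A vGPU cannot span physical GPUs, so divide per GPU first, then multiply.
vgpu_slots_per_host() {
  local gpus=$1 gpu_mem_gb=$2 profile_gb=$3
  echo $(( (gpu_mem_gb / profile_gb) * gpus ))
}

# Assumed host: 4 physical GPUs, 48 GB each.
echo "48 GB profile: $(vgpu_slots_per_host 4 48 48) workstations per host"
echo "8 GB profile:  $(vgpu_slots_per_host 4 48 8) workstations per host"
```

Run it with your real GPU count and memory size before you decide which profiles go in the catalog. The integer division is deliberate: a profile that does not divide evenly into a GPU leaves memory stranded.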
Two design rules I apply on every engagement. First, publish a small number of vGPU profiles, not a free-text field. A curated list of, say, three sizes (shared, single-GPU, multi-GPU) keeps users out of the fractional-GPU weeds and keeps your host density predictable. Second, set the namespace quota to a number you can actually back with hardware. Self-service without a quota is just a faster way to exhaust your GPUs, and the failure mode is ugly: requests succeed in VCF Automation and then sit pending because there is no capacity to place them.
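The quota rule is easy to check on paper before you touch the platform. The sketch below treats each namespace quota as "the number of vGPU workloads I intend to allow at once" and compares the sum against the slots the hardware can place. Both the numbers and the `quota_fits` helper are assumptions for illustration; how you express that limit in your environment (instance counts, memory, or another quota field) is a separate decision.

```shell
#!/usr/bin/env bash
# Illustrative only: numbers and the function name are assumptions.

# quota_fits <capacity_slots> <namespace_quota>...
# Prints the totals and succeeds only if the summed quotas fit the capacity.
quota_fits() {
  local capacity=$1
  shift
  local total=0 q
  for q in "$@"; do
    total=$(( total + q ))
  done
  echo "quota_total=${total} capacity=${capacity}"
  [ "$total" -le "$capacity" ]
}

# Assumed: 2 hosts x 24 shared-profile slots = 48 slots, three namespaces.
if quota_fits 48 20 20 16; then
  echo "quotas are backed by hardware"
else
  echo "oversubscribed: requests can succeed, then sit pending"
fi
```

Deliberate oversubscription can be a valid choice if your users are bursty, but make it a decision you wrote down, not something you discover from a queue of pending requests.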
Validate before you hand it out
Do not entitle a catalog item to real users until you have run the full path yourself and watched it land. The failure points are predictable, and they almost always trace back to a missing binding rather than a broken Quickstart.
The pattern is consistent: when a self-service request fails, the catalog is almost never the culprit. It is a project that was not entitled, a vGPU VM class that was never bound to the namespace, or a zone with no real GPU capacity behind it. Build the bindings deliberately, then let the catalog be boring. Boring is the goal. A boring catalog is one your users trust.
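One of those bindings, the VM class, is cheap to check in a script before you entitle anything. This is a minimal sketch, assuming you keep a list of the vGPU class names your catalog items request. The helper `missing_vm_classes` and the class names in the usage line are hypothetical; the only kubectl behaviour it relies on is `-o name`, which prints one `<resource>/<name>` line per object.

```shell
#!/usr/bin/env bash
# Illustrative only: the helper name and any class names are placeholders.

# missing_vm_classes <expected_class>...
# Reads `kubectl get virtualmachineclass -o name` output on stdin and prints
# each expected class that is absent. Succeeds only if nothing is missing.
missing_vm_classes() {
  local bound want missing=0
  bound=$(sed 's|.*/||')   # strip the "<resource>/" prefix from each line
  for want in "$@"; do
    if ! printf '%s\n' "$bound" | grep -qxF -- "$want"; then
      echo "missing: $want"
      missing=1
    fi
  done
  [ "$missing" -eq 0 ]
}

# Usage (class names are examples, substitute your own):
#   kubectl get virtualmachineclass -n <org-namespace> -o name \
#     | missing_vm_classes vgpu-shared vgpu-single
```

It only covers one of the three failure points. Entitlements and real zone capacity still need the manual request-to-running test, but a check like this catches the most common miss before a user does.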
What I’d Do
Stand up the three core items (AI Workstation, AI Kubernetes Cluster, Triton) with the Quickstart, publish a curated set of vGPU sizes, set hard namespace quotas, and entitle each item to only the audience that needs it. Skip the RAG catalog items entirely on 9.1 and assemble RAG yourself, because a turnkey item you cannot support is worse than no item at all. Then go validate the full request-to-running path before a single real user sees the catalog. Self-service that fails on first use does more damage to platform credibility than no self-service at all.
Where has your self-service catalog leaked the most: quota exhaustion, vGPU sizing, or entitlement sprawl? That is usually where the next round of governance work lives.
References
- Set Up Your VCF Automation Organization for VMware Private AI Foundation with NVIDIA (Broadcom TechDocs)
- Install VMware Private AI Foundation with NVIDIA using VCF Automation (VCF Blog)
- VMware Private AI Foundation with NVIDIA 9.1 documentation and release notes (Broadcom)