- The Private AI Foundation Quickstart in VCF Automation generates ready-made GPU catalog items: AI Workstation, AI RAG Workstation, Triton Inference Server, AI Kubernetes Cluster, and AI Kubernetes RAG Cluster.
- The wizard is the easy 20 minutes. The blueprints it generates are a starting point you must edit before production, not a finished catalog.
- Prerequisites gate everything: vGPU-enabled VM classes, Private AI content-library images, an NVIDIA license, and a namespace with GPU capacity. Miss one and the catalog item fails at request time.
- A GPU catalog without a lease policy and a profile allowlist is a budget incident waiting to happen. Idle deep learning VMs squat on whole vGPU profiles for weeks.
- Default to fractional vGPU, gate the large profiles behind approval, and cap GPUs per project. The catalog is where you ration the most expensive thing you own.
Who this is for: Cloud and platform admins standing up GPU self-service in VCF Automation, and architects who have to make a fixed GPU budget serve many data science and DevOps teams.
Prerequisites: VCF Automation 9.x with Private AI Foundation with NVIDIA in place, a GPU workload domain with vGPU-enabled VM classes and Private AI images, an NVIDIA license, and the catalog and policy knowledge from the earlier Parts.
A GPU sitting idle in a forgotten virtual machine is the most expensive thing in your data center. That single sentence is the whole reason this Part exists. Self-service for AI is easy to switch on and easy to regret, because the same catalog that lets a data scientist get a GPU workstation in ten minutes also lets twenty of them hold full vGPU profiles for a month each. The platform makes provisioning trivial; you have to make reclamation and rationing equally deliberate.
This Part is the VCF Automation admin’s view of Private AI self-service: what the Quickstart builds, what you must change before you trust it, and how to govern scarce GPUs through the catalog. It is the platform side of the consumer-facing material in the Private AI Series catalog post, which is worth reading alongside this. I am writing against the current VCF 9.1 release.
What the Quickstart actually builds
The Private AI Foundation Quickstart is a wizard inside VCF Automation. You point it at a namespace with vGPU-enabled VM classes and Private AI images, answer a few questions, and it generates the cloud templates and publishes the catalog items for you. The first run creates a set of items aimed at two audiences: data scientists who want a GPU workstation, and DevOps engineers who want a GPU Kubernetes cluster.
| Catalog item | What it provisions | Who requests it |
|---|---|---|
| AI Workstation | Deep learning VM, configurable vCPU/vGPU/memory, PyTorch/CUDA/TensorFlow | Data scientist |
| AI RAG Workstation | GPU VM with a Retrieval Augmented Generation reference solution | Data scientist |
| Triton Inference Server | GPU VM running NVIDIA Triton | Data scientist / MLOps |
| AI Kubernetes Cluster | VKS cluster with GPU-capable worker nodes | DevOps engineer |
| AI Kubernetes RAG Cluster | VKS cluster plus pgvector via Data Services Manager | DevOps engineer |
The prerequisites that actually gate it
The Quickstart fails politely if the foundations are not there, and every one of these is a request-time failure rather than a save-time error, so verify them first. You need vGPU-enabled VM classes defined on the Supervisor, so the platform knows which hardware profiles exist. You need Private AI images in a content library, including the deep learning VM image the workstation boots from. You need a reachable NVIDIA license, because the driver and the AI Enterprise stack are licensed. And you need a namespace with actual GPU capacity assigned, because a catalog item that places onto a namespace with no free GPU will accept the request, attempt the deployment, and fail.
The Kubernetes items add the VKS layer from the All Apps organization, so GPU-capable node pools and the Supervisor must be ready before those items will provision. Treat the prerequisites as the real project; the wizard is the last step, not the first.
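The four prerequisites above can be written down as a preflight check. This is a minimal sketch, not a VCF Automation API: `NamespaceState` and its fields are hypothetical names for facts an admin has already verified by hand or by their own tooling.

```python
from dataclasses import dataclass, field


@dataclass
class NamespaceState:
    """Hypothetical snapshot of what has been verified for one namespace."""
    vgpu_vm_classes: list[str] = field(default_factory=list)
    content_library_images: list[str] = field(default_factory=list)
    nvidia_license_reachable: bool = False
    free_gpus: int = 0


def missing_prerequisites(state: NamespaceState) -> list[str]:
    """Return every prerequisite that would surface as a request-time failure."""
    missing = []
    if not state.vgpu_vm_classes:
        missing.append("vGPU-enabled VM class")
    if not state.content_library_images:
        missing.append("Private AI image in content library")
    if not state.nvidia_license_reachable:
        missing.append("reachable NVIDIA license")
    if state.free_gpus <= 0:
        missing.append("free GPU capacity in namespace")
    return missing


ready = NamespaceState(
    vgpu_vm_classes=["grid_a100-10c"],
    content_library_images=["dlvm-ubuntu-2404"],
    nvidia_license_reachable=True,
    free_gpus=2,
)
print(missing_prerequisites(ready))             # []
print(missing_prerequisites(NamespaceState()))  # lists all four gaps
```

Returning the full list rather than stopping at the first gap matters here, because each gap otherwise costs one failed request to discover.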
The blueprint is a starting point, not the finish
After the first run, the generated cloud templates back the catalog items, and you can modify them to fit your organization. You should. The defaults are sensible for a demo and wrong for production, because they expose more choice than you want a user to have. The most important edit is constraining the vGPU profile: turn the free-form size into a short allowlist so a user cannot request a full A100 when a fractional slice will do. Below is the shape of an AI workstation template after I have tightened it.
```yaml
formatVersion: 1
inputs:
  vgpuProfile:
    type: string
    title: vGPU profile
    enum: [grid_a100-10c, grid_a100-20c]  # allowlist, not free choice
    default: grid_a100-10c                # default to the smaller slice
  framework:
    type: string
    title: Pre-install framework
    enum: [pytorch, tensorflow, none]
    default: pytorch
resources:
  ai-workstation:
    type: Cloud.vSphere.Machine
    properties:
      image: dlvm-ubuntu-2404             # from the Private AI content library
      flavor: '${input.vgpuProfile}'      # vGPU-enabled VM class mapping
      cloudConfig: |
        # bootstrap NVIDIA driver, license, and ${input.framework}
# The Quickstart generates a working template. This is what it looks
# like after pinning profiles and defaults for production.
```
In practice: I publish two workstation items rather than one with every option. One is a small fractional-vGPU item open to all; the other is a large full-GPU item gated behind approval and a senior role. Mixing both into a single item with a giant dropdown trains users to pick the biggest profile because they can.
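The two-item split can be sketched as a routing rule over the allowlist. The item names `ai-workstation-small` and `ai-workstation-large` and the gated `grid_a100-40c` profile are assumptions for illustration; only the two open profiles come from the template above.

```python
from typing import Optional

OPEN_PROFILES = ["grid_a100-10c", "grid_a100-20c"]  # allowlist from the template
GATED_PROFILES = ["grid_a100-40c"]                  # assumed full-GPU profile
DEFAULT_PROFILE = "grid_a100-10c"                   # smallest sensible slice


def route_request(profile: Optional[str] = None) -> tuple[str, bool]:
    """Return (catalog item, needs_approval) for a requested vGPU profile."""
    chosen = profile or DEFAULT_PROFILE
    if chosen in OPEN_PROFILES:
        return ("ai-workstation-small", False)
    if chosen in GATED_PROFILES:
        return ("ai-workstation-large", True)
    # Anything off the allowlist is rejected rather than silently downgraded.
    raise ValueError(f"vGPU profile {chosen!r} is not on the allowlist")


print(route_request())                 # ('ai-workstation-small', False)
print(route_request("grid_a100-40c"))  # ('ai-workstation-large', True)
```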
Governing scarce GPUs through the catalog
This is the part the Quickstart does not do for you, and the part that decides whether GPU self-service is an asset or a recurring budget fire. Four controls, all from the governance Part, wrap a GPU catalog item. A lease policy so idle workstations are reclaimed automatically. An approval policy on the large profiles so someone signs off before a full GPU leaves the pool. A project resource limit so no single team can drain the cluster. And the profile allowlist in the blueprint so the requestable sizes are bounded in the first place.
A deep learning VM holds its vGPU profile whether the data scientist is training or on vacation. Without a lease policy, that GPU is gone from the pool until someone notices and asks. On a cluster with eight GPUs, three forgotten workstations that each hold a full card are a 37.5 percent capacity loss that never shows up as an error. Lease every GPU item, with a short default and a hard total cap, and make extension a deliberate act.
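Both ideas in that paragraph are small enough to compute. The sketch below shows the stranded share for three of eight GPUs and a lease extension that can never pass a hard total cap. The 5-day default and 14-day cap are assumed values, and `extend_lease` is a hypothetical helper, not how VCF Automation implements lease policies.

```python
from datetime import date, timedelta

DEFAULT_LEASE_DAYS = 5  # assumed short default
MAX_TOTAL_DAYS = 14     # assumed hard cap on total lifetime


def stranded_fraction(forgotten_gpus: int, total_gpus: int) -> float:
    """Share of the pool held by workstations nobody is using."""
    return forgotten_gpus / total_gpus


def extend_lease(created: date, current_expiry: date, extra_days: int) -> date:
    """Extend a lease, but never past the hard cap measured from creation."""
    hard_limit = created + timedelta(days=MAX_TOTAL_DAYS)
    return min(current_expiry + timedelta(days=extra_days), hard_limit)


created = date(2025, 1, 6)
expiry = created + timedelta(days=DEFAULT_LEASE_DAYS)

print(stranded_fraction(3, 8))            # 0.375
print(extend_lease(created, expiry, 5))   # 2025-01-16
print(extend_lease(created, expiry, 30))  # 2025-01-20, clamped to the cap
```

Measuring the cap from creation rather than from the current expiry is the design choice that keeps repeated extensions from turning a lease into a permanent allocation.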
vGPU, MIG or passthrough at the catalog layer
The sharing model you bake into the VM class decides the economics of the catalog item, so choose it per item, not once for the whole platform. Time-sliced vGPU oversubscribes a GPU across many light users and is right for development workstations where utilization is bursty. MIG hard-partitions a GPU into isolated slices with guaranteed performance, which suits inference and shared production where one tenant must not starve another. Passthrough gives a workload the entire physical GPU for maximum performance and no sharing, which is for heavy training that earns the whole card. Expose the cheap, shared profiles as the default items, and make passthrough the exception that needs justification.
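One way to keep that choice per item is a lookup from workload type to sharing model. The workload labels are hypothetical; the mapping itself only restates the paragraph above, including the rule that passthrough is the exception needing justification.

```python
from enum import Enum


class Sharing(Enum):
    TIME_SLICED_VGPU = "time-sliced vGPU"
    MIG = "MIG"
    PASSTHROUGH = "passthrough"


# Hypothetical workload labels mapped to the sharing model described above.
SHARING_BY_WORKLOAD = {
    "dev-workstation": Sharing.TIME_SLICED_VGPU,  # bursty, light users
    "inference": Sharing.MIG,                     # isolated, guaranteed slices
    "shared-production": Sharing.MIG,
    "heavy-training": Sharing.PASSTHROUGH,        # whole card, no sharing
}


def sharing_for(workload: str) -> tuple[Sharing, bool]:
    """Return (sharing model, needs_justification) for a catalog item."""
    # Unknown workloads fall back to the cheap shared default.
    model = SHARING_BY_WORKLOAD.get(workload, Sharing.TIME_SLICED_VGPU)
    return model, model is Sharing.PASSTHROUGH


print(sharing_for("dev-workstation"))  # time-sliced, no justification needed
print(sharing_for("heavy-training"))   # passthrough, justification required
```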
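A worked example: one host, 8 physical GPUs, 30 data scientists. Reserve 2 GPUs for a MIG-backed inference item and 1 for a passthrough training item gated behind approval. Expose the default AI Workstation on a time-sliced vGPU profile that splits each of the remaining 5 cards into four, giving 20 logical slices. Slicing all 8 cards would give 32, but only if nothing were reserved. With bursty use, 20 slices cover 30 data scientists as long as no more than two-thirds of them hold a workstation at the same time.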
Add a 5-day default lease with a 14-day cap on the workstation item, and a per-project limit of 8 concurrent slices. The result: bursty development shares the fractional pool, production inference is isolated, training is rationed and signed off, and idle workstations return their slices automatically. Without the lease and the limit, the same 8 GPUs serve about 8 people and then the cluster is full.
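The arithmetic behind this example is worth checking explicitly, because reserved cards come out of the time-sliced pool. The sketch below uses only the numbers from the example; the variable names are mine.

```python
TOTAL_GPUS = 8
MIG_GPUS = 2            # reserved for the inference item
PASSTHROUGH_GPUS = 1    # reserved for the gated training item
SLICES_PER_GPU = 4      # time-sliced profile splits each card into four
PROJECT_SLICE_LIMIT = 8
DATA_SCIENTISTS = 30

# Upper bound if every card were time-sliced and nothing reserved.
all_cards_sliced = TOTAL_GPUS * SLICES_PER_GPU

# What the workstation item can actually draw on after reservations.
time_sliced_gpus = TOTAL_GPUS - MIG_GPUS - PASSTHROUGH_GPUS
workstation_slices = time_sliced_gpus * SLICES_PER_GPU

# Largest share of users who can hold a workstation at the same time.
max_active_share = workstation_slices / DATA_SCIENTISTS

print(all_cards_sliced)                           # 32
print(time_sliced_gpus)                           # 5
print(workstation_slices)                         # 20
print(round(max_active_share, 2))                 # 0.67
print(workstation_slices // PROJECT_SLICE_LIMIT)  # 2 projects can sit at their limit at once
```

If the concurrency you observe runs above that share, the levers are the lease length and the slice size, not more approvals.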
Disclaimer: Editing generated blueprints and applying leases to GPU items affects real, expensive resources. Test changes in a non-production project, confirm the namespace has free GPU capacity, and warn users before applying a lease to existing GPU deployments, since reclamation destroys the VM and frees the GPU.
Day-2 actions that keep GPUs honest
Provisioning is half the lifecycle. What a user can do to a live GPU workstation matters just as much, because the expensive resource is attached the whole time it runs. Curate the day-2 actions on these items deliberately. Power-off should be easy and encouraged: a powered-off deep learning VM with a time-sliced profile can return its slice to the pool, so make it a one-click action and tell users to use it overnight. Resizing the vGPU profile should be possible but stay gated, since moving from a fractional slice to a full card is a budget decision, not a convenience. Delete should be open, because the faster a finished workstation goes away, the sooner the GPU is reusable.
Pair those actions with the day-2 policies from earlier in the series. The action defines what is possible; the policy decides who may run it and whether a profile bump needs approval. The combination is what lets you hand a GPU workstation to a data scientist without handing them the keys to the whole cluster.
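The split between action and policy can be modelled as a small table: the action says what exists, the rule says who may run it and whether approval is needed. The action names and role labels below are hypothetical and do not correspond to VCF Automation identifiers.

```python
# Hypothetical day-2 rules for a GPU workstation item.
DAY2_RULES = {
    "power_off":   {"roles": {"member", "admin"}, "approval": False},
    "delete":      {"roles": {"member", "admin"}, "approval": False},
    "resize_vgpu": {"roles": {"member", "admin"}, "approval": True},
}


def evaluate(action: str, role: str) -> str:
    """Return 'allowed', 'needs-approval', or 'denied' for a day-2 request."""
    rule = DAY2_RULES.get(action)
    if rule is None or role not in rule["roles"]:
        return "denied"
    return "needs-approval" if rule["approval"] else "allowed"


print(evaluate("power_off", "member"))    # allowed
print(evaluate("resize_vgpu", "member"))  # needs-approval
print(evaluate("resize_vgpu", "viewer"))  # denied
```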
Allocated is not the same as used
The catalog tells you what is allocated. It does not tell you what is actually being used, and on GPUs the gap between the two is where the money leaks. A workstation can hold a full vGPU profile while its GPU utilization sits near zero for days, and from the catalog point of view everything looks healthy. Lease policies catch the worst of it by reclaiming on a timer, but a lease is a blunt instrument that fires on the clock, not on idleness.
Close the loop with utilization signals from the operations side. Watch real GPU utilization per deployment, flag the workstations that have been allocated but idle for days, and use that data to tune lease lengths and profile sizes rather than guessing. If your fractional users never exceed a quarter of their slice, your default profile is too big and you are leaving capacity on the table. Self-service GPU provisioning without a feedback loop on utilization is how a cluster stays full and underused at the same time, and the fix is to let the numbers, not the requests, decide your defaults.
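A feedback loop of this kind can start very small. The sketch below assumes you can export one average GPU utilization percentage per deployment per day from your operations tooling; the 5 percent threshold, the 3-day window, and the function names are all assumptions to tune, not product defaults.

```python
IDLE_THRESHOLD_PCT = 5.0  # assumed: below this, a day counts as idle
IDLE_DAYS = 3             # assumed: consecutive idle days before flagging


def idle_deployments(daily_util: dict[str, list[float]]) -> list[str]:
    """Deployments whose most recent IDLE_DAYS samples are all below threshold."""
    flagged = []
    for name, samples in daily_util.items():
        recent = samples[-IDLE_DAYS:]
        if len(recent) == IDLE_DAYS and max(recent) < IDLE_THRESHOLD_PCT:
            flagged.append(name)
    return sorted(flagged)


def default_profile_too_big(peak_util: dict[str, float], quarter: float = 25.0) -> bool:
    """True when no fractional user ever exceeds a quarter of their slice."""
    return bool(peak_util) and all(peak <= quarter for peak in peak_util.values())


util = {
    "ws-alice": [62.0, 48.5, 71.0, 55.0],  # actively training
    "ws-bob": [40.0, 1.2, 0.4, 0.0],       # idle for the last three days
    "ws-new": [0.0],                       # too young to judge
}
print(idle_deployments(util))                                      # ['ws-bob']
print(default_profile_too_big({"ws-alice": 18.0, "ws-bob": 9.5}))  # True
```

Requiring a full window before flagging keeps a freshly provisioned workstation from being reported as idle on its first day.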
Two checks before a GPU item goes live
Before a GPU catalog item reaches a tenant, confirm two things that the Quickstart will not confirm for you. First, that the backing namespace actually has free GPU capacity and the vGPU-enabled VM class the blueprint references, because a request that places onto a namespace with no free GPU fails after the user has waited, not before. Second, that a lease policy is attached, so the item cannot mint an immortal GPU deployment on its first day in production.
Prove it as a real user
Request the item once as an ordinary project member, not as an administrator, and watch the whole arc: it provisions, the vGPU attaches, the lease clock starts, and at expiry the GPU returns to the pool unaided. An item that an admin can deploy but a scoped user cannot, or one whose GPU never comes back, is not ready regardless of how clean the Quickstart looked. The end-to-end request as a real user is the only test that proves the governance binds to the people it is meant to bind to.
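The arc you are watching for can be rehearsed with a toy model before you touch real hardware. `SlicePool` is a hypothetical stand-in for a pool of vGPU slices with lease-based reclamation, with days counted as plain integers; it models the expected behaviour and says nothing about how the platform implements it.

```python
class SlicePool:
    """Toy model of a vGPU slice pool with lease-based reclamation."""

    def __init__(self, slices: int) -> None:
        self.free = slices
        self.expiry_by_user: dict[str, int] = {}

    def request(self, user: str, today: int, lease_days: int) -> bool:
        """Allocate one slice and start the lease clock; False if refused."""
        if self.free == 0 or user in self.expiry_by_user:
            return False
        self.free -= 1
        self.expiry_by_user[user] = today + lease_days
        return True

    def reclaim_expired(self, today: int) -> list[str]:
        """Return expired slices to the pool without anyone asking."""
        expired = [u for u, day in self.expiry_by_user.items() if day <= today]
        for user in expired:
            del self.expiry_by_user[user]
            self.free += 1
        return expired


pool = SlicePool(slices=4)
pool.request("data-scientist-1", today=0, lease_days=5)
print(pool.free)                # 3, slice attached
print(pool.reclaim_expired(4))  # [], lease still running
print(pool.reclaim_expired(5))  # ['data-scientist-1']
print(pool.free)                # 4, slice back in the pool
```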
What I’d Do
Run the Quickstart to get working items fast, then treat its output as a draft. Pin the vGPU profiles to a short allowlist, default to the smallest sensible slice, and split workstation choice into a small open item and a large gated one. Before you publish to even a single user, put a lease on every GPU item and a GPU limit on every project, because the absence of those two is what turns a GPU budget into a monthly surprise. I would not expose passthrough as a default or leave the generated template unedited, and I would not skip the prerequisite check, since vGPU classes, Private AI images, a license, and free capacity are all request-time failures that look like the catalog is broken. Validate one end-to-end request as a real data scientist, watch the lease attach and the GPU return, and only then announce it.
Stand up one fractional AI Workstation item with a 5-day lease in a lab project this week and request it as an ordinary user. If the GPU comes back to the pool on expiry without anyone asking, your AI self-service is ready to grow.