- This Part is about governing Private AI consumption, not the AI stack. The components and catalog items live in Part 27 and the Private AI Series.
- GPU is governed like any scarce resource: a quota on the organization namespace, expressed as a device-plugin resource such as nvidia.com/gpu.
- Org admins control who gets GPU and how much through consumption policies, templates and roles, and they activate Private AI Services per namespace, not everywhere.
- In 9.1, providers grant quota across multiple supervisors in a region, and org admins delegate namespace creation to project admins and platform engineers within guardrails.
- My recommendation: cap GPU per namespace, offer a short list of approved vGPU and MIG profiles as templates, and make idle GPU visible. An ungoverned GPU pool is the fastest way to a budget overrun and a queue.
A data science team asks for a few GPUs to experiment. Six weeks later you find they are holding eight high-end GPUs at four percent utilization, and another team that needs two cannot get any. GPU is the most expensive resource you will ever put behind self-service, and the one most likely to be quietly hoarded. This Part is about governing that consumption in VCF Automation: the quotas, policies and roles that keep GPU honest. It is deliberately not about the AI components, which the Private AI Series covers, or the GPU catalog items, which Part 27 covers. It is about the guardrails around them.
The governance stack for GPU
GPU governance is not one control but a stack, and each layer answers a different question. The provider decides how much GPU capacity an organization gets across a region. The organization carves that into namespace quotas that cap each team. A consumption policy decides who may request GPU-backed items at all, and roles decide who can do what. Activation decides which namespaces even have Private AI Services switched on. Skip a layer and you get the failure mode above: capacity handed out with no ceiling and no accountability.
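To make the layering concrete, here is a minimal sketch in plain Python. It is a mental model, not VCF Automation's API: the `Org` and `Namespace` classes, the `evaluate` function, the user names and the order of the checks are all assumptions made for illustration. It shows how a single request is refused by the first layer that says no.

```python
from dataclasses import dataclass, field


@dataclass
class Namespace:
    name: str
    gpu_quota: int                    # hard GPU ceiling for this namespace
    gpu_used: int = 0
    ai_services_active: bool = False  # Private AI Services activated?


@dataclass
class Org:
    region_gpu_grant: int             # GPU the provider granted for the region
    namespaces: dict = field(default_factory=dict)
    gpu_requesters: set = field(default_factory=set)  # consumption policy

    def gpu_in_use(self) -> int:
        return sum(ns.gpu_used for ns in self.namespaces.values())


def evaluate(org: Org, user: str, ns_name: str, gpus: int) -> str:
    """Return 'allowed' or the name of the first layer that refuses."""
    ns = org.namespaces[ns_name]
    if not ns.ai_services_active:
        return "activation"
    if user not in org.gpu_requesters:
        return "consumption policy"
    if ns.gpu_used + gpus > ns.gpu_quota:
        return "namespace quota"
    if org.gpu_in_use() + gpus > org.region_gpu_grant:
        return "provider grant"
    return "allowed"


# Hypothetical org: one AI-enabled namespace, one without activation.
org = Org(region_gpu_grant=8, gpu_requesters={"dana"})
org.namespaces["ml-team"] = Namespace("ml-team", gpu_quota=4, gpu_used=3,
                                      ai_services_active=True)
org.namespaces["web-team"] = Namespace("web-team", gpu_quota=2)

print(evaluate(org, "dana", "ml-team", 1))   # allowed
print(evaluate(org, "dana", "ml-team", 2))   # namespace quota
print(evaluate(org, "sam", "ml-team", 1))    # consumption policy
print(evaluate(org, "dana", "web-team", 1))  # activation
```

Remove any one check from `evaluate` and the corresponding request sails through, which is the point of treating governance as a stack.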
GPU as a quota, not a hope
The concrete control is the namespace quota. GPU surfaces to the platform as a device-plugin resource, so it is governed the same way you govern CPU and memory: a hard limit on the namespace. The org namespace gets a GPU ceiling, and requests beyond it are simply refused. That single number is what turns GPU from an open bar into a budget.
```shell
# GPU is governed as a hard quota on the AI tenant namespace
kubectl --context ai-tenant-ns describe resourcequota
# Name:                     ai-tenant-quota
# Resource                  Used   Hard
# --------                  ----   ----
# requests.nvidia.com/gpu   3      4
# requests.cpu              12     32
# requests.memory           96Gi   256Gi
# Extended resources such as GPUs only accept the requests. prefix in a quota.
```
Expected result: the team can hold at most four GPUs in this namespace, and the fifth request is rejected at admission, not after the bill arrives. Note that partitioned GPUs can change the resource name: a full GPU is nvidia.com/gpu, while MIG slices exposed under the device plugin's mixed strategy surface as distinct names like nvidia.com/mig-1g.10gb. A quota written against one name does not cap the other. Decide on a partitioning strategy first, because the quota you write depends on it. The vGPU and MIG mechanics themselves are covered in the Private AI Series.
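The arithmetic behind that rejection is simple enough to sketch. The following is a simplified model of a quota check, not the Kubernetes admission code: the `admit` function is hypothetical, and it uses bare resource names as dictionary keys. It also shows why the resource name matters: a quota that only lists nvidia.com/gpu says nothing about a MIG slice that is advertised under a different name.

```python
def admit(hard, used, request):
    """Admit a request only if every quota-tracked resource stays within its limit.

    Returns (admitted, reason). Resources with no quota entry are not capped.
    """
    for resource, amount in request.items():
        if resource not in hard:
            continue  # no quota entry for this name, so nothing to enforce
        if used.get(resource, 0) + amount > hard[resource]:
            return False, f"exceeded quota: {resource}"
    return True, "admitted"


hard = {"nvidia.com/gpu": 4}
used = {"nvidia.com/gpu": 3}

print(admit(hard, used, {"nvidia.com/gpu": 1}))          # the fourth GPU fits
used["nvidia.com/gpu"] = 4
print(admit(hard, used, {"nvidia.com/gpu": 1}))          # the fifth is refused
print(admit(hard, used, {"nvidia.com/mig-1g.10gb": 2}))  # not capped by this quota
```

The last call is the trap: if you partition GPUs and the slices surface under their own names, the quota has to name them too.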
Policies, templates and who can ask
A quota caps the total; a consumption policy decides who may draw on it and what they may draw. Organization admins enforce control over GPUs through consumption policies, templates and defined user roles. Templates are the underused lever: instead of letting anyone request an arbitrary GPU shape, you publish a short list of approved profiles (for example a small inference slice, a single training GPU and a multi-GPU node), and consumers pick from those. That keeps requests standard, costable and supportable, and it stops the one-off configurations that become support burdens later. Tie GPU-backed items to the same approval and lease policies you use elsewhere, so an expensive GPU deployment cannot exist without an owner and an expiry.
| Lever | What it controls | Who sets it |
|---|---|---|
| Region GPU capacity | Total GPU an org can use across supervisors | Provider admin |
| Namespace GPU quota | The GPU ceiling per team | Org / project admin |
| Consumption policy | Who may request GPU items and how much | Org admin |
| Templates | The approved GPU profiles on offer | Org admin / provider |
| Roles | Who creates namespaces, requests, approves | Org admin (delegated) |
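As a rough illustration of the template and lease idea, the sketch below validates a request against a short approved list and refuses anything without an owner and a future expiry. Everything in it is hypothetical: the template names, the resource amounts, the `GpuRequest` class and the `validate` function are stand-ins, not objects that VCF Automation exposes.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Hypothetical template names; the resource amounts are illustrative only.
APPROVED_TEMPLATES = {
    "inference-slice": {"nvidia.com/mig-1g.10gb": 1},
    "training-single": {"nvidia.com/gpu": 1},
    "training-node": {"nvidia.com/gpu": 4},
}


@dataclass
class GpuRequest:
    template: str
    owner: Optional[str] = None
    lease_expires: Optional[date] = None


def validate(req: GpuRequest, today: date) -> List[str]:
    """Return the reasons a request is refused; an empty list means it passes."""
    problems = []
    if req.template not in APPROVED_TEMPLATES:
        problems.append("template is not on the approved list")
    if not req.owner:
        problems.append("no owner")
    if req.lease_expires is None or req.lease_expires <= today:
        problems.append("no future lease expiry")
    return problems


today = date(2025, 1, 1)
ok = GpuRequest("training-single", owner="dana", lease_expires=date(2025, 2, 1))
bad = GpuRequest("eight-gpu-custom")

print(validate(ok, today))   # passes: empty list
print(validate(bad, today))  # refused on all three counts
```

Returning every reason at once, rather than stopping at the first, is a deliberate choice here: the consumer fixes the request in one round trip.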
Roles and delegation
Roles decide who can pull each lever. In 9.1, an org admin can delegate namespace creation to project admins and platform engineers within defined guardrails, which removes the bottleneck of every namespace going through one team while keeping the limits enforced. The pattern that works: the org admin owns GPU quota and consumption policy, delegates namespace creation and day-to-day sizing to project admins, and leaves GPU requests to the consumers within the published templates. Each role can do its job without holding a key to the whole budget.
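One way to picture that split is a plain role-to-action map with a guardrail on the delegated action. This is a sketch under assumptions: the role keys, action names and the `resize_namespace` helper are invented for illustration and do not correspond to the actual role definitions in VCF Automation.

```python
# Hypothetical role and action names, mirroring the pattern described above.
ROLE_ACTIONS = {
    "org_admin": {"set_gpu_quota", "set_consumption_policy",
                  "create_namespace", "size_namespace"},
    "project_admin": {"create_namespace", "size_namespace"},
    "consumer": {"request_from_template"},
}


def can(role: str, action: str) -> bool:
    """True if the role is allowed to perform the action."""
    return action in ROLE_ACTIONS.get(role, set())


def resize_namespace(role: str, new_gpu_limit: int, guardrail_max: int) -> int:
    """Apply a new namespace GPU limit, but only inside the guardrail."""
    if not can(role, "size_namespace"):
        raise PermissionError(f"{role} may not size namespaces")
    if new_gpu_limit > guardrail_max:
        raise ValueError(f"{new_gpu_limit} GPUs exceeds the guardrail of {guardrail_max}")
    return new_gpu_limit


print(can("project_admin", "create_namespace"))  # True
print(can("project_admin", "set_gpu_quota"))     # False
print(resize_namespace("project_admin", 2, guardrail_max=4))  # 2
```

The project admin can size a namespace all day long, yet never past a ceiling they do not control.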
Activation, and provider-side capacity in 9.1
Private AI Services are activated per organization namespace, not switched on everywhere by default. That is a governance feature, not an inconvenience: a namespace without Private AI Services activated cannot consume the AI catalog at all, which gives you a clean, explicit boundary around who is in the AI program. Activate it for the namespaces that need it and leave the rest alone.
On the provider side, 9.1 makes the capacity math easier. A provider can grant an organization quota across multiple supervisors in a region, and share a region's capacity across supervisors for several organizations, rather than pinning each org to one supervisor. For GPU, where the hardware is lumpy and expensive, that flexibility is the difference between stranded GPUs sitting idle in the wrong supervisor and a pool that several teams can actually reach.
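The stranded-capacity argument can be shown with a toy placement model. The supervisor names and both functions are hypothetical, and the sketch assumes a request has to land entirely on one supervisor. It says nothing about how VCF actually schedules workloads.

```python
def place_pinned(free_by_supervisor, supervisor, gpus):
    """Org is pinned to one supervisor: the request fits only if that one has room."""
    return free_by_supervisor.get(supervisor, 0) >= gpus


def place_pooled(free_by_supervisor, gpus):
    """Org has quota across the region: return the first supervisor with room, else None."""
    for name in sorted(free_by_supervisor):
        if free_by_supervisor[name] >= gpus:
            return name
    return None


# Free GPUs per supervisor in one region (illustrative numbers).
free = {"sup-a": 0, "sup-b": 4}

print(place_pinned(free, "sup-a", 2))  # False: four GPUs sit idle on sup-b
print(place_pooled(free, 2))           # sup-b
```

Same hardware, same demand. The only difference is whether the org's quota can reach the supervisor where the free GPUs happen to be.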
My Recommendation
Govern GPU before you publish a single AI catalog item. Set a namespace GPU quota lower than the ask, offer a short list of approved vGPU and MIG profiles as templates, activate Private AI Services only where it is needed, and put GPU-backed deployments under the same approval and lease policies as everything else. The reason is simple economics: GPU is scarce and costly, and an ungoverned pool turns into both a budget overrun and a queue at the same time, because the teams that grabbed early hold capacity the teams that need it cannot reach. When would I loosen this? For a dedicated research program with its own funded GPU pool and accountable owner, give them a large quota and get out of the way. For shared infrastructure, keep it tight and make idle GPU visible. The components and catalog belong to Part 27 and the Private AI Series; the discipline that keeps them affordable is yours. What is your GPU utilization across teams right now, and could you prove it?
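If you want a starting point for making idle GPU visible, the report can be as small as the sketch below. It assumes you already have, from whatever GPU telemetry you collect, the number of GPUs each namespace holds and its mean utilization over some window. The `idle_report` function, the 10 percent threshold and the sample numbers are illustrative assumptions, not a recommendation for your environment.

```python
def idle_report(samples, threshold=0.10):
    """Flag namespaces that hold GPUs but use them below the threshold.

    samples: iterable of (namespace, gpus_held, mean_utilization in 0..1).
    Returns the flagged rows, largest holders first.
    """
    flagged = [
        (ns, held, util)
        for ns, held, util in samples
        if held > 0 and util < threshold
    ]
    return sorted(flagged, key=lambda row: row[1], reverse=True)


# Illustrative data, echoing the opening scenario.
samples = [
    ("data-science", 8, 0.04),
    ("vision", 2, 0.71),
    ("web", 0, 0.0),
]

for ns, held, util in idle_report(samples):
    print(f"{ns}: holding {held} GPUs at {util:.0%} utilization")
```

Sorting by GPUs held puts the most expensive idle capacity at the top, which is where the conversation with the owning team should start.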
References
- VCF 9.1: Private Cloud Platform for Production AI (Broadcom)
- Activate Private AI Services in an Organization Namespace (Broadcom TechDocs)
- Set Up VCF Automation for Private AI Foundation with NVIDIA (Broadcom TechDocs)
- Self-service private cloud with VCF 9.1 (Broadcom)



