Governing Private AI Consumption in VCF Automation: GPU Quotas, Policies and Roles (VCF Automation 9 Series, Part 35)

GPU is the most expensive resource you will ever put behind self-service. Here is how to govern Private AI consumption in VCF Automation 9.1 with namespace GPU quotas, consumption policies, roles and per-namespace activation, without touching the AI stack itself.

Dr. Pranay Jha

June 22, 2026

VCF Automation 9 Series · Part 35 of 41

TL;DR · Key Takeaways

This Part is about governing Private AI consumption, not the AI stack. The components and catalog items live in Part 27 and the Private AI Series.
GPU is governed like any scarce resource: a quota on the organization namespace, expressed as a device-plugin resource such as nvidia.com/gpu.
Org admins control who gets GPU and how much through consumption policies, templates and roles, and they activate Private AI Services per namespace, not everywhere.
In 9.1, providers grant quota across multiple supervisors in a region, and org admins delegate namespace creation to project admins and platform engineers within guardrails.
My recommendation: cap GPU per namespace, offer a short list of approved vGPU and MIG profiles as templates, and make idle GPU visible. An ungoverned GPU pool is the fastest way to a budget overrun and a queue.

Who this is for: provider and organization admins who run VCF Automation for AI tenants and have to keep GPU consumption controlled, fair and within budget.

Prerequisites: VCF Automation 9.x with Private AI Foundation in play, GPU hosts in the estate, organizations and projects defined, and the rights to set quotas, policies and roles. For the AI components themselves, see the Private AI Series.

A data science team asks for a few GPUs to experiment. Six weeks later you find they are holding eight high-end GPUs at four percent utilization, and another team that needs two cannot get any. GPU is the most expensive resource you will ever put behind self-service, and the one most likely to be quietly hoarded. This Part is about governing that consumption in VCF Automation: the quotas, policies and roles that keep GPU honest. It is deliberately not about the AI components, which the Private AI Series covers, or the GPU catalog items, which Part 27 covers. It is about the guardrails around them.

The governance stack for GPU

GPU governance is not one control; it is a stack, and each layer answers a different question. The provider decides how much GPU capacity an organization gets across a region. The organization carves that into namespace quotas that cap each team. A consumption policy decides who may request GPU-backed items at all, and roles decide who can do what. Activation decides which namespaces even have Private AI Services switched on. Skip a layer and you get the failure mode above: capacity handed out with no ceiling and no accountability.

Five layers, top to bottom. The provider sets the outer bound; activation sets the inner one.

GPU as a quota, not a hope

The concrete control is the namespace quota. GPU surfaces to the platform as a device-plugin resource, so it is governed the same way you govern CPU and memory: a hard limit on the namespace. The org namespace gets a GPU ceiling, and requests beyond it are simply refused. That single number is what turns GPU from an open bar into a budget.

# GPU is governed as a hard quota on the AI tenant namespace
kubectl --context ai-tenant-ns describe resourcequota

# Name:                    ai-tenant-quota
# Resource                 Used   Hard
# --------                 ----   ----
# requests.cpu             12     32
# requests.memory          96Gi   256Gi
# requests.nvidia.com/gpu  3      4
# (Kubernetes quota on extended resources such as GPUs accepts only the requests. prefix.)

Expected result: the team can hold at most four GPUs in this namespace, and the fifth request is rejected at admission, not after the bill arrives. Note that partitioned GPUs can change the resource name: a full GPU is nvidia.com/gpu, while MIG slices exposed under the NVIDIA device plugin's mixed strategy surface as distinct names like nvidia.com/mig-1g.10gb. Decide on a partitioning strategy first, because the quota you write depends on it. The vGPU and MIG mechanics themselves are covered in the Private AI Series.
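To make the dependency on partitioning concrete, here is a minimal Python sketch that builds a Kubernetes ResourceQuota body for either a full GPU or a MIG slice. The helper name, the quota name and the counts are my own illustrative assumptions, and whether you set the quota through the VCF Automation UI or a manifest depends on your environment. The only point is how the device-plugin resource name flows into the quota key.

```python
import json

# Hypothetical helper: names and values are illustrative, not a VCF Automation API.
def gpu_quota_manifest(name, namespace, gpu_resource, gpu_count):
    """Build a Kubernetes ResourceQuota body that caps one GPU resource name."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": name, "namespace": namespace},
        # Extended resources are capped through the "requests." prefix.
        "spec": {"hard": {f"requests.{gpu_resource}": str(gpu_count)}},
    }

# Same namespace, two partitioning strategies, two different quota keys.
full_gpu = gpu_quota_manifest("ai-tenant-quota", "ai-tenant-ns", "nvidia.com/gpu", 4)
mig_slice = gpu_quota_manifest("ai-tenant-quota", "ai-tenant-ns", "nvidia.com/mig-1g.10gb", 8)

print(json.dumps(full_gpu["spec"]["hard"]))
print(json.dumps(mig_slice["spec"]["hard"]))
```

A quota written against nvidia.com/gpu does nothing to cap workloads that request a MIG resource name, which is why the partitioning decision has to come first.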

In practice: set the GPU quota lower than you think you need and raise it on request. Raising a quota is a two-minute change with a paper trail. Clawing back GPUs a team has grown attached to is a political negotiation. Start tight.

Policies, templates and who can ask

A quota caps the total; a consumption policy decides who may draw on it and what they may draw. Organization admins enforce control over GPUs through consumption policies, templates and defined user roles. Templates are the underused lever: instead of letting anyone request an arbitrary GPU shape, you publish a short list of approved profiles (a small inference slice, a single training GPU, a multi-GPU node) and consumers pick from those. That keeps requests standard, costable and supportable, and it stops the one-off configurations that become support burdens later. Tie GPU-backed items to the same approval and lease policies you use elsewhere, so an expensive GPU deployment cannot exist without an owner and an expiry.

Lever	What it controls	Who sets it
Region GPU capacity	Total GPU an org can use across supervisors	Provider admin
Namespace GPU quota	The GPU ceiling per team	Org / project admin
Consumption policy	Who may request GPU items and how much	Org admin
Templates	The approved GPU profiles on offer	Org admin / provider
Roles	Who creates namespaces, requests, approves	Org admin (delegated)
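The template and policy levers can be sketched as a simple lookup. This is an illustration of the idea, not how VCF Automation stores or evaluates policies: the template names, role names and lease lengths below are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GpuTemplate:
    name: str
    resource: str    # device-plugin resource name
    count: int       # devices per deployment
    lease_days: int  # default lease before expiry

# Hypothetical catalog: three approved shapes, names invented for illustration.
TEMPLATES = {
    "inference-slice": GpuTemplate("inference-slice", "nvidia.com/mig-1g.10gb", 1, 14),
    "single-gpu": GpuTemplate("single-gpu", "nvidia.com/gpu", 1, 14),
    "training-node": GpuTemplate("training-node", "nvidia.com/gpu", 4, 14),
}

# Hypothetical consumption policy: which roles may request which templates.
POLICY = {
    "data-scientist": {"inference-slice", "single-gpu"},
    "ml-engineer": {"inference-slice", "single-gpu", "training-node"},
}

def resolve_request(role, template_name):
    """Return the approved template, or raise if the request is off-catalog or not permitted."""
    if template_name not in TEMPLATES:
        raise ValueError(f"{template_name!r} is not an approved GPU profile")
    if template_name not in POLICY.get(role, set()):
        raise PermissionError(f"role {role!r} may not request {template_name!r}")
    return TEMPLATES[template_name]
```

Calling resolve_request("ml-engineer", "training-node") returns the 4-GPU template, while an arbitrary shape or a role outside the policy is refused before any quota is touched.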

Roles and delegation

Roles decide who can pull each lever. In 9.1, an org admin can delegate namespace creation to project admins and platform engineers within defined guardrails, which removes the bottleneck of every namespace going through one team while keeping the limits enforced. The pattern that works: the org admin owns GPU quota and consumption policy, delegates namespace creation and day-to-day sizing to project admins, and leaves GPU requests to the consumers within the published templates. Each role can do its job without holding a key to the whole budget.
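Delegation within guardrails can be pictured as two checks on every namespace creation: a per-namespace ceiling and the organization's remaining budget. The sketch below is a toy model under assumed numbers and role names, not a VCF Automation schema.

```python
# Hypothetical guardrail model; values and role names are illustrative.
ORG_GPU_QUOTA = 12            # granted by the provider
MAX_GPU_PER_NAMESPACE = 4     # guardrail set by the org admin
CAN_CREATE_NAMESPACE = {"org-admin", "project-admin", "platform-engineer"}

def create_namespace(role, requested_gpu, existing_quotas):
    """Return the new list of namespace quotas, or raise if a guardrail is violated."""
    if role not in CAN_CREATE_NAMESPACE:
        raise PermissionError(f"role {role!r} cannot create namespaces")
    if requested_gpu > MAX_GPU_PER_NAMESPACE:
        raise ValueError("requested GPU quota exceeds the per-namespace guardrail")
    if sum(existing_quotas) + requested_gpu > ORG_GPU_QUOTA:
        raise ValueError("requested GPU quota exceeds the organization's remaining budget")
    return existing_quotas + [requested_gpu]

quotas = create_namespace("project-admin", 2, [4, 4])
print(quotas)  # [4, 4, 2]
```

The project admin never sees or edits the organization quota; the guardrails simply refuse anything that would breach it.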

Delegate creation and sizing downward; keep quota, policy and templates with the org admin.

Activation, and provider-side capacity in 9.1

Private AI Services are activated per organization namespace, not switched on everywhere by default. That is a governance feature, not an inconvenience: a namespace without Private AI Services activated cannot consume the AI catalog at all, which gives you a clean, explicit boundary around who is in the AI program. Activate it for the namespaces that need it and leave the rest alone.

On the provider side, 9.1 makes the capacity math easier. A provider can grant an organization quota across multiple supervisors in a region, and share a region's capacity across supervisors for several organizations, rather than pinning each org to one supervisor. For GPU, where the hardware is lumpy and expensive, that flexibility is the difference between stranded GPUs sitting idle in the wrong supervisor and a pool that several teams can actually reach.
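A toy comparison shows why pooling matters. It assumes a single deployment lands on one supervisor and uses made-up free-capacity numbers; the real placement is decided by the platform, not by code like this.

```python
# Free GPUs per supervisor in one region (illustrative numbers).
free_gpus = {"supervisor-a": 1, "supervisor-b": 3}

def place_pinned(request, supervisor):
    """Org is pinned to one supervisor: only that supervisor's free GPUs count."""
    return supervisor if free_gpus[supervisor] >= request else None

def place_pooled(request):
    """Org quota spans the region: pick any supervisor that can fit the request."""
    candidates = [name for name, free in free_gpus.items() if free >= request]
    # Best fit: the smallest supervisor that still fits, to limit fragmentation.
    return min(candidates, key=lambda name: free_gpus[name]) if candidates else None

print(place_pinned(2, "supervisor-a"))  # None
print(place_pooled(2))                  # supervisor-b
```

The region has four free GPUs in both cases. Pinned to supervisor-a, a two-GPU request fails while three GPUs sit idle next door; pooled, the same request is satisfied.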

Policy, then quota, then approval and lease. A GPU only attaches after all three pass.

Worked example

A region has 16 GPUs. You grant the AI organization a 12-GPU quota and keep 4 in reserve for bursts and failures. Inside the org, three teams get namespace quotas of 4, 4 and 2, leaving 2 unallocated for ad hoc requests. You publish three templates: a 1-slice MIG inference profile, a single full GPU for fine-tuning, and a 4-GPU training node, each with a default 14-day lease. Consumers self-serve within those limits. When a team needs more, they ask, you see the current utilization before deciding, and you raise a quota by a known amount rather than discovering the overcommit after the fact. Idle GPUs show up against the quota, so the four-percent-utilization hoard becomes a visible number instead of a surprise.
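The arithmetic above is small enough to check in a few lines. The team names, the utilization samples and the ten percent idle threshold are assumptions for illustration; in practice the utilization figures would come from whatever GPU telemetry you collect.

```python
REGION_GPUS = 16
ORG_QUOTA = 12
namespace_quotas = {"team-a": 4, "team-b": 4, "team-c": 2}

reserve = REGION_GPUS - ORG_QUOTA                         # held back by the provider
unallocated = ORG_QUOTA - sum(namespace_quotas.values())  # left for ad hoc requests

# Hypothetical utilization sample: (GPUs held, average utilization in percent).
usage = {"team-a": (4, 4.0), "team-b": (2, 71.0), "team-c": (2, 55.0)}
IDLE_THRESHOLD_PCT = 10.0  # assumed cut-off for "idle"

# GPUs held by teams whose average utilization falls below the threshold.
idle = {team: held for team, (held, util) in usage.items() if util < IDLE_THRESHOLD_PCT}

print(reserve, unallocated)  # 4 2
print(idle)                  # {'team-a': 4}
```

Put that idle figure next to each namespace quota and the hoard stops being a surprise: it is four GPUs with a team name attached.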

Disclaimer

GPU quota, consumption policy and Private AI Services activation are governance changes that affect what AI tenants can consume and spend. Validate in a non-production organization, confirm your vGPU or MIG partitioning strategy before writing quotas, and follow the current Broadcom documentation for Private AI Foundation and your exact VCF Automation build.

My Recommendation

Govern GPU before you publish a single AI catalog item. Set a namespace GPU quota lower than the ask, offer a short list of approved vGPU and MIG profiles as templates, activate Private AI Services only where it is needed, and put GPU-backed deployments under the same approval and lease policies as everything else. The reason is simple economics: GPU is scarce and costly, and an ungoverned pool turns into both a budget overrun and a queue at the same time, because the teams that grabbed early hold capacity the teams that need it cannot reach. When would I loosen this? For a dedicated research program with its own funded GPU pool and accountable owner, give them a large quota and get out of the way. For shared infrastructure, keep it tight and make idle GPU visible. The components and catalog belong to Part 27 and the Private AI Series; the discipline that keeps them affordable is yours. What is your GPU utilization across teams right now, and could you prove it?

About The Author

Dr. Pranay Jha

Dr. Pranay Jha is a Cloud and AI Consultant with 18+ years of experience in hybrid cloud, virtualization, and enterprise infrastructure transformation. He specializes in VMware technologies, multi-cloud strategy, and Generative AI solutions. He holds a PhD in Computer Applications with research focused on Cloud and AI, has published multiple research papers, and has been a VMware vExpert since 2016 and a VMUG Community Leader.

Tags: governance, GPU, private AI, VCF, VCF Automation, VCF Automation 9 Series
