The word “private” in Private AI is carrying a lot of weight, and most of what it carries is about data residency, not security. When a customer tells me their AI is “private now, it is all on-prem,” what they have actually bought is a promise that proprietary documents, model weights, and inference data never leave their data center. That is real, and it matters for sovereignty and compliance. But it is only half of the security story, and the other half is the one that gets people breached.
This part of the series is the security and data privacy reckoning. What does VMware Private AI Foundation with NVIDIA (PAIF) actually protect, what does it leave unprotected by default, and which controls in VCF 9.1 close the gap? I will be blunt about where the marketing and the field diverge.
What “Private” Actually Buys You
Enterprises move AI workloads onto private cloud for four concrete reasons: data sovereignty, regulatory compliance, intellectual property protection, and the assurance that corporate data and model weights stay under their control. PAIF delivers on that. Your RAG corpus, your fine-tuned weights, your prompt and inference traffic all stay inside the VCF perimeter. The air-gapped option (covered in Part 19) takes that to its limit by removing the internet entirely.
So far so good. The trap is treating “the data does not leave the building” as if it were the same statement as “the data is secure.” It is not. On-premises is a location, not an architecture. Here is the mental model I draw on a whiteboard in every Private AI security workshop.
The Half On-Prem Does Not Solve
A production RAG pipeline on PAIF is not one container. It is a dozen of them. The NVIDIA NIM RAG Blueprint, for example, runs seven NIM microservices (an LLM, an embedding model, a re-ranker, and four document extraction models), plus an orchestration server, an ingestor, Redis, MinIO object storage, and a vector database. In default Kubernetes networking, once traffic is inside the cluster, every pod can reach every other pod on any port.
Think about what that means for the assets that matter. The vector database holds your entire corporate knowledge base as embeddings. MinIO holds the raw uploaded documents. The LLM serves proprietary inferences. With a flat network, a single compromised component has a direct path to all of them. And AI pipelines hand you compromise vectors that traditional apps do not: a supply chain vulnerability buried in a model dependency, a prompt injection that turns your own LLM into the attacker’s tool, or a container escape from a model server that has direct access to GPU memory.
Here is the scenario that should keep you up at night. An attacker lands a prompt injection on the LLM. The LLM generates text; that is all it should ever do. But on a flat network, nothing stops a compromised LLM pod from opening a TCP connection straight to the vector database on port 9200 and reading out the whole corpus. Your “private” data never left the building. It just got exfiltrated to an attacker who is now inside it.
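To put a number on “every pod can reach every other pod,” here is a minimal sketch that counts the directed pod-to-pod paths in the pipeline described above. It assumes one pod per component and ignores ports; the four extraction-model names are hypothetical placeholders, not real NIM service names.

```python
from itertools import permutations

# Pipeline components named in the RAG Blueprint description above, one pod each.
# The four extraction-model names are placeholders, not real NIM service names.
COMPONENTS = [
    "llm", "embedding", "reranker",
    "extract-1", "extract-2", "extract-3", "extract-4",
    "rag-server", "ingestor", "redis", "minio", "vector-db",
]


def flat_network_paths(components):
    """On a flat network every ordered (source, destination) pair is reachable."""
    return list(permutations(components, 2))


def sources_that_reach(target, paths):
    """Every component that can open a connection to `target`."""
    return sorted(src for src, dst in paths if dst == target)


paths = flat_network_paths(COMPONENTS)
print(len(paths))                                       # 132, i.e. 12 * 11
print("llm" in sources_that_reach("vector-db", paths))  # True
```

All eleven other components, the LLM included, have a path into the vector database. The pipeline needs only a few of those 132 paths, and the microsegmentation section later in this post is about allowing exactly those.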
Three States of Data, Three Different Controls
Data privacy is not one control. It is three, because your data exists in three states and each needs its own protection. Most teams cover the first two and quietly skip the third, which is exactly the one AI workloads expose the most.
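Data at rest is the easy one: vSAN Data-at-Rest Encryption, enabled per cluster, which in VCF 9.1 now coexists with vSAN global deduplication without a tradeoff. Data in motion is mostly default behavior: encrypted vMotion (hardware-accelerated with Intel QAT in 9.1), TLS on service endpoints, and Geneve-tunneled pod traffic. Be precise about that last one: Geneve is encapsulation, not encryption, so pod-to-pod traffic crossing the tunnel is only encrypted if you turn that on (Antrea supports IPsec and WireGuard modes for this) or the application speaks TLS itself.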
Data in use is the interesting one. This is the data sitting decrypted in memory while the model runs on it, and historically it was a blind spot. In VCF 9.1, Confidential Computing became generally available for the current generation of Intel TDX and AMD SEV-SNP. It runs sensitive workloads inside hardware-encrypted memory regions with per-VM keys; those regions are inaccessible even to the hypervisor, and workload and host attestation lets you prove it. VCF Operations now profiles your ESX fleet and tells you which hosts can actually run confidential VMs, which removes a real planning headache.
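Attestation is what makes hypervisor isolation checkable rather than a matter of trust: a relying party releases secrets only to a VM whose signed launch measurement it recognizes and whose report is fresh. The sketch below shows that relying-party logic under heavy simplification. `AttestationReport`, the allowlist, and `release_data_key` are hypothetical, not the TDX or SEV-SNP report formats or any VMware API, and verifying the report's signature chain is assumed to have happened already.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass


@dataclass
class AttestationReport:
    """Hypothetical, simplified report. Real TDX quotes and SEV-SNP reports carry
    many more fields and are signed through the CPU vendor's key chain."""
    measurement: bytes  # digest of the VM's initial launch state
    nonce: bytes        # freshness value chosen by the verifier


# Hypothetical allowlist of launch measurements you have approved.
APPROVED_MEASUREMENTS = {hashlib.sha384(b"approved-inference-vm-image-v1").digest()}


def release_data_key(report: AttestationReport, expected_nonce: bytes) -> bool:
    """Release a data key only for a fresh report with an approved measurement.
    Assumes the report's signature chain has already been verified."""
    if not hmac.compare_digest(report.nonce, expected_nonce):
        return False  # stale or replayed report
    return any(hmac.compare_digest(report.measurement, m) for m in APPROVED_MEASUREMENTS)


nonce = secrets.token_bytes(32)
good = AttestationReport(hashlib.sha384(b"approved-inference-vm-image-v1").digest(), nonce)
tampered = AttestationReport(hashlib.sha384(b"tampered-image").digest(), nonce)
print(release_data_key(good, nonce), release_data_key(tampered, nonce))  # True False
```

The nonce check is what stops an attacker from replaying an old, valid report from a VM that has since been modified.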
On the GPU side, NVIDIA Hopper (H100, H200) and Blackwell (B200, GB200) ship a Confidential Computing mode. The GPU isolates a protected region of its memory behind hardware access controls and encrypts data with AES-256-GCM as it moves between the confidential VM and the GPU, using session keys negotiated between the driver in the confidential VM and the GPU. The published overhead is roughly 2 to 5 percent for most LLM inference, plus a one-time attestation cost of 1 to 3 seconds at startup. That is a small price for data-in-use protection on the GPU.
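Those overhead figures translate directly into capacity. A back-of-the-envelope sketch, assuming a hypothetical baseline of 1,000 tokens/s per GPU with CC mode off and a pod that lives for one hour:

```python
def cc_throughput(baseline_tps: float, overhead_pct: float) -> float:
    """Steady-state throughput after a percentage slowdown from CC mode."""
    return baseline_tps * (1 - overhead_pct / 100)


def attestation_share(attest_seconds: float, lifetime_seconds: float) -> float:
    """Fraction of a pod's lifetime spent on its one-time startup attestation."""
    return attest_seconds / lifetime_seconds


# Hypothetical baseline: 1,000 tokens/s on one GPU with CC mode off.
BASELINE_TPS = 1000.0
for pct in (2, 5):  # the published 2-5 percent range
    print(f"{pct}% overhead -> {cc_throughput(BASELINE_TPS, pct):.0f} tokens/s")

# Worst-case 3 s attestation for a pod that runs for one hour.
print(f"attestation: {attestation_share(3, 3600):.4%} of lifetime")
```

At that baseline the range works out to roughly 950 to 980 tokens/s, and the attestation cost is under 0.1 percent of an hour-long pod's life. Frequent pod churn, such as aggressive autoscaling with short-lived replicas, is where the startup cost would start to show.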
Here is my field caveat, and it is the part the datasheets gloss over. GPU Confidential Computing mode is built around a full, dedicated GPU in a confidential VM. It does not currently combine cleanly with time-sliced vGPU profiles, the very partitioning model most PAIF deployments rely on to share a GPU across tenants (see Part 6 on vGPU, MIG and passthrough). So “data-in-use encryption on the GPU” is not a checkbox you get for free on a shared L40S running four vGPU slices; the L40S is an Ada-generation part and is not on the Hopper/Blackwell list above in the first place. If a workload truly needs confidential GPU compute, plan for passthrough or a dedicated GPU, and validate the exact host, GPU, and driver combination against the BOM before you promise it to a regulator.
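The caveat above can be encoded as a coarse planning filter to run over a request list before anyone promises confidential GPU compute. This is a hypothetical sketch, not a BOM or compatibility check: the GPU set is only the families named in this section, and `dedicated` stands in for "passthrough or a full dedicated GPU."

```python
from dataclasses import dataclass

# GPU families this section lists as shipping a Confidential Computing mode.
CC_CAPABLE = {"H100", "H200", "B200", "GB200"}


@dataclass
class GpuPlacement:
    gpu_model: str
    dedicated: bool  # True for passthrough / a full dedicated GPU, False for shared vGPU


def confidential_gpu_precheck(p: GpuPlacement) -> tuple[bool, str]:
    """Coarse planning filter based on this section's caveats; not a BOM check."""
    if p.gpu_model not in CC_CAPABLE:
        return False, f"{p.gpu_model}: no GPU Confidential Computing mode listed"
    if not p.dedicated:
        return False, f"{p.gpu_model}: shared vGPU slices do not fit the CC model; use a full GPU"
    return True, f"{p.gpu_model}: candidate; validate host, GPU, and driver against the BOM"


for placement in (
    GpuPlacement("L40S", dedicated=False),
    GpuPlacement("H100", dedicated=False),
    GpuPlacement("H100", dedicated=True),
):
    print(confidential_gpu_precheck(placement))
```

Even a passing result only means "worth validating"; the final answer still comes from the BOM.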
| Asset | What is at risk | Control in PAIF / VCF 9.1 |
|---|---|---|
| Vector DB / RAG corpus | Bulk exfiltration of the knowledge base | vDefend microsegmentation, egress control, vSAN encryption |
| Model weights | IP theft, tampered / poisoned model | Air-gap, Harbor RBAC, OpenSSF model signing |
| Prompt & inference data | Leakage of regulated data in memory | Confidential Computing (TDX / SEV-SNP), GPU TEE |
| Uploaded documents (MinIO) | Unauthorized read by other pods | Least-privilege DFW rules, default-deny |
| Control plane / orchestration | Privilege escalation, tenant crossover | Organizations, SSO via Identity Broker, audit trail |
Microsegmentation Is the Second Half
If you take one thing from this post: on PAIF, zero-trust microsegmentation is not an optional hardening step, it is the control that makes “private” mean something inside the cluster. VMware vDefend Distributed Firewall, enforced through the Antrea CNI, lets you define policy centrally in NSX Manager and have it realized as Antrea Cluster Network Policies on the GPU Kubernetes cluster.
Two design properties make this work for AI workloads specifically. First, the security groups are identity-based: they match pods by namespace and Kubernetes labels, not by IP address. That matters because NIM pods get new IPs constantly as they are rescheduled during model updates or GPU rebalancing, and an IP-based rule would break the moment a pod moved. Second, enforcement happens at the OVS bridge on the pod’s virtual interface, the very first network hop. Unauthorized traffic is dropped before it ever touches the wire, which is fundamentally different from a perimeter firewall that only sees traffic after it has already traversed the network.
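The difference between identity-based and IP-based matching is easiest to see when a pod moves. Below is a toy model of the matching semantics only. It is not the NSX or Antrea API, and the namespace, label, and addresses are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    namespace: str
    labels: dict
    ip: str


@dataclass
class IdentityGroup:
    """Matches pods by namespace plus labels (every listed label must match)."""
    namespace: str
    match_labels: dict = field(default_factory=dict)

    def matches(self, pod: Pod) -> bool:
        return pod.namespace == self.namespace and all(
            pod.labels.get(key) == value for key, value in self.match_labels.items()
        )


# Hypothetical namespace, label, and addresses.
llm_group = IdentityGroup("rag", {"app": "nim-llm"})
ip_rule = {"10.244.1.17"}  # an IP-pinned rule written against the old address

before = Pod("nim-llm-0", "rag", {"app": "nim-llm"}, "10.244.1.17")
after = Pod("nim-llm-0", "rag", {"app": "nim-llm"}, "10.244.3.42")  # rescheduled

print(llm_group.matches(before), llm_group.matches(after))  # True True
print(before.ip in ip_rule, after.ip in ip_rule)            # True False
```

The label-based group keeps matching after rescheduling, while the IP-pinned rule silently stops applying to the very pod it was written for.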
You map every legitimate path (RAG Server to LLM, RAG Server to vector DB, Ingestor to object storage, and so on), allow exactly those, and add a default-deny rule that drops and logs everything else. That turns the earlier scenario around: a prompt-injected LLM trying to reach the vector store hits the catch-all deny, and the packet dies at the first hop. The whole policy is written as Terraform and version-controlled alongside the application, so security is provisioned with the workload from the self-service catalog rather than bolted on after the fact.
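The allowlist-plus-default-deny logic fits in a few lines. This is a toy model, not the vDefend or Antrea rule format: the allowlist holds only the three paths named in the text, the component names are hypothetical, and ports are omitted, whereas a real rule would also pin protocol and port.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("dfw-sketch")

# The three legitimate (source, destination) paths named in the text.
# Ports are omitted for brevity; a real rule would also pin protocol and port.
ALLOW = {
    ("rag-server", "llm"),
    ("rag-server", "vector-db"),
    ("ingestor", "minio"),
}


def evaluate(src: str, dst: str) -> str:
    """Allow only listed paths; everything else falls through to drop-and-log."""
    if (src, dst) in ALLOW:
        return "ALLOW"
    log.warning("default-deny: dropped %s -> %s", src, dst)
    return "DROP"


print(evaluate("rag-server", "vector-db"))  # ALLOW
print(evaluate("llm", "vector-db"))         # DROP, plus a logged warning
```

The design choice that matters is the fall-through: anything nobody thought to allow is dropped and logged, so a forgotten path shows up as a log entry rather than as a silent exposure.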
The subtlety worth knowing: when traffic leaves the VKS cluster, it is normally source-NATed to the VPC’s public IP, and pod identity is lost at that boundary. Antrea Egress with a dedicated routable IP preserves that identity across the NSX fabric, so the external NSX DFW can enforce the same zero-trust rule a second time. That gives you two independent enforcement points, so a misconfiguration in one does not collapse the other.
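Why SNAT defeats the outer firewall is a matter of what source address it gets to see. A toy model, not the Antrea Egress resource: workload names and addresses are hypothetical, taken from the RFC 5737 documentation ranges.

```python
# Hypothetical addresses from the documentation ranges (RFC 5737).
VPC_SNAT_IP = "203.0.113.10"
EGRESS_IPS = {"rag-server": "192.0.2.21", "llm": "192.0.2.22"}


def source_ip_outside(workload: str, antrea_egress: bool) -> str:
    """Source IP the external NSX DFW sees for a packet leaving the cluster."""
    return EGRESS_IPS[workload] if antrea_egress else VPC_SNAT_IP


def outer_rule_allows(src_ip: str) -> bool:
    """Outer zero-trust rule: only the RAG server may reach the external service."""
    return src_ip == EGRESS_IPS["rag-server"]


for antrea_egress in (False, True):
    mode = "egress-ip" if antrea_egress else "snat"
    for workload in EGRESS_IPS:
        ip = source_ip_outside(workload, antrea_egress)
        print(mode, workload, ip, outer_rule_allows(ip))
```

Under SNAT both workloads arrive as the same address, so the outer rule can only allow or deny them together; with a per-workload egress IP it can admit the RAG server and refuse the LLM.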
Provenance, Identity, and the Platform Underneath
Two more privacy concerns that are easy to forget. The first is model provenance. A poisoned or tampered model is a data integrity problem as much as a security one. Since March 2025, NVIDIA signs every model it publishes in the NGC catalog using the OpenSSF Model Signing specification, so you can verify origin and integrity independently before a model ever runs against your data. Pair that with Harbor’s role-based access control on the container registry, and you have a chain of custody for what gets deployed.
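Verifying a signature is the job of the signing tooling, but underneath it sits a plain integrity check: recompute each model file's digest and compare it against a trusted manifest. The sketch below covers only that half. The manifest format here, a dict mapping relative path to SHA-256 hex digest, is a hypothetical simplification and not the OpenSSF Model Signing format.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so multi-gigabyte weight shards fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_against_manifest(model_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return the problems found; an empty list means every file matched.
    `manifest` (hypothetical format) maps relative paths to expected hex digests."""
    problems = []
    for rel, expected in manifest.items():
        path = model_dir / rel
        if not path.is_file():
            problems.append(f"missing: {rel}")
        elif not hmac.compare_digest(sha256_file(path), expected):
            problems.append(f"digest mismatch: {rel}")
    # Files the manifest does not cover are suspicious too (e.g. injected code).
    on_disk = {p.relative_to(model_dir).as_posix() for p in model_dir.rglob("*") if p.is_file()}
    for extra in sorted(on_disk - set(manifest)):
        problems.append(f"unexpected file: {extra}")
    return problems
```

Flagging unexpected files matters as much as flagging changed ones, because an extra file dropped next to the weights can change what a serving stack loads.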
The second is tenant isolation. PAIF supports multitenancy through organizations, each operating in a dedicated, isolated environment, and the Model Runtime service can securely share models across tenants while keeping each tenant’s data private. If you run AI as an internal service for multiple business units (or as a provider), that isolation boundary is doing real privacy work and deserves the same scrutiny as your network policy. The component architecture in Part 2 shows where these pieces sit.
Underneath all of it, VCF 9.1 hardened the platform itself in ways that matter for AI uptime and forensics. The User-Level Monitor moves the virtual machine monitor into lower-privilege user mode, shrinking the blast radius of a hypervisor exploit. File Integrity Monitoring (on by default, running every four hours, aligned to NIST and PCI DSS) catches tampering with installed binaries. Live patching now works on TPM-enabled hosts, so you can patch the hosts under your inference services without draining them. And the new centralized Audit Trail in VCF Operations gives you a time-sliced view across the stack, including VKS, so when a firewall rule changes or a login fails you can trace the full event chain. For regulated AI, that audit trail is not a nice-to-have; it is the evidence.
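Tracing an event chain boils down to pulling every event across layers into one time window around the incident and reading it in order. A minimal sketch of that correlation step, assuming a hypothetical normalized event shape; it is not the VCF Operations Audit Trail schema or API, and the events are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class AuditEvent:
    """Hypothetical normalized event; the real Audit Trail and SIEM schemas differ."""
    ts: datetime
    layer: str   # e.g. "sso", "nsx", "vks"
    action: str


def event_chain(events, incident: datetime, before: timedelta, after: timedelta):
    """Events inside [incident - before, incident + after], oldest first."""
    lo, hi = incident - before, incident + after
    return sorted((e for e in events if lo <= e.ts <= hi), key=lambda e: e.ts)


t = datetime(2025, 1, 1, 10, 0)
events = [
    AuditEvent(t + timedelta(minutes=7), "vks", "rag-server pod restarted"),
    AuditEvent(t - timedelta(hours=2), "esx", "routine host login"),
    AuditEvent(t + timedelta(minutes=2), "sso", "failed admin login"),
    AuditEvent(t + timedelta(minutes=5), "nsx", "DFW rule modified"),
]
for e in event_chain(events, t + timedelta(minutes=6), timedelta(minutes=10), timedelta(minutes=5)):
    print(e.ts.time(), e.layer, e.action)
```

Read in order, the window tells a story (failed login, then a firewall change, then a pod restart) that no single layer's log shows on its own.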
What I’d Do
Stop selling Private AI internally as “secure because it is on-prem.” That framing sets you up to skip the controls that actually matter. Treat data residency and data security as two separate work streams. Residency you get largely for free with PAIF and the air-gap option. Security you have to design: microsegment every pipeline with vDefend and a default-deny baseline, decide deliberately which workloads need confidential computing for data in use (and accept the GPU passthrough tradeoff that comes with it), verify model signatures before deployment, and wire the audit trail into your SIEM from day one.
If I had to rank effort against risk reduction, microsegmentation is the single highest-value control on this list, and it is the one most teams have not turned on. Start there. The pillar page collects the full series: the VMware Private AI complete guide. Which of these controls is already live in your environment, and which one have you been quietly deferring?
References
- Strengthen Zero Trust Security and Resilience with VCF 9.1 (VMware Cloud Foundation Blog)
- A Hands-On Guide to Secure Private AI with Broadcom, Part 2: vDefend microsegmentation
- NVIDIA Confidential Computing for Hopper and Blackwell GPUs
- VMware Private AI Foundation with NVIDIA 9.1 (Broadcom TechDocs)
- Bringing Verifiable Trust to AI Models: Model Signing in NGC (NVIDIA)
« Previous: Part 19 | VMware Private AI Complete Guide | Next: Part 21 »