TL;DR · Key Takeaways
- Air-gapped Private AI is mostly a mirroring problem, not an AI problem. The thing that stalls builds is the bootstrap problem: you have no registry to pull your registry from.
- Private AI Services 2.1 ships the artifact mirroring tool (AMT) to replicate install artifacts into a local OCI registry, so VI admins can run model endpoints and agents fully offline.
- You mirror four buckets: vGPU drivers and the GPU Operator, NIM containers plus their model profiles, the Private AI Services components, and the Ubuntu content library images.
- In a disconnected environment the GPU Operator installs only from an OCI registry with no authentication. Design the registry around that constraint, not after it.
- Treat the mirror as a living system. Stale content and missing NIM model profiles are the failures that surface weeks later, not on day one.
Here is the scenario that catches teams off guard. You have done everything right: the GPU workload domain is healthy, vGPU profiles are assigned, the Supervisor is up, and the security team has signed off on a fully disconnected enclave. Then you try to deploy the first Supervisor Service and it sits there waiting to pull a container image that lives on the internet you just walled yourself off from. That is the bootstrap problem, and it is the real shape of air-gapped Private AI. The model serving, the RAG endpoints, the agents: those are the easy part once the platform exists. Getting the platform to exist with zero outbound connectivity is where the work actually lives.
This part walks the runbook end to end: what changes when you remove internet access, what you have to mirror, how the artifact mirroring tool fits, and how to validate the result before you hand it over. If you have not set up the GPU layer yet, work through the earlier parts on planning and prerequisites and on installing the NVIDIA GPU Operator and vGPU drivers first, because the air-gap version of each step assumes the connected version is already understood.
What “air-gapped” actually changes
A connected Private AI Foundation deployment quietly reaches out to four upstreams: the NVIDIA NGC catalog for containers and drivers, the NVIDIA licensing service for client tokens, public OCI registries for Supervisor Service and Carvel packages, and the internet for OS package updates inside the deep learning VMs. Cut all four and nothing is broken, exactly. It is just that every artifact now has to be present locally before the thing that needs it runs. The deployment does not fail loudly. It hangs, quietly, on the first missing pull.
So the mental model shifts from “deploy and let it fetch” to two zones with a one-way street between them. A connected staging zone (a jump host with internet access) downloads and packages everything. A transfer step moves those packages across the gap on approved media. The disconnected zone hosts a local OCI registry and a content library, and every installer inside it is pointed at those local sources instead of the internet.
The bootstrap problem nobody warns you about
Your local registry is the foundation everything else sits on. The catch is that on VCF the production-grade registry is itself a Supervisor Service, and Supervisor Services deploy by pulling container images. In an air-gapped enclave there is no registry to pull the registry from. This is the chicken-and-egg that quietly eats a day if you have not planned for it.
The clean way through is two phases. First, deploy a standalone Bitnami Harbor OVA as a throwaway bootstrap registry, pre-stage the Harbor Supervisor Service images into it (using Carvel imgpkg on the connected jump host, then copying the tar across), and register that VM as a container registry in the Supervisor. Second, deploy the real Harbor Supervisor Service by pulling from the bootstrap VM. Once the production registry is live, the bootstrap VM can be retired or kept as a cold standby. Do not skip the second phase and run on the OVA forever: the standalone appliance does not get the lifecycle integration, scaling, or support path that the Supervisor Service does.
What you actually have to mirror
This is where Private AI Services 2.1 earns its keep. It introduces the artifact mirroring tool (AMT), which replicates the Private AI Services install artifacts into your local OCI registry so a VI admin can stand up GPU-powered model endpoints and agents without touching NGC at run time. AMT handles the Private AI Services bundle. It does not magically pull every NVIDIA container you will ever need, so you still own the NIM and driver mirroring. Be honest with yourself about that scope: AMT shrinks the manual work, it does not remove it.
Four buckets cover almost every disconnected PAIF build. Mirror them in this order, because each later layer assumes the earlier one is already local.
| What to mirror | Source | Lands in | Refresh trigger |
|---|---|---|---|
| vGPU host driver + GPU Operator images | NGC, NVIDIA licensing portal | Local OCI registry (no-auth project) | Driver or NVAIE version bump |
| NIM containers + model profiles | NGC (NIM), NIM CLI download | Local OCI registry + model cache PVC | New model or GPU type added |
| Private AI Services components | AMT (artifact mirroring tool) | Local OCI registry | Private AI Services upgrade |
| Ubuntu / deep learning VM images + OS packages | Ubuntu repos, NGC DL VM image | Content library + local apt mirror | CVE patching cadence |
The two rows people underestimate are NIM model profiles and OS packages. NIM will start in an air-gapped environment, but only if the model profiles for your exact GPU were downloaded on a connected system first and staged into the model cache. Forget that and the container comes up and then refuses to serve, with an error that looks like a licensing fault but is really a missing profile. The deep learning VM side is the other quiet trap: without a local apt mirror, cloud-init inside the DL VM stalls trying to reach the Ubuntu archive, and the VM looks hung when it is just waiting on a package fetch that will never complete.
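One way to close the apt trap is to make sure the DL VM's package sources name the local mirror before cloud-init installs anything. The sketch below only prints a classic one-line-format sources list for review. The mirror URL and the `jammy` codename are placeholders I made up for illustration, and newer Ubuntu releases use a deb822 `.sources` file instead, so treat this as a starting point rather than the guide's prescribed method.

```shell
#!/usr/bin/env bash
# Sketch: emit apt sources that point at a local mirror instead of the Ubuntu archive.
# The mirror URL and release codename are placeholders, not values from this guide.

render_apt_sources() {
  local mirror_url="$1"   # e.g. http://apt-mirror.site-a.vcf.lab/ubuntu (hypothetical)
  local codename="$2"     # e.g. jammy
  local suite
  # One line per suite so security and update pockets also resolve locally.
  for suite in "$codename" "$codename-updates" "$codename-security"; do
    printf 'deb %s %s main restricted universe multiverse\n' "$mirror_url" "$suite"
  done
}

# Print the result for review before baking it into the VM image or cloud-init data.
render_apt_sources "http://apt-mirror.site-a.vcf.lab/ubuntu" "jammy"
```

If any line still names a public archive host after this is applied, cloud-init will stall on that one source.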
The runbook: mirror, transfer, stage, point
On the connected jump host, install Carvel imgpkg (0.47.2 at time of writing) and pull a bundle to a tar. The Harbor Supervisor Service bundle is the bootstrap example, but the same two commands move any OCI bundle:
```shell
# 1. On the connected jump host: copy the bundle to a tar
imgpkg copy -b projects.packages.broadcom.com/vsphere/supervisor/harbor-service/2.14.3/harbor:v2.14.3_vmware.2-vks.1 \
  --to-tar harbor-v2.14.3.tar --cosign-signatures

# 2. After transfer, push the tar into the local (bootstrap) Harbor
imgpkg copy --tar harbor-v2.14.3.tar \
  --to-repo harbor-bootstrap.site-a.vcf.lab/supervisor-services/harbor \
  --cosign-signatures --registry-insecure \
  --registry-username admin --registry-password <password>
```
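Because the same two commands move any OCI bundle, it is tempting to drive them from a list rather than retype them. Here is a minimal sketch of that idea. The three-column list format (source reference, tar name, destination repository) is my own assumption, and the functions only print commands so you can review them before anything runs. Add the signature and registry flags from the example above as your environment needs them.

```shell
#!/usr/bin/env bash
# Sketch: generate the two imgpkg commands from a bundle list instead of typing them by hand.
# The list format (source ref, tar name, destination repo) is an assumption for illustration.

# Print the connected-side command for one bundle.
mirror_cmd() {
  printf 'imgpkg copy -b %s --to-tar %s\n' "$1" "$2"
}

# Print the enclave-side command for one bundle.
stage_cmd() {
  printf 'imgpkg copy --tar %s --to-repo %s\n' "$1" "$2"
}

# Read "source tarfile dest" lines from stdin and print the commands for one phase.
plan_phase() {
  local phase="$1" src tarfile dest
  while read -r src tarfile dest; do
    # Skip blank lines and comments.
    case "$src" in ''|'#'*) continue ;; esac
    case "$phase" in
      mirror) mirror_cmd "$src" "$tarfile" ;;
      stage)  stage_cmd "$tarfile" "$dest" ;;
      *) echo "unknown phase: $phase" >&2; return 1 ;;
    esac
  done
}
```

Run `plan_phase mirror < bundles.txt` on the jump host and `plan_phase stage < bundles.txt` inside the enclave. Keeping one list file on both sides is what stops the two zones from drifting apart.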
For the GPU layer, mirror the GPU Operator images into a dedicated project on the local registry, then deploy the operator pointed at it. The hard constraint to design around: in a disconnected environment the GPU Operator can be installed only from an OCI registry that requires no authentication. Put the operator images in their own anonymous-pull project rather than fighting imagePullSecrets on every cluster. The full connected procedure for the operator and drivers is in Part 9; the only change here is the registry path.
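Since the registry path is the only thing that changes, the rewrite from an upstream reference to a local one is mechanical. The sketch below shows one way to do it with plain parameter expansion. The local registry host and project name are hypothetical, the tag is left as a placeholder, and it assumes every input reference starts with a registry host.

```shell
#!/usr/bin/env bash
# Sketch: map an upstream image reference onto a local anonymous-pull project.
# Registry host and project name are placeholders, not values from this guide.

to_local_ref() {
  local upstream="$1" local_prefix="$2"
  # Drop the upstream registry host (everything up to the first slash) and
  # keep the repository path plus tag or digest unchanged.
  # Assumes the input always begins with a registry host such as nvcr.io.
  printf '%s/%s\n' "$local_prefix" "${upstream#*/}"
}

# Example: where an operator image would land in a local project.
to_local_ref "nvcr.io/nvidia/gpu-operator:<tag>" "harbor.site-a.vcf.lab/gpu-operator"
```

Keeping the upstream repository path intact under the local project makes it easy to see at a glance which upstream image each mirrored copy came from.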
For NIM, use the NIM CLI on the connected host to download both the container and the model profiles for your target GPU, push the container to the local registry, and stage the profiles into the model cache that the NIM microservices layer mounts. For models served through Private AI Services, the same models live in the Model Store, backed by your local Harbor acting as the model registry. Finally, for deep learning VMs, point the content library at the local Ubuntu image and configure the DL VM to use the local apt mirror, so cloud-init completes instead of hanging.
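For the NIM step, the connected-side work might look roughly like the sketch below. It assumes the `list-model-profiles` and `download-to-cache` utilities that NIM LLM containers bundle, and a cache mounted at `/opt/nim/.cache`. The image name, cache directory, and profile ID are placeholders. Check the NIM documentation for the exact version you run, because I am not certain these match every NIM release. The functions only print the commands.

```shell
#!/usr/bin/env bash
# Sketch: print the connected-side commands that pre-stage NIM model profiles.
# Assumes the list-model-profiles and download-to-cache utilities bundled in NIM LLM
# containers; the image, cache directory, and profile ID are placeholders.

nim_run() {
  local image="$1" cache_dir="$2"; shift 2
  # "-e NGC_API_KEY" passes the key through from the caller's environment,
  # so the key itself never appears in the printed command.
  printf 'docker run --rm -e NGC_API_KEY -v %s:/opt/nim/.cache %s %s\n' \
    "$cache_dir" "$image" "$*"
}

plan_profile_staging() {
  local image="$1" cache_dir="$2" profile_id="$3"
  # First list the profiles so you can pick the one for the target GPU.
  nim_run "$image" "$cache_dir" list-model-profiles
  # Then download that exact profile into the cache directory you will carry across.
  nim_run "$image" "$cache_dir" download-to-cache --profile "$profile_id"
}
```

Call it as `plan_profile_staging <nim-image> /srv/nim-cache <profile-id>`. The profile has to match the GPU in the enclave, not whatever happens to be in the jump host.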
Validate before you call it done
The failure mode of air-gapped builds is the silent one: it looks finished, then the first real workload trips on a single missing artifact. Walk the stack bottom to top and prove each layer pulls locally before you move up.
Concretely, confirm the GPU Operator and a NIM pod are actually running and that no pod is stuck in ImagePullBackOff reaching for an internet path you missed:
```shell
# Operator and driver pods should be Running, not ImagePullBackOff
kubectl get pods -n gpu-operator

# Every image reference should resolve to the local registry, never nvcr.io
kubectl get pods -A -o jsonpath='{range .items[*].spec.containers[*]}{.image}{"\n"}{end}' \
  | sort -u | grep -v harbor-bootstrap.site-a.vcf.lab

# NIM should report the model is loaded, not waiting on a profile
kubectl logs -n nim deploy/meta-llama-3-1-8b-instruct | grep -i profile
```
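The image check above is easy to turn into a gate that fails loudly instead of relying on someone reading the output. A minimal sketch, assuming one image reference per line on stdin and the registry host passed as an argument. The function names are mine.

```shell
#!/usr/bin/env bash
# Sketch: fail when any image reference points outside the local registry.
# Reads one image reference per line on stdin; the registry host is a parameter.

find_external_images() {
  local local_registry="$1"
  # Prefix match with awk's index() treats the hostname as a fixed string,
  # so dots in it are not read as regex wildcards. Blank lines are ignored.
  sort -u | awk -v prefix="$local_registry/" 'NF && index($0, prefix) != 1'
}

assert_all_local() {
  local leaks
  leaks="$(find_external_images "$1")"
  if [ -n "$leaks" ]; then
    printf 'images not served from %s:\n%s\n' "$1" "$leaks" >&2
    return 1
  fi
}
```

Pipe the image list from the kubectl query into `assert_all_local <registry-host>` and treat a non-zero exit as a failed handover. Note that the query as written lists only main containers, so init containers need their own pass over `.spec.initContainers[*]`.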
What I’d Do
Build the four-stage loop as a script the first day, before you mirror a single byte by hand. The teams that struggle with air-gapped Private AI are not the ones who find it technically hard; they are the ones who mirror manually, lose track of what version landed where, and then spend the next quarter chasing drift between the connected staging zone and the enclave. Make the mirror reproducible, keep a signed inventory of every bundle and its hash, and schedule the refresh against the version-bump triggers in the table above rather than waiting for something to break. AMT is a real improvement and it is worth standardizing on Private AI Services 2.1 specifically for it, but plan as if you still own the NIM and OS package mirroring, because you do. Air-gapped is not harder Private AI. It is the same Private AI with the convenience of on-demand fetch removed, and discipline put in its place.
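The hash half of that inventory needs nothing more than `sha256sum`. The sketch below writes a manifest on the connected side and verifies it inside the enclave before anything is pushed. Signing the manifest is left out, and the directory layout and the `SHA256SUMS` file name are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: checksum manifest for bundles crossing the air gap.
# Directory layout and the manifest file name are assumptions for illustration.

# Connected zone: record a SHA-256 for every tar in the outbound directory.
write_manifest() {
  local dir="$1"
  ( cd "$dir" && sha256sum -- *.tar > SHA256SUMS )
}

# Disconnected zone: verify every tar against the manifest before pushing it.
# Returns non-zero if any file is missing or its hash does not match.
verify_manifest() {
  local dir="$1"
  ( cd "$dir" && sha256sum --check --quiet SHA256SUMS )
}
```

Carry `SHA256SUMS` across with the tars and refuse to stage anything until `verify_manifest` exits cleanly. Committing each manifest to version control gives you the record of what landed where.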
Running a disconnected enclave yourself? Tell me which layer bit you first: the Harbor bootstrap, the NIM profiles, or the deep learning VM apt mirror. That ranking says a lot about where a team’s mirroring discipline actually breaks down.