Dr. Pranay Jha

VMware • Cloud • AI • Enterprise Architecture


Air-Gapped VMware Private AI Foundation: Mirroring, AMT and the Bootstrap Problem (Private AI Series, Part 19)

Deploying VMware Private AI Foundation in a fully disconnected enclave: what to mirror, how the artifact mirroring tool (AMT) fits, the Harbor bootstrap problem, and how to validate offline NIM and GPU before handover.

VMware Private AI Series · Part 19 of 24

TL;DR · Key Takeaways

  • Air-gapped Private AI is mostly a mirroring problem, not an AI problem. The thing that stalls builds is the bootstrap problem: you have no registry to pull your registry from.
  • Private AI Services 2.1 ships the artifact mirroring tool (AMT) to replicate install artifacts into a local OCI registry, so VI admins can run model endpoints and agents fully offline.
  • You mirror four buckets: vGPU drivers and the GPU Operator, NIM containers plus their model profiles, the Private AI Services components, and the Ubuntu content library images.
  • In a disconnected environment the GPU Operator installs only from an OCI registry with no authentication. Design the registry around that constraint, not after it.
  • Treat the mirror as a living system. Stale content and missing NIM model profiles are the failures that surface weeks later, not on day one.
Who this is for: VCF architects and platform admins deploying Private AI Foundation where the workload domain has no internet egress (government, defense, finance, healthcare, sovereign cloud).

Prerequisites: VCF 9.0 or 9.1 with a GPU workload domain and Supervisor enabled, the PAIF add-on plus NVIDIA AI Enterprise entitlement, an internet-connected jump host, and a sanctioned transfer path across the gap.

Here is the scenario that catches teams off guard. You have done everything right: the GPU workload domain is healthy, vGPU profiles are assigned, the Supervisor is up, and the security team has signed off on a fully disconnected enclave. Then you try to deploy the first Supervisor Service and it sits there waiting to pull a container image that lives on the internet you just walled yourself off from. That is the bootstrap problem, and it is the real shape of air-gapped Private AI. The model serving, the RAG endpoints, the agents: those are the easy part once the platform exists. Getting the platform to exist with zero outbound connectivity is where the work actually lives.

This part walks the runbook end to end: what changes when you remove internet access, what you have to mirror, how the artifact mirroring tool fits, and how to validate the result before you hand it over. If you have not set up the GPU layer yet, start with the earlier parts on installing the NVIDIA GPU Operator and vGPU drivers and on planning and prerequisites, because the air-gap version of each step assumes the connected version is already understood.

What “air-gapped” actually changes

A connected Private AI Foundation deployment quietly reaches out to four upstreams: the NVIDIA NGC catalog for containers and drivers, the NVIDIA licensing service for client tokens, public OCI registries for Supervisor Service and Carvel packages, and the internet for OS package updates inside the deep learning VMs. Cut all four and nothing is broken, exactly. It is just that every artifact now has to be present locally before the thing that needs it runs. The deployment does not fail loudly. It hangs, quietly, on the first missing pull.

So the mental model shifts from “deploy and let it fetch” to two zones with a one-way street between them. A connected staging zone (a jump host with internet access) downloads and packages everything. A transfer step moves those packages across the gap on approved media. The disconnected zone hosts a local OCI registry and a content library, and every installer inside it is pointed at those local sources instead of the internet.

[Figure: Two zones, one-way transfer. Connected staging zone (NGC / upstream OCI; jump host with AMT, imgpkg, and NIM CLI; signed, inventoried tar / OCI bundles). Air gap crossed on approved media. Disconnected zone in the VCF workload domain (local Harbor OCI registry; content library with Ubuntu / DL VM images; Supervisor + VKS clusters whose GPU Operator and NIM Operator pull locally; model endpoints, RAG, and agents served entirely offline). Nothing in the disconnected zone ever reaches the internet.]
The air-gap model: download and package on a connected jump host, transfer across the gap, point every installer at local sources.

The bootstrap problem nobody warns you about

Your local registry is the foundation everything else sits on. The catch is that on VCF the production-grade registry is itself a Supervisor Service, and Supervisor Services deploy by pulling container images. In an air-gapped enclave there is no registry to pull the registry from. This is the chicken-and-egg that quietly eats a day if you have not planned for it.

The clean way through is two phases. First, deploy a standalone Bitnami Harbor OVA as a throwaway bootstrap registry, pre-stage the Harbor Supervisor Service images into it (using Carvel imgpkg on the connected jump host, then copying the tar across), and register that VM as a container registry in the Supervisor. Second, deploy the real Harbor Supervisor Service by pulling from the bootstrap VM. Once the production registry is live, the bootstrap VM can be retired or kept as a cold standby. Do not skip the second phase and run on the OVA forever: the standalone appliance does not get the lifecycle integration, scaling, or support path that the Supervisor Service does.

[Figure: Breaking the registry chicken-and-egg. (1) Bitnami Harbor OVA: standalone VM deployed from OVF, a throwaway bootstrap registry that holds the Harbor images. (2) Pre-stage images: imgpkg copy the Harbor Supervisor Service bundle into the OVA and add it as a Supervisor registry. (3) Harbor Service: production Supervisor Service registry, lifecycle managed and supported; retire the OVA afterwards. Phase 1 bootstraps; Phase 2 becomes production.]
The two-phase Harbor pattern that breaks the bootstrap deadlock in a disconnected enclave.

What you actually have to mirror

This is where Private AI Services 2.1 earns its keep. It introduces the artifact mirroring tool (AMT), which replicates the Private AI Services install artifacts into your local OCI registry so a VI admin can stand up GPU-powered model endpoints and agents without touching NGC at run time. AMT handles the Private AI Services bundle. It does not magically pull every NVIDIA container you will ever need, so you still own the NIM and driver mirroring. Be honest with yourself about that scope: AMT shrinks the manual work, it does not remove it.

Four buckets cover almost every disconnected PAIF build. Mirror them in this order, because each later layer assumes the earlier one is already local.

| What to mirror | Source | Lands in | Refresh trigger |
|---|---|---|---|
| vGPU host driver + GPU Operator images | NGC, NVIDIA licensing portal | Local OCI registry (no-auth project) | Driver or NVAIE version bump |
| NIM containers + model profiles | NGC (NIM), NIM CLI download | Local OCI registry + model cache PVC | New model or GPU type added |
| Private AI Services components | AMT (artifact mirroring tool) | Local OCI registry | Private AI Services upgrade |
| Ubuntu / deep learning VM images + OS packages | Ubuntu repos, NGC DL VM image | Content library + local apt mirror | CVE patching cadence |

The two rows people underestimate are NIM model profiles and OS packages. NIM will start in an air-gapped environment, but only if the model profiles for your exact GPU were downloaded on a connected system first and staged into the model cache. Forget that and the container comes up and then refuses to serve, with an error that looks like a licensing fault but is really a missing profile. The deep learning VM side is the other quiet trap: without a local apt mirror, cloud-init inside the DL VM stalls trying to reach the Ubuntu archive, and the VM looks hung when it is just waiting on a package fetch that will never complete.


The runbook: mirror, transfer, stage, point

[Figure: The repeatable four-stage loop. (1) Download: AMT, imgpkg, and NIM CLI pull from NGC + OCI. (2) Package: copy to signed tar, inventory each bundle. (3) Transfer: approved media across the gap, verify hashes. (4) Stage + point: push to local registry, rewrite installer URLs.]
Every artifact follows the same loop. Build the loop once and reuse it for drivers, NIM, and upgrades.

On the connected jump host, install Carvel imgpkg (0.47.2 at time of writing) and pull a bundle to a tar. The Harbor Supervisor Service bundle is the bootstrap example, but the same two commands move any OCI bundle:

# 1. On the connected jump host: copy the bundle to a tar
imgpkg copy -b projects.packages.broadcom.com/vsphere/supervisor/harbor-service/2.14.3/harbor:v2.14.3_vmware.2-vks.1 \
  --to-tar harbor-v2.14.3.tar --cosign-signatures

# 2. After transfer, push the tar into the local (bootstrap) Harbor
imgpkg copy --tar harbor-v2.14.3.tar \
  --to-repo harbor-bootstrap.site-a.vcf.lab/supervisor-services/harbor \
  --cosign-signatures --registry-insecure \
  --registry-username admin --registry-password '<password>'
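
The package and transfer stages of the loop lean on a hash inventory. A minimal sketch of one follows, assuming GNU coreutils `sha256sum` is available on both sides of the gap. The function names and the `SHA256SUMS` file name are illustrative, not part of AMT or imgpkg, and signing the inventory is left to whatever process the enclave already mandates:

```shell
# Sketch only: hash inventory for bundles carried across the gap.
# Assumes GNU coreutils (sha256sum). SHA256SUMS is an illustrative file name.

# Stage 2 (connected side): record a hash for every tar in the bundle directory.
write_inventory() {
  local dir="$1"
  ( cd "$dir" && sha256sum -- *.tar > SHA256SUMS )
}

# Stage 3 (disconnected side): re-check every hash before anything is pushed.
# Returns non-zero if any file is missing or its hash does not match.
verify_inventory() {
  local dir="$1"
  ( cd "$dir" && sha256sum --check --quiet SHA256SUMS )
}
```

Run `write_inventory` on the jump host after the last `imgpkg copy --to-tar`, carry `SHA256SUMS` alongside the tars, and hold off on any push until `verify_inventory` returns zero on the disconnected side.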

For the GPU layer, mirror the GPU Operator images into a dedicated project on the local registry, then deploy the operator pointed at it. The hard constraint to design around: in a disconnected environment the GPU Operator can be installed only from an OCI registry that requires no authentication. Put the operator images in their own anonymous-pull project rather than fighting imagePullSecrets on every cluster. The full connected procedure for the operator and drivers is in Part 9; the only change here is the registry path.

For NIM, use the NIM CLI on the connected host to download both the container and the model profiles for your target GPU, push the container to the local registry, and stage the profiles into the model cache that the NIM microservices layer mounts. For models served through Private AI Services, the same models live in the Model Store, backed by your local Harbor acting as the model registry. Finally, for deep learning VMs, point the content library at the local Ubuntu image and configure the DL VM to use the local apt mirror, so cloud-init completes instead of hanging.

Validate before you call it done

The failure mode of air-gapped builds is the silent one: it looks finished, then the first real workload trips on a single missing artifact. Walk the stack bottom to top and prove each layer pulls locally before you move up.

[Figure: Bottom-up validation gate. Gates in order: local registry resolves and anonymous pull works; GPU Operator pods Running and nodes labelled; NIM model profile present in the cache; endpoint answers a test prompt. If any gate fails, do not move up the stack: the fix is almost always a missing or stale mirror, not a runtime bug, so re-run the four-stage loop for that one artifact. All green: hand it over.]
Prove each layer pulls locally before climbing to the next. A failure here is a mirror gap, not a runtime bug.

Concretely, confirm the GPU Operator and a NIM pod are actually running and that no pod is stuck in ImagePullBackOff reaching for an internet path you missed:

# Operator and driver pods should be Running, not ImagePullBackOff
kubectl get pods -n gpu-operator

# Every image reference should resolve to the local registry, never nvcr.io
kubectl get pods -A -o jsonpath='{range .items[*].spec.containers[*]}{.image}{"\n"}{end}' \
  | sort -u | grep -v harbor-bootstrap.site-a.vcf.lab

# NIM should report the model is loaded, not waiting on a profile
kubectl logs -n nim deploy/meta-llama-3-1-8b-instruct | grep -i profile
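
The `grep -v` filter above matches the registry name anywhere in the line. A slightly stricter sketch is shown here, under the assumption that every mirrored image reference should begin with the local registry host. `non_local_images` is a hypothetical helper name, and it only inspects whatever list is piped into it:

```shell
# Sketch only: flag image references that do not start with the local registry.
# Reads one image reference per line on stdin.
# Exit status 0 means every reference is local; 1 means at least one is not.
non_local_images() {
  local registry="$1"
  awk -v prefix="${registry}/" '
    NF == 0 { next }                          # ignore blank lines
    index($0, prefix) != 1 { print; bad = 1 } # not a prefix match: report it
    END { exit (bad ? 1 : 0) }
  '
}

# Example use with the pod listing from this section (not executed here):
#   kubectl get pods -A -o jsonpath='{range .items[*].spec.containers[*]}{.image}{"\n"}{end}' \
#     | sort -u | non_local_images harbor-bootstrap.site-a.vcf.lab
```

Because the helper returns a non-zero status when anything leaks, it can act as a gate in a validation script. Note that the listing only covers `spec.containers`, so init containers would need a second pass.
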
Disclaimer: This is a production-change procedure. Validate the target BOM and check driver, GPU Operator, and NVAIE interoperability against the support matrix before you mirror; back up the registry and content library state; run the bottom-up validation on a non-production namespace first; and confirm your transfer media and signing process meet the enclave’s compliance controls. Air gaps exist for a reason, so treat every artifact you carry in as something that has to be inventoried and signed.

What I’d Do

Build the four-stage loop as a script the first day, before you mirror a single byte by hand. The teams that struggle with air-gapped Private AI are not the ones who find it technically hard; they are the ones who mirror manually, lose track of what version landed where, and then spend the next quarter chasing drift between the connected staging zone and the enclave. Make the mirror reproducible, keep a signed inventory of every bundle and its hash, and schedule the refresh against the version-bump triggers in the table above rather than waiting for something to break. AMT is a real improvement and it is worth standardizing on Private AI Services 2.1 specifically for it, but plan as if you still own the NIM and OS package mirroring, because you do. Air-gapped is not harder Private AI. It is the same Private AI with the convenience of on-demand fetch removed, and discipline put in its place.
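
One way to catch that drift early is to compare the hash inventory kept on the staging side with the one recorded inside the enclave. A rough sketch follows, assuming both inventories are plain `sha256sum`-style text files with one `hash  filename` line per bundle. `inventory_drift` and the file names in the usage line are hypothetical:

```shell
# Sketch only: show entries that differ between two hash inventories.
# Output uses comm's columns: unindented lines exist only in the staging
# inventory, tab-indented lines exist only in the enclave inventory.
# Empty output means the two inventories agree.
inventory_drift() {
  local staging="$1" enclave="$2" tmp
  tmp=$(mktemp -d) || return 2
  LC_ALL=C sort "$staging" > "$tmp/staging"
  LC_ALL=C sort "$enclave" > "$tmp/enclave"
  # comm -3 suppresses lines common to both files, leaving only the drift.
  LC_ALL=C comm -3 "$tmp/staging" "$tmp/enclave"
  rm -rf "$tmp"
}

# Example (not executed here):
#   inventory_drift staging-inventory.txt enclave-inventory.txt
```

A bundle whose hash changed appears twice, once per side, which makes a version mismatch easy to spot against the refresh triggers in the table.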

Running a disconnected enclave yourself? Tell me which layer bit you first: the Harbor bootstrap, the NIM profiles, or the deep learning VM apt mirror. That ranking says a lot about where a team’s mirroring discipline actually breaks down.




About the Author

Dr. Pranay Jha is a Cloud and AI Consultant with 18+ years of experience in hybrid cloud, virtualization, and enterprise infrastructure transformation. He specializes in VMware technologies, multi-cloud strategy, and Generative AI solutions. He holds a PhD in Computer Applications with research focused on Cloud and AI, has published multiple research papers, and has been a VMware vExpert since 2016 and a VMUG Community Leader.
