The first thing that goes wrong when you cut internet access to an NVIDIA AI cluster is not the model inference. It is the GPU Operator driver container, quietly failing to download kernel headers from an Ubuntu mirror that no longer exists on the network. The second thing that goes wrong is the Helm chart referencing nvcr.io/nvidia/gpu-operator:v26.3.2 and the kubelet timing out on an image pull. Both failures are entirely preventable, but only if you mirror the right artifacts before you air-gap, and only if you know exactly which images the Operator expects and which OS packages the driver container needs at runtime.
The Bootstrap Problem in Air-Gapped Environments
NVIDIA container images are pulled from nvcr.io. Unless it is a precompiled image, the GPU Operator driver container also downloads OS packages (kernel headers, GCC) from the distribution mirror at install time. That second dependency is the one operators miss. You can mirror every container image faithfully and still have the driver installation fail because the driver container tries to reach archive.ubuntu.com or the Red Hat CDN and the connection is refused.
There are two ways to solve it. The cleaner one: use precompiled driver containers, which NVIDIA ships for supported kernel versions and which contain all compiled artifacts baked in. No runtime package download. The more flexible one: mirror the OS package repository, create a ConfigMap with a custom apt/yum repo list, and tell the GPU Operator to mount it. The GPU Operator documentation covers both paths. I prefer precompiled containers in regulated environments because the build chain is closed and auditable; the OS mirror approach adds a package-mirror maintenance burden that teams frequently neglect.
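For the OS-mirror path, a minimal sketch of the repo list preparation might look like the following. The mirror URL, the output file name, and the jammy codename are assumptions for illustration. The ConfigMap name repo-config and the driver.repoConfig.configMapName value follow the GPU Operator air-gap documentation as I understand it, so verify both against the chart version you deploy.

```shell
#!/bin/sh
# Hypothetical internal mirror URL and output path; replace with your own.
MIRROR_URL="${MIRROR_URL:-http://apt-mirror.internal.corp/ubuntu}"
REPO_LIST="${REPO_LIST:-./custom-repo.list}"

# Write an apt source list that references only the internal mirror.
# $1 = Ubuntu codename (for example jammy), $2 = output file
write_repo_list() {
  for suite in "$1" "$1-updates" "$1-security"; do
    printf 'deb [arch=amd64] %s %s main universe\n' "$MIRROR_URL" "$suite"
  done > "$2"
}

write_repo_list jammy "$REPO_LIST"

# Next steps on a host with cluster access (not executed by this sketch):
#   kubectl create configmap repo-config -n gpu-operator --from-file="$REPO_LIST"
#   helm upgrade --install gpu-operator <chart> -n gpu-operator \
#     --set driver.repoConfig.configMapName=repo-config
```

The point of generating the file rather than hand-editing it is that the list can be checked for stray public mirror hostnames before it ever reaches the cluster.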
What Needs to Be Mirrored
Before you air-gap, the full manifest of artifacts to mirror includes: (1) every container image referenced in the GPU Operator values.yaml – operator, driver, toolkit, device-plugin, DCGM exporter, MIG manager, node-feature-discovery; (2) any NIM containers you intend to run; (3) the GPU Operator Helm chart tarball from helm.ngc.nvidia.com; (4) model weights for any NIM you plan to serve; and (5) OS package mirrors for the kernel headers the driver container needs at install time, or the precompiled driver container image for your specific kernel version. Miss any one of these and the deployment stalls at a different but equally frustrating point.
The Mirroring Workflow: ngc-cli and skopeo
Two tools do the heavy lifting: the NVIDIA NGC CLI (ngc) for listing and downloading model artifacts, and skopeo for copying container images directly between registries without a local Docker daemon. Skopeo is preferable in regulated environments because it does not require root and does not need the Docker daemon running – the copy is a direct registry-to-registry transfer.
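Because the copy is registry-to-registry, the only thing that changes per image is the registry prefix. A small sketch of that mapping, assuming the private registry keeps the same repository paths as nvcr.io (the function names mirror_ref and print_copy_plan are hypothetical helpers, not part of skopeo):

```shell
#!/bin/sh
SRC_REGISTRY="nvcr.io"
DST_REGISTRY="${DST_REGISTRY:-registry.internal.corp:5000}"

# Map an nvcr.io image reference to the same path on the private registry.
mirror_ref() {
  case "$1" in
    "$SRC_REGISTRY"/*)
      rest=${1#"$SRC_REGISTRY"/}
      printf '%s/%s\n' "$DST_REGISTRY" "$rest" ;;
    *)
      printf 'not an %s reference: %s\n' "$SRC_REGISTRY" "$1" >&2
      return 1 ;;
  esac
}

# Read image references on stdin and print one skopeo command per image.
# This is a dry run: nothing is copied until the printed commands are executed.
print_copy_plan() {
  while IFS= read -r ref; do
    [ -n "$ref" ] || continue
    printf 'skopeo copy docker://%s docker://%s\n' "$ref" "$(mirror_ref "$ref")"
  done
}

print_copy_plan <<'EOF'
nvcr.io/nvidia/gpu-operator:v26.3.2
nvcr.io/nvidia/driver:580.126.20-ubuntu22.04
nvcr.io/nvidia/k8s/dcgm-exporter:4.3.0-4.9.0-ubuntu22.04
EOF
```

Keeping the image list in one place means the same list can drive both the mirror run and the later check that nothing in the cluster still points upstream.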
Operational Artifact: Mirror nvcr.io to a Private Registry
Run the following on a connected bastion host. Replace registry.internal.corp:5000 with your private registry endpoint. Replace the GPU Operator version and driver version with the current release for your branch.
# 1. Authenticate to nvcr.io using your NGC API key.
#    Single quotes keep $oauthtoken literal: it is the required username,
#    not a shell variable to expand.
skopeo login nvcr.io \
  --username '$oauthtoken' \
  --password YOUR_NGC_API_KEY

# 2. Copy GPU Operator image (no local daemon needed)
skopeo copy \
  docker://nvcr.io/nvidia/gpu-operator:v26.3.2 \
  docker://registry.internal.corp:5000/nvidia/gpu-operator:v26.3.2

# 3. Copy the driver image for Ubuntu 22.04.
#    To avoid the OS package bootstrap problem entirely, copy the precompiled
#    image for your kernel instead; its tag also encodes the kernel version.
skopeo copy \
  docker://nvcr.io/nvidia/driver:580.126.20-ubuntu22.04 \
  docker://registry.internal.corp:5000/nvidia/driver:580.126.20-ubuntu22.04

# 4. Copy the DCGM exporter image
skopeo copy \
  docker://nvcr.io/nvidia/k8s/dcgm-exporter:4.3.0-4.9.0-ubuntu22.04 \
  docker://registry.internal.corp:5000/nvidia/k8s/dcgm-exporter:4.3.0-4.9.0-ubuntu22.04

# 5. Mirror the Helm chart tgz (helm.ngc.nvidia.com serves charts over HTTPS)
helm pull https://helm.ngc.nvidia.com/nvidia/charts/gpu-operator-v26.3.2.tgz \
  --destination ./charts-mirror/

# Push to your internal Helm repo (ChartMuseum example)
curl --data-binary "@charts-mirror/gpu-operator-v26.3.2.tgz" \
  http://chartmuseum.internal.corp:8080/api/charts

# 6. Download a NIM model artifact (llama-3.1-8b-instruct example)
ngc registry model download-version \
  nvidia/nim/llama-3.1-8b-instruct:1.8.0 \
  --dest /mnt/models/
Expected result: Each skopeo copy exits 0 and prints a manifest digest. The Helm chart appears in ChartMuseum. The model directory contains config.json and weight shards.
Failure mode: If skopeo copy fails with unauthorized: authentication required, your NGC API key is expired or the $oauthtoken literal username was not used (that string is the required username for NGC token auth, not a shell variable to expand). If the driver image pull later fails inside the cluster, check whether the imagePullSecret for your private registry was created in the gpu-operator namespace and referenced in values.yaml under each component's imagePullSecrets array.
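Since each Operator component carries its own repository and imagePullSecrets keys in values.yaml, an override file is easier to generate than to hand-edit. The sketch below is illustrative only. The key names and upstream sub-paths (nvidia, nvidia/k8s, nvidia/cloud-native, nfd) reflect the chart layout as I understand it and must be checked against helm show values for your chart version. The secret name private-registry-secret is hypothetical.

```shell
#!/bin/sh
REGISTRY="${REGISTRY:-registry.internal.corp:5000}"
PULL_SECRET="${PULL_SECRET:-private-registry-secret}"   # hypothetical secret name
OUT="${OUT:-./values-airgap.yaml}"

# One line per component: "<values key> <repository path under the registry>".
# Assumed layout; confirm with: helm show values <chart>
components='operator nvidia
validator nvidia/cloud-native
driver nvidia
toolkit nvidia/k8s
devicePlugin nvidia
dcgmExporter nvidia/k8s
migManager nvidia/cloud-native'

: > "$OUT"
printf '%s\n' "$components" | while read -r key path; do
  cat >> "$OUT" <<EOF
$key:
  repository: $REGISTRY/$path
  imagePullSecrets: ["$PULL_SECRET"]
EOF
done

# node-feature-discovery is a subchart with a nested image key (assumed layout).
cat >> "$OUT" <<EOF
node-feature-discovery:
  image:
    repository: $REGISTRY/nfd/node-feature-discovery
EOF
```

A quick grep for nvcr.io in the generated file, and in the rendered output of helm template, is a cheap guard before the file is committed to source control.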
The values.yaml shipped with the GPU Operator chart still points its repository fields at upstream registries, chiefly paths under nvcr.io/nvidia. You must override every one of those fields – operator, driver, toolkit, device-plugin, DCGM exporter, MIG manager, node-feature-discovery – with your local registry prefix, or the kubelet will still attempt to reach nvcr.io. There is no single global override; each component has its own repository key. Generate a complete override values file and commit it to source control before cutting the network.
NVIDIA AI Enterprise Branch Types and Lifecycle
Picking a branch is a compliance decision as much as a software one. As of 2026, NVIDIA AI Enterprise defines three branch types for the application layer (Feature, Production, and Long-Term Support) and a separate Infrastructure Branch for GPU drivers and Kubernetes operators. Each comes with a specific security patch cadence that directly determines how often you must update in a regulated environment.
Branch Comparison Table
| Branch Type | Support Window | Security Patch Cadence | Release Cadence | Best For | Not Recommended For |
|---|---|---|---|---|---|
| Feature Branch (FB) | 1 month | Next monthly release | Monthly | Dev, PoC, research | Any production use |
| Production Branch (PB) | 9 months | Monthly patches | Every 6 months | Mission-critical prod, standard enterprise | Regulated industries needing 3-yr support |
| Long-Term Support (LTSB) | 3 years | Quarterly patches | Every 30 months | Healthcare, finance, government, defence | Dev environments; need for latest features |
| Infrastructure Branch | 1 yr (3 yr if LTSB Infra) | Minor every 3 months | Major every 6 months | GPU drivers, GPU Operator, Container Toolkit | AI frameworks and apps (use software branch) |
LTSB 2 is currently supported through October 2027. PB 26h1 (Production Branch released May 2026) runs through approximately February 2027. There is a naming convention change worth noting: from PB6 onward, Production Branches use sequential numbering rather than the prior date-based pattern (PB 26h1 is equivalent to PB6 in the new scheme). The NVIDIA docs still use both forms in different places, which causes confusion when cross-referencing release notes.
The Version Number Warning
NVIDIA makes this explicit in the lifecycle policy: PB and LTSB component version numbers are not always the latest upstream versions. They are the versions that can be maintained and backport-patched for the full support window. If you need the latest PyTorch or TensorRT version number, you have to use the Feature Branch, which means accepting 1-month support. Teams that try to cherry-pick the latest upstream component into an LTSB are leaving the support envelope and taking on the CVE triage burden themselves. That is a bad trade in regulated environments.
GPU Driver and CUDA CVE Tracking and Patching
NVIDIA PSIRT (Product Security Incident Response Team) publishes security bulletins on a rolling basis at nvidia.com/en-us/product-security/. Starting October 2025, bulletins are also published on GitHub in Markdown, CSAF, and CVE JSON formats, making them consumable by SIEM and vulnerability management tooling. The GPU display driver bulletins typically cover vulnerabilities in the kernel mode layer handler – kernel-level issues that can enable privilege escalation or denial of service. CUDA CVEs tend to be lower severity but occasionally affect the runtime in ways that matter for multi-tenant clusters.
For container images, NVIDIA provides a VEX (Vulnerability Exploitability eXchange) file in CycloneDX format for PB and LTSB releases. The VEX file records which known upstream CVEs in bundled open-source components are actually exploitable in the NVIDIA container image context, and which are not applicable due to build configuration. This is the artifact your security team should be pulling to close tickets rather than flagging every CVE in a generic OS layer scan.
Driver Patch Verification in an Air-Gapped Cluster
Operational Artifact: Verify Driver Version After Patch
After mirroring the updated driver image and running helm upgrade with the new driver version, verify the driver container is running the patched version:
# Check driver pod status
kubectl get pods -n gpu-operator -l app=nvidia-driver-daemonset

# Exec into a driver container and check the loaded driver version
kubectl exec -n gpu-operator \
  "$(kubectl get pod -n gpu-operator -l app=nvidia-driver-daemonset \
    -o jsonpath='{.items[0].metadata.name}')" \
  -- nvidia-smi --query-gpu=driver_version --format=csv,noheader

# Cross-check via the driver-version node labels published by GPU Feature Discovery
kubectl get node YOUR_GPU_NODE -o yaml \
  | grep 'nvidia.com/cuda.driver'

# Verify no CVE-affected .so is loaded (requires CVE advisory to list
# the specific library; example for a hypothetical libnvidia-ml.so issue)
kubectl exec -n gpu-operator \
  "$(kubectl get pod -n gpu-operator -l app=nvidia-driver-daemonset \
    -o jsonpath='{.items[0].metadata.name}')" \
  -- strings /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 \
  | grep -i 'NVRM version'
Expected result: nvidia-smi returns the patched driver version (e.g., 580.126.20). The nvidia.com/cuda.driver* node labels report the same version. The strings grep returns the correct version string.
Failure mode: If the driver pod is in Init:0/1 or CrashLoopBackOff after the image update, the new driver image may require a kernel module rebuild and the node has not yet completed the driver container initialization cycle. Check kubectl logs -n gpu-operator -l app=nvidia-driver-daemonset for the specific failure reason. A common cause is a kernel version mismatch if the node kernel was updated independently and the precompiled driver image has not been refreshed to match.
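The kernel mismatch case can be caught before the upgrade by comparing the node's kernel release against the precompiled image tag you mirrored. The sketch below assumes precompiled tags follow the pattern driver-branch, kernel release, OS tag, joined by hyphens, which is my reading of the precompiled driver documentation. The helper names and the kernel release used in the usage note are hypothetical.

```shell
#!/bin/sh
# Build the expected precompiled driver tag.
# $1 = driver branch (e.g. 580), $2 = kernel release, $3 = OS tag
precompiled_tag() {
  printf '%s-%s-%s\n' "$1" "$2" "$3"
}

# Succeed only when the mirrored tag was built for the given kernel.
# $1 = mirrored image tag, $2 = driver branch, $3 = kernel release, $4 = OS tag
tag_matches_kernel() {
  [ "$1" = "$(precompiled_tag "$2" "$3" "$4")" ]
}
```

In practice the kernel release would come from kubectl get node YOUR_GPU_NODE -o jsonpath='{.status.nodeInfo.kernelVersion}', and a call such as tag_matches_kernel "$MIRRORED_TAG" 580 5.15.0-69-generic ubuntu22.04 would gate the helm upgrade.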
Another option is to set driver.enabled=false and manage the driver lifecycle externally. Always validate in a staging environment before applying driver updates to production GPU nodes, as driver updates require a brief GPU context teardown that interrupts running inference workloads.
Patch Cadence Table for Air-Gapped Sites
The following table captures the patch cadence commitments for each layer of the NVIDIA stack in an air-gapped site, along with the actions required and typical SLA targets for regulated environments. Use this as the basis for your patching runbook.
| Stack Layer | Branch / Component | Patch Cadence | Air-Gap Action | Critical CVE SLA |
|---|---|---|---|---|
| AI Application Layer | LTSB (healthcare, gov) | Quarterly | Mirror updated containers; skopeo copy; redeploy | 72 hr (confirm with your CISO) |
| AI Application Layer | Production Branch (PB) | Monthly | Mirror updated containers monthly; Helm upgrade | 72 hr for Critical; 30 days for High |
| GPU Driver / CUDA | Infrastructure Branch | Quarterly minor releases | Mirror driver container; GPU Operator helm upgrade | 72 hr for kernel-level CVEs |
| GPU Operator / Network Operator | Infrastructure Branch | Quarterly minor releases | Helm upgrade with mirrored chart + images | 30 days for High |
| Container Base Images | UBI / Ubuntu base layers | Per OS release cycle | Pull VEX file; close non-exploitable findings | Per NVIDIA PSIRT advisory |
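For the runbook, the SLA column reduces to simple deadline arithmetic. A minimal sketch, assuming the 72-hour and 30-day targets from the table (these are site policy targets, not NVIDIA commitments) and timestamps expressed as epoch seconds; the function names are hypothetical:

```shell
#!/bin/sh
# Hours allowed per severity, using the targets from the table above.
sla_hours() {
  case "$1" in
    critical) echo 72 ;;
    high)     echo 720 ;;   # 30 days
    *)        return 1 ;;
  esac
}

# Print the patch deadline in epoch seconds.
# $1 = bulletin publication time (epoch seconds), $2 = severity
patch_deadline() {
  hours="$(sla_hours "$2")" || return 1
  echo $(( $1 + hours * 3600 ))
}
```

An unknown severity returns a non-zero status rather than a guessed deadline, so the caller has to classify the bulletin before a deadline is recorded.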
Supported Branch vs Latest: The Decision You Cannot Defer
What NVIDIA Government Ready Containers Add
For US government and defence environments, NVIDIA ships Government Ready containers under the PB and LTSB tracks. These images carry STIG (Security Technical Implementation Guide) hardening for x86, FIPS 140-3 cryptographic modules, and are available on both the NGC catalog and the DoD Iron Bank repository. They are not a separate product but a delivery mode of the same AI Enterprise software – the same NIM containers, the same GPU Operator images, built to a more restrictive baseline. If your authority to operate (ATO) requires FIPS-validated crypto or DoD-approved images, this is the only path; do not try to STIG-harden a standard NGC container yourself, as that work is not covered by the AI Enterprise support contract.
Cross-Link: The VCF Lens
If you are running this air-gapped NVIDIA stack on VMware Cloud Foundation, the registry mirroring steps above apply identically at the NVIDIA-stack level, but there is additional plumbing on the VCF side: the Supervisor cluster, the Tanzu Kubernetes release images, and the vSphere with Tanzu content library all need their own air-gap mirroring before the NVIDIA components can even reach a running Kubernetes cluster. That VCF-layer mirror process is documented in the Private AI Air-Gapped Deployment post. The two mirror pipelines must be coordinated: when you update the NVIDIA GPU Operator to a new version, you also need to confirm the Supervisor TKR (Tanzu Kubernetes Release) that runs under it is still compatible. That compatibility matrix lives in the GPU Operator release notes, and it is one of the checks that belongs in your patching runbook.
My Take: A Defensible Patch Policy
A defensible patch policy for an air-gapped NVIDIA AI cluster has four non-negotiable elements. First, branch selection is documented and justified in your security plan, including the support window and what happens when that window closes. Second, you have a working mirror pipeline tested before you air-gap, with a documented runbook for refreshing it on patch day, not written after the first CVE arrives. Third, you pull the NVIDIA PSIRT bulletin feed – at minimum check nvidia.com/en-us/product-security/ monthly, and ideally wire the GitHub CSAF feed into your vulnerability management tooling. Fourth, you use the VEX file from NGC Container Scanning to close findings that are not exploitable in your specific image build, so your security team is not buried in false positives from generic OS-layer scans.
For regulated sites: pick LTSB. Quarterly patching windows are compatible with most regulated change-control processes. Nine-month PB windows can work but require a standing maintenance window every month, which is a heavier operational burden than most teams budget for. For standard enterprise air-gapped deployments: PB is the right call. Monthly security patches, predictable feature release cycle, and you are not stuck on an old component version for three years.
When NOT to use LTSB: if your workload depends on the latest NIM microservice capabilities, TensorRT-LLM quantization improvements, or the newest Nemotron model variants, LTSB will lag. Those components will be older versions than what NVIDIA is shipping on the Feature Branch. That trade-off is intentional and documented, but teams sometimes discover it mid-project when they try to deploy a NIM version that only exists on a newer FB and find it absent from the LTSB catalog.
What to validate first: before air-gapping, do a full dry-run of the mirror pipeline in a connected staging environment. Pull every image and chart, stand up the GPU Operator from the local registry, run nvidia-smi through the operator, deploy a NIM container from the local registry and run a test inference call. If that works connected, cutting the wire changes nothing. If it fails connected, you have a misconfigured registry or a missing image, and you want to find that out before the network is gone.
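One check worth adding to that dry run is a scan for any image reference that still points outside the private registry, since a connected staging cluster will pull such images without complaint. A sketch, assuming image references arrive one per line on stdin (find_external_images is a hypothetical helper):

```shell
#!/bin/sh
REGISTRY="${REGISTRY:-registry.internal.corp:5000}"

# Read image references on stdin and print those not served by $REGISTRY.
# Exit status is non-zero if any external reference is found.
find_external_images() {
  awk -v prefix="$REGISTRY/" '
    NF && index($0, prefix) != 1 { print; bad = 1 }
    END { exit (bad ? 1 : 0) }
  '
}
```

The input could come from something like kubectl get pods -A -o jsonpath='{range .items[*]}{range .spec.containers[*]}{.image}{"\n"}{end}{end}' | sort -u | find_external_images. That query covers regular containers only, so init containers need a second pass.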
References
- NVIDIA AI Enterprise Lifecycle Policy: Choosing the Right Release Branch (docs.nvidia.com, updated May 2026)
- Install NVIDIA GPU Operator in Air-Gapped Environments (docs.nvidia.com, updated May 2026)
- NVIDIA Product Security / PSIRT Bulletins (nvidia.com)
- NVIDIA Data Center Driver Lifecycle (docs.nvidia.com)



