Dr. Pranay Jha

VMware • Cloud • AI • Enterprise Architecture


Air-Gapped Deployment, Lifecycle and CVE Patching for the NVIDIA Stack (NVIDIA AI Series, Part 15)

Running NVIDIA AI Enterprise in an air-gapped environment requires mirroring nvcr.io containers, Helm charts, and model weights before you cut the wire. Here is the branch selection, driver patch cadence, and CVE triage workflow that keeps regulated deployments defensible.

NVIDIA AI Series · Part 15 of 30
TL;DR: Air-gapping the NVIDIA stack is not a single switch you flip. You must mirror nvcr.io containers AND NGC Helm charts AND model weights into a private registry before you cut the wire, deal with the driver container’s OS package bootstrap problem, pick a branch type that your patch cadence can actually honour, and build a repeatable CVE triage workflow tied to the NVIDIA PSIRT bulletin cadence. LTSB buys you quarterly patching windows and 3-year API stability. Production Branch (PB) gives you monthly security patches on a 6-month feature cycle and a 9-month support window. Feature Branch is not for production. Pick the wrong branch and you will be scrambling to re-certify or stranded on an unpatched image.
Who this is for: Platform engineers and AI infrastructure architects running NVIDIA AI Enterprise in classified, healthcare, financial, or other network-restricted environments. You should be familiar with the NGC catalog from Part 14 and the GPU Operator from Part 12. If you are deploying on VCF, read the VCF-specific air-gap walk-through at Private AI: Air-Gapped Deployment alongside this post. That post covers the VMware and vSphere Tanzu layer; this one covers the NVIDIA stack you run on top of it.

The first thing that goes wrong when you cut internet access to an NVIDIA AI cluster is not the model inference. It is the GPU Operator driver container, quietly failing to download kernel headers from an Ubuntu mirror that no longer exists on the network. The second thing that goes wrong is the Helm chart referencing nvcr.io/nvidia/gpu-operator:v26.3.2 and the kubelet timing out on an image pull. Both failures are entirely preventable, but only if you mirror the right artifacts before you air-gap, and only if you know exactly which images the Operator expects and which OS packages the driver container needs at runtime.

The Bootstrap Problem in Air-Gapped Environments

Every NVIDIA container image pulls from nvcr.io. Every GPU Operator driver container also downloads OS packages (kernel headers, GCC) from the distribution mirror at install time. That second dependency is the one operators miss. You can mirror every container image faithfully and still have the driver installation fail because the driver init container tries to reach archive.ubuntu.com or the Red Hat CDN and gets connection-refused.

There are two ways to solve it. The cleaner one: use precompiled driver containers, which NVIDIA ships for supported kernel versions and which contain all compiled artifacts baked in. No runtime package download. The more flexible one: mirror the OS package repository, create a ConfigMap with a custom apt/yum repo list, and tell the GPU Operator to mount it. The GPU Operator documentation covers both paths. I prefer precompiled containers in regulated environments because the build chain is closed and auditable; the OS mirror approach adds a package-mirror maintenance burden that teams frequently neglect.
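To make the precompiled option concrete, here is a minimal sketch of how the image tag for a given node can be derived. It assumes the `<driver-branch>-<kernel-version>-<os-tag>` naming pattern described in NVIDIA's precompiled driver documentation. The helper name and the sample values are illustrative, so confirm the pattern against the tag list in your NGC catalog before relying on it.

```shell
#!/usr/bin/env bash
# Sketch: build the tag a precompiled driver image is expected to carry.
# Assumed pattern: <driver-branch>-<kernel-version>-<os-tag>
# precompiled_driver_tag is a hypothetical helper, not part of any NVIDIA tool.
precompiled_driver_tag() {
  local branch="$1" kernel="$2" os_tag="$3"
  printf '%s-%s-%s\n' "$branch" "$kernel" "$os_tag"
}

# Example with literal values; on a GPU node you would pass "$(uname -r)"
# as the kernel argument.
precompiled_driver_tag 580 5.15.0-69-generic ubuntu22.04
```

If the resulting tag is not present in your mirror for the kernel a node is actually running, the precompiled path is not available for that node, and the OS package mirror becomes the fallback.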

[Figure 1 diagram: Air-Gap Mirror Pipeline. Internet sources (nvcr.io containers and Helm charts; NGC model files such as weights, ONNX, and TRT; OS package mirrors such as archive.ubuntu.com) are pulled by the bastion host (ngc-cli / skopeo, apt-mirror / reposync, helm pull + push; connected zone only) and pushed across the air-gap boundary into the private zone: a private registry (Harbor / ECR / Artifactory), a local Helm repo (ChartMuseum chart server), and the air-gapped Kubernetes cluster with its GPU Operator values.yaml.]
Figure 1: The bastion host is the only node that ever touches the internet. After mirroring, the wire is cut and the cluster pulls from internal registries only.

What Needs to Be Mirrored

Before you air-gap, the full manifest of artifacts to mirror includes: (1) every container image referenced in the GPU Operator values.yaml – operator, driver, toolkit, device-plugin, DCGM exporter, MIG manager, node-feature-discovery; (2) any NIM containers you intend to run; (3) the GPU Operator Helm chart tarball from helm.ngc.nvidia.com; (4) model weights for any NIM you plan to serve; and (5) OS package mirrors for the kernel headers the driver container needs at install time, or the precompiled driver container image for your specific kernel version. Miss any one of these and the deployment stalls at a different but equally frustrating point.
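One way to keep that manifest honest is to hold the image list in a text file and derive every mirror target from it. The sketch below is an assumption about how you might organise this, not an NVIDIA-provided tool: `LOCAL_REGISTRY`, `mirror_target`, and `print_mirror_plan` are illustrative names, and the script only prints the skopeo commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: map each upstream image reference to its private-registry target.
# LOCAL_REGISTRY, mirror_target and print_mirror_plan are illustrative names.
LOCAL_REGISTRY="registry.internal.corp:5000"

mirror_target() {
  # Drop the upstream registry host (everything up to the first "/")
  # and prepend the private registry.
  local src="$1"
  printf '%s/%s\n' "$LOCAL_REGISTRY" "${src#*/}"
}

print_mirror_plan() {
  # Reads one image reference per line from stdin and prints (does not run)
  # the matching skopeo copy command. Blank lines are skipped.
  local src
  while IFS= read -r src; do
    [ -z "$src" ] && continue
    printf 'skopeo copy docker://%s docker://%s\n' "$src" "$(mirror_target "$src")"
  done
}

print_mirror_plan <<'EOF'
nvcr.io/nvidia/gpu-operator:v26.3.2
nvcr.io/nvidia/driver:580.126.20-ubuntu22.04
nvcr.io/nvidia/k8s/dcgm-exporter:4.3.0-4.9.0-ubuntu22.04
EOF
```

Reviewing the printed plan before executing it gives change control a concrete artifact to sign off, and the same list can later drive the verification pass inside the private zone.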

The Mirroring Workflow: ngc-cli and skopeo

Two tools do the heavy lifting: the NVIDIA NGC CLI (ngc) for listing and downloading model artifacts, and skopeo for copying container images directly between registries without a local Docker daemon. Skopeo is preferable in regulated environments because it does not require root and does not need the Docker daemon running – the copy is a direct registry-to-registry transfer.

Operational Artifact: Mirror nvcr.io to a Private Registry

Run the following on a connected bastion host. Replace registry.internal.corp:5000 with your private registry endpoint. Replace the GPU Operator version and driver version with the current release for your branch.

# 1. Authenticate to nvcr.io using your NGC API key.
#    The username is the literal string $oauthtoken, so it must be
#    single-quoted; double quotes would let the shell expand it to nothing.
skopeo login nvcr.io \
  --username '$oauthtoken' \
  --password YOUR_NGC_API_KEY

# 2. Copy GPU Operator image (no local daemon needed)
skopeo copy \
  docker://nvcr.io/nvidia/gpu-operator:v26.3.2 \
  docker://registry.internal.corp:5000/nvidia/gpu-operator:v26.3.2

# 3. Copy the driver image for Ubuntu 22.04.
#    Note: a <version>-<os> tag such as this one is the standard driver
#    image. Precompiled images, which avoid the OS package bootstrap
#    problem, carry a <driver-branch>-<kernel-version>-<os> tag instead;
#    mirror that tag if you take the precompiled route.
skopeo copy \
  docker://nvcr.io/nvidia/driver:580.126.20-ubuntu22.04 \
  docker://registry.internal.corp:5000/nvidia/driver:580.126.20-ubuntu22.04

# 4. Copy the DCGM exporter image
skopeo copy \
  docker://nvcr.io/nvidia/k8s/dcgm-exporter:4.3.0-4.9.0-ubuntu22.04 \
  docker://registry.internal.corp:5000/nvidia/k8s/dcgm-exporter:4.3.0-4.9.0-ubuntu22.04

# 5. Mirror the Helm chart tgz.
#    helm.ngc.nvidia.com is served as a classic HTTPS chart repository,
#    so pull with --repo rather than an oci:// reference.
mkdir -p ./charts-mirror
helm pull gpu-operator \
  --repo https://helm.ngc.nvidia.com/nvidia \
  --version v26.3.2 \
  --destination ./charts-mirror/

# Push to your internal Helm repo (ChartMuseum example)
curl --data-binary "@charts-mirror/gpu-operator-v26.3.2.tgz" \
  http://chartmuseum.internal.corp:8080/api/charts

# 6. Download a NIM model artifact (llama-3.1-8b-instruct example)
ngc registry model download-version \
  nvidia/nim/llama-3.1-8b-instruct:1.8.0 \
  --dest /mnt/models/

Expected result: Each skopeo copy exits 0 and prints a manifest digest. The Helm chart appears in ChartMuseum. The model directory contains config.json and weight shards.

Failure mode: If skopeo copy fails with unauthorized: authentication required, your NGC API key is expired or the $oauthtoken literal username was not used (that string is the required username for NGC token auth, not a shell variable to expand). If the driver image pull later fails inside the cluster, check whether the imagePullSecret for your private registry was created in the gpu-operator namespace and referenced in values.yaml under each component's imagePullSecrets array.
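The username pitfall is easy to reproduce without touching a registry. This short sketch only demonstrates shell quoting; the variable names `expanded` and `literal` are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: why the quoting around $oauthtoken matters.
unset oauthtoken                 # typical state: no such shell variable exists

# Same result as "$oauthtoken" when the variable is unset; written with :-
# so the sketch also runs under `set -u`.
expanded="${oauthtoken:-}"

# Single quotes keep the dollar sign, which is what nvcr.io expects.
literal='$oauthtoken'

printf 'double-quoted -> [%s]\n' "$expanded"   # prints: double-quoted -> []
printf 'single-quoted -> [%s]\n' "$literal"    # prints: single-quoted -> [$oauthtoken]
```

An empty username is what produces the authentication failure, so check the quoting before assuming the API key has expired.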

Gotcha: The values.yaml shipped with the GPU Operator chart still points every repository field to nvcr.io/nvidia. You must override every one of those fields – operator, driver, toolkit, device-plugin, DCGM exporter, MIG manager, node-feature-discovery – with your local registry prefix, or the kubelet will still attempt to reach nvcr.io. There is no single global override; each component has its own repository key. Generate a complete override values file and commit it to source control before cutting the network.
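A blunt but effective way to catch a missed repository field is to rewrite the registry host across the chart's full default values and then search for anything that still points upstream. The sketch below assumes that approach. `rewrite_registries` and `find_unmirrored` are illustrative helpers, the YAML keys in the example are only an excerpt, and the `registry.k8s.io` rule is there on the assumption that node-feature-discovery is pulled from that host in your chart version.

```shell
#!/usr/bin/env bash
# Sketch: rewrite upstream registry hosts in a values file read from stdin.
# LOCAL_REGISTRY, rewrite_registries and find_unmirrored are illustrative names.
LOCAL_REGISTRY="registry.internal.corp:5000"

rewrite_registries() {
  # sed uses "#" as the delimiter because the values contain "/".
  sed -e "s#nvcr\.io/#${LOCAL_REGISTRY}/#g" \
      -e "s#registry\.k8s\.io/#${LOCAL_REGISTRY}/#g"
}

find_unmirrored() {
  # Any line still naming an upstream host is an override that was missed.
  # "|| true" keeps the exit status at 0 when nothing is found.
  grep -nE 'nvcr\.io|registry\.k8s\.io' || true
}

# Example on a two-component excerpt.
rewrite_registries <<'EOF'
driver:
  repository: nvcr.io/nvidia
  image: driver
toolkit:
  repository: nvcr.io/nvidia/k8s
  image: container-toolkit
EOF
```

In practice the input would be the output of `helm show values` on the mirrored chart tarball, and `find_unmirrored` would be run over the `helm template` output rendered with the rewritten file. The result is a full copy of the values rather than a minimal override, which is easier to audit but needs regenerating for each chart version.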

NVIDIA AI Enterprise Branch Types and Lifecycle

Picking a branch is a compliance decision as much as a software one. As of 2026, NVIDIA AI Enterprise defines three branch types for the application layer (Feature Branch, Production Branch, and Long-Term Support Branch) and a separate Infrastructure Branch for GPU drivers and Kubernetes operators. Each comes with a specific security patch cadence that directly determines how often you must update in a regulated environment.

[Figure 2 diagram: Branch Lifecycle and Support Windows, NVIDIA AI Enterprise 2026 (timeline relative to release, not to scale). FB: 1 month. PB: 9 months, monthly security patches. LTSB: 3 years, quarterly security patches for high and critical CVEs. Infra: 1 year (3 years if LTSB Infra), minor release every 3 months.]
Figure 2: Branch lifecycle windows from first release. FB is development-only; PB suits most production workloads; LTSB is the regulated-industry choice. Infrastructure Branch covers drivers and Kubernetes operators.

Branch Comparison Table

Branch Type | Support Window | Security Patch Cadence | Release Cadence | Best For | Not Recommended For
Feature Branch (FB) | 1 month | Next monthly release | Monthly | Dev, PoC, research | Any production use
Production Branch (PB) | 9 months | Monthly patches | Every 6 months | Mission-critical prod, standard enterprise | Regulated industries needing 3-yr support
Long-Term Support (LTSB) | 3 years | Quarterly patches | Every 30 months | Healthcare, finance, government, defence | Dev environments; need for latest features
Infrastructure Branch | 1 yr (3 yr if LTSB Infra) | Minor every 3 months | Major every 6 months | GPU drivers, GPU Operator, Container Toolkit | AI frameworks and apps (use software branch)

LTSB 2 is currently supported through October 2027. PB 26h1 (Production Branch released May 2026) runs through approximately February 2027. There is a naming convention change worth noting: from PB6 onward, Production Branches use sequential numbering rather than the prior date-based pattern (PB 26h1 is equivalent to PB6 in the new scheme). The NVIDIA docs still use both forms in different places, which causes confusion when cross-referencing release notes.
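The support-window arithmetic is worth scripting so it lands in the security plan as a date, not as a rule of thumb. This is a minimal sketch that assumes GNU `date` (the `-d` option, as found on most Linux bastions); `support_end` is an illustrative helper, and the release dates are the ones quoted above.

```shell
#!/usr/bin/env bash
# Sketch: compute the month in which a branch leaves support.
# Assumes GNU date; support_end is an illustrative helper.
support_end() {
  local release_date="$1" window_months="$2"
  date -u -d "${release_date} +${window_months} months" +%Y-%m
}

support_end 2026-05-01 9    # PB released May 2026, 9-month window -> 2027-02
support_end 2026-05-01 1    # a Feature Branch released the same month -> 2026-06
```

The first call reproduces the February 2027 end date for PB 26h1. Always take the authoritative end-of-support date from NVIDIA's lifecycle page, since the published date can differ from a simple month offset.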

The Version Number Warning

NVIDIA makes this explicit in the lifecycle policy: PB and LTSB component version numbers are not always the latest upstream versions. They are the versions that can be maintained and backport-patched for the full support window. If you need the latest PyTorch or TensorRT version number, you have to use the Feature Branch, which means accepting 1-month support. Teams that try to cherry-pick the latest upstream component into an LTSB are leaving the support envelope and taking on the CVE triage burden themselves. That is a bad trade in regulated environments.

GPU Driver and CUDA CVE Tracking and Patching

NVIDIA PSIRT (Product Security Incident Response Team) publishes security bulletins on a rolling basis at nvidia.com/en-us/product-security/. Starting October 2025, bulletins are also published on GitHub in Markdown, CSAF, and CVE JSON formats, making them consumable by SIEM and vulnerability management tooling. The GPU display driver bulletins typically cover vulnerabilities in the kernel mode layer handler – kernel-level issues that can enable privilege escalation or denial of service. CUDA CVEs tend to be lower severity but occasionally affect the runtime in ways that matter for multi-tenant clusters.
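Because the bulletins are machine-readable, a first-pass summary can be produced on the bastion before anything crosses the air gap. The sketch below assumes `jq` is installed and that the advisory follows the CSAF 2.0 layout (`vulnerabilities[].cve` and `vulnerabilities[].scores[].cvss_v3`). `list_bulletin_cves` and the file path are illustrative, so check the field paths against a real bulletin.

```shell
#!/usr/bin/env bash
# Sketch: list CVE id, severity and base score from a CSAF 2.0 advisory file.
# Requires jq. list_bulletin_cves is an illustrative helper; the field paths
# are assumed from the CSAF 2.0 schema.
list_bulletin_cves() {
  jq -r '
    .vulnerabilities[]?
    | [ .cve,
        (.scores[0].cvss_v3.baseSeverity // "UNKNOWN"),
        (.scores[0].cvss_v3.baseScore // "n/a" | tostring) ]
    | @tsv' "$1"
}

# Usage (the path is a placeholder for an advisory copied in via the bastion):
#   list_bulletin_cves ./bulletins/advisory.json
```

Each output line is tab-separated (CVE id, severity, base score), which is easy to load into a ticketing system or a spreadsheet for the triage meeting.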

For container images, NVIDIA provides a VEX (Vulnerability Exploitability eXchange) file in CycloneDX format for PB and LTSB releases. The VEX file records which known upstream CVEs in bundled open-source components are actually exploitable in the NVIDIA container image context, and which are not applicable due to build configuration. This is the artifact your security team should be pulling to close tickets rather than flagging every CVE in a generic OS layer scan.
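As an illustration of how the VEX file feeds a ticket decision, the sketch below looks up a CVE in a CycloneDX VEX document and maps the analysis state to an action. It assumes `jq` is available and that the file uses the CycloneDX `vulnerabilities[].id` and `vulnerabilities[].analysis.state` fields. The helper names and the action wording are illustrative, not NVIDIA's.

```shell
#!/usr/bin/env bash
# Sketch: look up a CVE in a CycloneDX VEX file and suggest a ticket action.
# Requires jq. vex_state and triage_decision are illustrative helpers.
vex_state() {
  local vex_file="$1" cve="$2"
  # Prints the analysis state for the CVE, or "unlisted" if it is absent.
  jq -r --arg cve "$cve" '
    [ .vulnerabilities[]? | select(.id == $cve) | .analysis.state // "unlisted" ]
    | first // "unlisted"' "$vex_file"
}

triage_decision() {
  case "$(vex_state "$1" "$2")" in
    not_affected|false_positive) echo "close (record VEX reference)" ;;
    exploitable)                 echo "escalate (SLA clock starts)" ;;
    *)                           echo "review manually" ;;
  esac
}

# Usage (file name and CVE id are placeholders):
#   triage_decision ./vex/gpu-operator.cdx.json CVE-2099-0001
```

Anything the VEX file does not list falls through to manual review by design, so a missing entry is never silently treated as safe.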

[Figure 3 diagram: CVE Triage Flow for Air-Gapped NVIDIA Stack, from PSIRT bulletin to patch decision. NVIDIA PSIRT bulletin (nvidia.com/security + GitHub), then severity assessment (Critical / High / Medium / Low by CVSS), then VEX file check: is the CVE exploitable in this image? If not applicable, close the ticket and record the VEX reference. If exploitable, the SLA clock starts (Critical: 72 hr) and the patched image is pulled into the mirror (skopeo copy, then re-run Helm upgrade inside the air gap).]
Figure 3: CVE triage flow. The VEX file is the key artifact: it lets you close findings that are not exploitable without dismissing valid ones, and it avoids false-positive escalations from container scanning tools.

Driver Patch Verification in an Air-Gapped Cluster

Operational Artifact: Verify Driver Version After Patch

After mirroring the updated driver image and running helm upgrade with the new driver version, verify the driver container is running the patched version:

# Check driver pod status
kubectl get pods -n gpu-operator -l app=nvidia-driver-daemonset

# Exec into a driver container and check the loaded driver version
kubectl exec -n gpu-operator \
  $(kubectl get pod -n gpu-operator -l app=nvidia-driver-daemonset \
    -o jsonpath='{.items[0].metadata.name}') \
  -- nvidia-smi --query-gpu=driver_version --format=csv,noheader

# Cross-check via the node labels published by GPU Feature Discovery.
# Label names vary by GFD release, so list every nvidia.com/cuda.driver* label.
kubectl label node YOUR_GPU_NODE --list \
  | grep 'nvidia.com/cuda.driver'

# Confirm the userspace library on disk belongs to the patched release
# (example for a hypothetical libnvidia-ml.so issue; the CVE advisory names
# the affected library). The .so.1 symlink points at a file whose suffix
# is the driver version.
kubectl exec -n gpu-operator \
  $(kubectl get pod -n gpu-operator -l app=nvidia-driver-daemonset \
    -o jsonpath='{.items[0].metadata.name}') \
  -- ls -l /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1

Expected result: nvidia-smi returns the patched driver version (e.g., 580.126.20). The nvidia.com/cuda.driver* node labels report the same version. The libnvidia-ml.so.1 symlink resolves to a file whose suffix is the patched version.

Failure mode: If the driver pod is in Init:0/1 or CrashLoopBackOff after the image update, the new driver image may require a kernel module rebuild and the node has not yet completed the driver container initialization cycle. Check kubectl logs -n gpu-operator -l app=nvidia-driver-daemonset for the specific failure reason. A common cause is a kernel version mismatch if the node kernel was updated independently and the precompiled driver image has not been refreshed to match.
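The verification above ultimately reduces to one comparison: is the running driver at or above the version the bulletin lists as fixed? A minimal sketch of that check follows, assuming GNU `sort` with the `-V` (version sort) option. `version_ge` is an illustrative helper and the "fixed in" value is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: compare the running driver version against a bulletin's fixed version.
# Assumes GNU sort -V; version_ge is an illustrative helper.
version_ge() {
  # True when $1 >= $2 under version ordering: the smaller of the two
  # (first line after sorting) must be $2.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

running="580.126.20"     # e.g. taken from the nvidia-smi driver_version query
fixed_in="580.126.09"    # hypothetical "fixed in" version from a bulletin

if version_ge "$running" "$fixed_in"; then
  echo "patched"
else
  echo "still vulnerable"
fi
```

With the sample values this prints `patched`. Feeding the helper the version from every GPU node turns the spot check into a fleet-wide report.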

Disclaimer: The patch procedure above targets the GPU Operator driver container model on Kubernetes. If you are running bare-metal drivers installed via the NVIDIA data center driver package (runfile or DKMS), the patch path is a package update via your local mirror repository, not a container image update. Never mix the two installation methods on the same node; the GPU Operator will conflict with a pre-installed bare-metal driver unless you set driver.enabled=false and manage the driver lifecycle externally. Always validate in a staging environment before applying driver updates to production GPU nodes, as driver updates require a brief GPU context teardown that interrupts running inference workloads.

Patch Cadence Table for Air-Gapped Sites

The following table captures the patch cadence commitments for each layer of the NVIDIA stack in an air-gapped site, along with the actions required and typical SLA targets for regulated environments. Use this as the basis for your patching runbook.

Stack Layer | Branch / Component | Patch Cadence | Air-Gap Action | Critical CVE SLA
AI Application Layer | LTSB (healthcare, gov) | Quarterly | Mirror updated containers; skopeo copy; redeploy | 72 hr (confirm with your CISO)
AI Application Layer | Production Branch (PB) | Monthly | Mirror updated containers monthly; Helm upgrade | 72 hr for Critical; 30 days for High
GPU Driver / CUDA | Infrastructure Branch | Quarterly minor releases | Mirror driver container; GPU Operator helm upgrade | 72 hr for kernel-level CVEs
GPU Operator / Network Operator | Infrastructure Branch | Quarterly minor releases | Helm upgrade with mirrored chart + images | 30 days for High
Container Base Images | UBI / Ubuntu base layers | Per OS release cycle | Pull VEX file; close non-exploitable findings | Per NVIDIA PSIRT advisory
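The SLA column can be encoded so that a bulletin's CVSS score maps straight to a deadline class. The sketch below uses the standard CVSS v3 qualitative bands and the 72-hour and 30-day targets from the table. The fallback for Medium and Low is a placeholder assumption, and the helper names are illustrative; substitute whatever your own security policy states.

```shell
#!/usr/bin/env bash
# Sketch: map a CVSS v3 base score to a severity band and an SLA window.
# cvss_severity and sla_window are illustrative helpers.
cvss_severity() {
  # CVSS v3.x bands: 9.0+ Critical, 7.0+ High, 4.0+ Medium, above 0 Low.
  awk -v s="$1" 'BEGIN {
    if      (s >= 9.0) print "Critical"
    else if (s >= 7.0) print "High"
    else if (s >= 4.0) print "Medium"
    else if (s >  0.0) print "Low"
    else               print "None"
  }'
}

sla_window() {
  case "$1" in
    Critical) echo "72 hours" ;;
    High)     echo "30 days" ;;
    *)        echo "next scheduled patch window" ;;  # placeholder assumption
  esac
}

sev="$(cvss_severity 9.8)"
echo "${sev} -> $(sla_window "$sev")"
```

The example prints `Critical -> 72 hours`. Remember that the clock should only start for findings the VEX file marks as exploitable in your image.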

Supported Branch vs Latest: The Decision You Cannot Defer

[Figure 4 diagram: Supported Branch vs Latest decision tree. Need absolute latest features or version numbers? Yes: Feature Branch (1-month support only; dev / PoC only). No: highly regulated industry or multi-year certification cycle? Yes: LTSB (3-year window, quarterly patches, Gov Ready containers available). No: Production Branch (9-month window, monthly patches, standard enterprise production).]
Figure 4: Branch selection decision tree. The choice locks in your patching SLA, so make it before you air-gap, not after your first CVE bulletin arrives.
In-Practice: I have seen teams air-gap a cluster on a Feature Branch because it had the NIM version they needed for a proof-of-concept. Three months later, a critical GPU driver CVE arrived, the FB was already EOL, and the patch did not exist for that branch. The only path out was a full redeployment onto a supported branch, which meant re-mirroring the entire registry and a maintenance window. The lesson: treat branch selection as a deployment contract with a real support window. If your change control process cannot absorb monthly patches and a move to the next Production Branch within the 9-month support window, do not pick PB. And do not start on FB and plan to migrate later; plan the migration before you cut the wire.
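The decision tree in Figure 4 is small enough to encode as a guard in a deployment pipeline. The sketch below is one possible encoding. `choose_branch` and its yes/no arguments are illustrative, and the logic mirrors the figure and nothing more.

```shell
#!/usr/bin/env bash
# Sketch: the Figure 4 decision tree as a function.
# choose_branch is an illustrative helper; arguments are "yes" or "no".
choose_branch() {
  local needs_latest="$1" regulated="$2"
  if [ "$needs_latest" = "yes" ]; then
    echo "FB"      # 1-month support only: dev / PoC, never production
  elif [ "$regulated" = "yes" ]; then
    echo "LTSB"    # 3-year window, quarterly patches
  else
    echo "PB"      # 9-month window, monthly patches
  fi
}

choose_branch no yes    # regulated site that does not need the latest features
```

The example prints `LTSB`. A pipeline could refuse to build the air-gap mirror when the answer is `FB` and the target environment is tagged as production.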

What NVIDIA Government Ready Containers Add

For US government and defence environments, NVIDIA ships Government Ready containers under the PB and LTSB tracks. These images carry STIG (Security Technical Implementation Guide) hardening for x86, FIPS 140-3 cryptographic modules, and are available on both the NGC catalog and the DoD Iron Bank repository. They are not a separate product but a delivery mode of the same AI Enterprise software – the same NIM containers, the same GPU Operator images, built to a more restrictive baseline. If your authority to operate (ATO) requires FIPS-validated crypto or DoD-approved images, this is the only path; do not try to STIG-harden a standard NGC container yourself, as that work is not covered by the AI Enterprise support contract.

Cross-Link: The VCF Lens

If you are running this air-gapped NVIDIA stack on VMware Cloud Foundation, the registry mirroring steps above apply identically at the NVIDIA-stack level, but there is additional plumbing on the VCF side: the Supervisor cluster, the Tanzu Kubernetes release images, and the vSphere with Tanzu content library all need their own air-gap mirroring before the NVIDIA components can even reach a running Kubernetes cluster. That VCF-layer mirror process is documented in the Private AI Air-Gapped Deployment post. The two mirror pipelines must be coordinated: when you update the NVIDIA GPU Operator to a new version, you also need to confirm the Supervisor TKR (Tanzu Kubernetes Release) that runs under it is still compatible. That compatibility matrix lives in the GPU Operator release notes, and it is one of the checks that belongs in your patching runbook.

My Take: A Defensible Patch Policy

A defensible patch policy for an air-gapped NVIDIA AI cluster has four non-negotiable elements. First, branch selection is documented and justified in your security plan, including the support window and what happens when that window closes. Second, you have a working mirror pipeline tested before you air-gap, with a documented runbook for refreshing it on patch day, not written after the first CVE arrives. Third, you pull the NVIDIA PSIRT bulletin feed – at minimum check nvidia.com/en-us/product-security/ monthly, and ideally wire the GitHub CSAF feed into your vulnerability management tooling. Fourth, you use the VEX file from NGC Container Scanning to close findings that are not exploitable in your specific image build, so your security team is not buried in false positives from generic OS-layer scans.

For regulated sites: pick LTSB. Quarterly patching windows are compatible with most regulated change-control processes. Nine-month PB windows can work but require a standing maintenance window every month, which is a heavier operational burden than most teams budget for. For standard enterprise air-gapped deployments: PB is the right call. Monthly security patches, predictable feature release cycle, and you are not stuck on an old component version for three years.

When NOT to use LTSB: if your workload depends on the latest NIM microservice capabilities, TensorRT-LLM quantization improvements, or the newest Nemotron model variants, LTSB will lag. Those components will be older versions than what NVIDIA is shipping on the Feature Branch. That trade-off is intentional and documented, but teams sometimes discover it mid-project when they try to deploy a NIM version that only exists on a newer FB and find it absent from the LTSB catalog.

What to validate first: before air-gapping, do a full dry-run of the mirror pipeline in a connected staging environment. Pull every image and chart, stand up the GPU Operator from the local registry, run nvidia-smi through the operator, deploy a NIM container from the local registry and run a test inference call. If that works connected, cutting the wire changes nothing. If it fails connected, you have a misconfigured registry or a missing image, and you want to find that out before the network is gone.
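Part of that dry run can be automated as a preflight check that every expected image is actually present in the private registry. The sketch below assumes `skopeo inspect` can reach the registry from wherever it runs. `check_mirror` and the list file name are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: report images from a list that are missing in the private registry.
# check_mirror is an illustrative helper built on `skopeo inspect`.
check_mirror() {
  # Reads one image reference per line from stdin, prints each reference
  # that cannot be inspected, and returns non-zero if any are missing.
  local ref missing=0
  while IFS= read -r ref; do
    [ -z "$ref" ] && continue
    if ! skopeo inspect "docker://${ref}" >/dev/null 2>&1; then
      echo "MISSING: ${ref}"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (images-local.txt is a placeholder: one private-registry reference per line):
#   check_mirror < images-local.txt && echo "mirror complete"
```

Running this as the last step of every mirror refresh means a forgotten image shows up on the bastion, not as an ImagePullBackOff inside the air gap.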

The Bottom Line: Air-gapping the NVIDIA stack is an operational decision that must be made before deployment, not retrofitted. Mirror everything – containers, Helm charts, model weights, OS packages – before cutting the wire. Use precompiled driver containers to eliminate the OS package bootstrap problem. Match your branch to your patch cadence tolerance: LTSB for quarterly windows in regulated environments, PB for monthly patching in standard enterprise. Wire the NVIDIA PSIRT feed into your vulnerability management workflow and use the VEX file to close non-exploitable findings. The teams that get this right build the mirror pipeline as a repeatable automation, not a one-time manual process. Got a question about your specific mirror setup? Leave a comment below.
About the Author

Dr. Pranay Jha is a Cloud and AI Consultant with 18+ years of experience in hybrid cloud, virtualization, and enterprise infrastructure transformation. He specializes in VMware technologies, multi-cloud strategy, and Generative AI solutions. He holds a PhD in Computer Applications with research focused on Cloud and AI, has published multiple research papers, and has been a VMware vExpert since 2016 and a VMUG Community Leader.
