Dr. Pranay Jha

VMware • Cloud • AI • Enterprise Architecture

Tag: NVIDIA AI Series

AI Stack, AI/ML

Deploying and Autoscaling NIM in Production on Kubernetes (NVIDIA AI Series, Part 17)

Dr. Pranay Jha

June 22, 2026

How to deploy NVIDIA NIM in production using the NIM Operator and Helm, wire autoscaling to the right GPU and KV-cache signals instead of CPU utilization, handle cold-start model load, and run blue-green rollouts without dropping throughput.
AI Stack, AI/ML

TensorRT and TensorRT-LLM: Optimization, Quantization, and Engine Building (NVIDIA AI Series, Part 18)

Dr. Pranay Jha

June 22, 2026

What TensorRT does at build time (kernel fusion and quantization choices from FP8 to NVFP4) versus what TensorRT-LLM adds at runtime (paged KV cache and in-flight batching), and when to hand-build engines instead of relying on a NIM.
AI Stack, AI/ML

NVIDIA NIM Inference Microservices: What a NIM Is and How It Serves a Model (NVIDIA AI Series, Part 16)

Dr. Pranay Jha

June 22, 2026

NVIDIA NIM packages a model, an optimized inference engine, and an OpenAI-compatible API into a single container. Pull it, pass your NGC API key, and you have a production inference endpoint on your own GPU infrastructure in minutes.
AI Stack, AI/ML

NVIDIA Network Operator on Kubernetes: RDMA, SR-IOV, and the Accelerated Fabric (NVIDIA AI Series, Part 13)

Dr. Pranay Jha

June 22, 2026

The NVIDIA Network Operator provisions MOFED drivers, the RDMA shared device plugin, SR-IOV virtual functions (VFs), and Multus secondary networks for Kubernetes pods. This is how GPUDirect RDMA actually works at scale on ConnectX-7 and NDR InfiniBand clusters.
AI Stack, AI/ML

NVIDIA Drivers, CUDA, and the Container Toolkit: Building a Clean GPU Host Baseline (NVIDIA AI Series, Part 11)

Dr. Pranay Jha

June 22, 2026

The GPU host stack has three distinct layers: the data-center driver (the open kernel module is now required for Grace Hopper and Blackwell), the CUDA Toolkit, and the NVIDIA Container Toolkit. Get the install order or versions wrong and containers fail silently. Here is the right sequence, the compatibility matrix, and the failure modes.
AI Stack, AI/ML

Air-Gapped Deployment, Lifecycle and CVE Patching for the NVIDIA Stack (NVIDIA AI Series, Part 15)

Dr. Pranay Jha

June 22, 2026

Running NVIDIA AI Enterprise in an air-gapped environment requires mirroring nvcr.io containers, Helm charts, and model weights before you cut the wire. Here are the branch selection, driver patch cadence, and CVE triage workflow that keep regulated deployments defensible.
AI Stack, AI/ML

NGC Catalog: Containers, Models, Helm Charts and How to Consume Them (NVIDIA AI Series, Part 14)

Dr. Pranay Jha

June 22, 2026

The NGC catalog is your upstream source for NVIDIA GPU-optimized containers, pretrained models, and Helm charts. Here is how the nvcr.io registry, the org/team/API-key model, and the NVIDIA AI Enterprise (NVAIE) entitlement actually work, with a full operational pull-and-deploy walkthrough.
AI Stack, AI/ML

NVIDIA GPU Operator on Kubernetes: ClusterPolicy, Components, and Day-2 Ops (NVIDIA AI Series, Part 12)

Dr. Pranay Jha

June 22, 2026

The NVIDIA GPU Operator automates every software layer a GPU node needs in Kubernetes, from kernel driver to DCGM metrics, via a single ClusterPolicy CRD. Here is what it installs, how the reconciliation loop works, when to disable the driver component, and the failure modes that will catch you on first install.
AI Stack, AI/ML

InfiniBand vs Spectrum-X Ethernet: Choosing Your AI Cluster Scale-Out Fabric (NVIDIA AI Series, Part 8)

Dr. Pranay Jha

June 22, 2026

InfiniBand Quantum-X800 and Spectrum-X Ethernet both run at 800 Gb/s — but they are not the same choice. A direct comparison of SHARPv4 in-network reduction, lossless fabric mechanisms, rail-optimized topology, multi-tenant isolation, and operational trade-offs, with a clear verdict on which fabric wins for dedicated AI training versus shared enterprise GPU platforms.
AI Stack, AI/ML

GPUDirect Storage: The DMA Path from NVMe to GPU Memory (NVIDIA AI Series, Part 9)

Dr. Pranay Jha

June 22, 2026

GPUDirect Storage (GDS) creates a direct DMA path from NVMe or networked storage straight into GPU HBM, bypassing the CPU bounce buffer entirely. Here is when it helps, what the cuFile API requires, and the filesystem and NIC prerequisites to validate before enabling in production.
AI Stack, AI/ML

GPU Power, Cooling and Density: Why Blackwell Forces Liquid (NVIDIA AI Series, Part 10)

Dr. Pranay Jha

June 22, 2026

The GB200 NVL72 draws ~120 kW per rack and ships liquid-cooled by design. Learn why Blackwell-class systems make direct-to-chip cooling mandatory, how CDUs and facility water loops work, and what to validate before ordering.
AI Stack, AI/ML

NVLink and NVSwitch: How NVIDIA Builds the Scale-Up Fabric (NVIDIA AI Series, Part 7)

Dr. Pranay Jha

June 22, 2026

Fifth-generation NVLink delivers 1.8 TB/s per GPU, and NVSwitch builds a non-blocking 130 TB/s all-to-all fabric across 72 GPUs in the GB200 NVL72. Here is how the domain forms, why it determines your tensor and expert parallelism strategy, and where the boundary falls.


About the Author

Dr. Pranay Jha

Dr. Pranay Jha is a Cloud and AI Consultant with 18+ years of experience in hybrid cloud, virtualization, and enterprise infrastructure transformation. He specializes in VMware technologies, multi-cloud strategy, and Generative AI solutions. He holds a PhD in Computer Applications with research focused on Cloud and AI, has published multiple research papers, and has been a VMware vExpert since 2016 and a VMUG Community Leader.

You May Have Missed

AI Stack, AI/ML, VMware & Cloud

Running NVIDIA AI On-Prem and on VCF: Cost, Trade-offs and the Verdict (NVIDIA AI Series, Part 30)

June 23, 2026
AI Stack, AI/ML

GPU Observability and Multi-Tenancy: DCGM, Honest Utilization, and Sharing (NVIDIA AI Series, Part 29)

June 23, 2026
AI Stack, AI/ML

NVIDIA Blueprints and Agentic AI: AI-Q and the NeMo Agent Toolkit (NVIDIA AI Series, Part 28)

June 23, 2026
AI Stack, AI/ML

The NVIDIA NeMo Framework: Training and Fine-Tuning at Scale (NVIDIA AI Series, Part 22)

June 23, 2026
AI Stack, AI/ML

NVIDIA NeMo Retriever: RAG with Embeddings, Reranking and Guardrails (NVIDIA AI Series, Part 27)

June 23, 2026