Deep Learning VMs in VMware Private AI Foundation: The Data Scientist Workbench (Private AI Series, Part 10)

What a Deep Learning VM in VMware Private AI Foundation actually is, how the image is built, the first-boot steps that quietly break deployments, and when to move off it to a VKS cluster.

by Dr. Pranay Jha

June 15, 2026


VMware Private AI Series · Part 10 of 24

A data scientist files a ticket: “I need a GPU box with PyTorch and a notebook, today.” You could hand them a raw Ubuntu VM, attach a vGPU profile, and let them spend a day fighting CUDA versions and guest driver mismatches. Or you give them a Deep Learning VM. In VMware Private AI Foundation with NVIDIA, the Deep Learning VM (DLVM) is the unit of work that turns “I want to experiment” into a running, GPU-accelerated workstation in minutes. It is also the piece of the platform people most often misunderstand, because it looks like a production serving tier and it is not.

This post explains what a DLVM actually is, how the image is put together, what happens the first time it boots (the step that quietly breaks the most deployments), and where it stops being the right tool. If you have already installed the GPU Operator and vGPU drivers from Part 9, the DLVM is the first workload you can put on top of that foundation.

What a Deep Learning VM actually is

A DLVM is a Canonical Ubuntu virtual machine that NVIDIA and VMware have pre-validated for GPU work, delivered as an image you provision from a catalog. The point is not that it runs Ubuntu. The point is that the entire stack below your model (the OS, the container runtime, the conda manifests, and the GPU driver) has already been tested as a set, so a data scientist starts building instead of debugging compatibility.

It helps to split the image into two layers. Some software is baked into the image and ships with it. The rest is pulled and installed the first time the VM powers on, driven by a cloud-init script. That second layer is where most of the operational risk lives, so it is worth seeing the split clearly.

The image is half pre-baked, half assembled at first boot. The dashed layer is where deployments fail.

You pick the workload at deploy time, and cloud-init pulls the matching container from the NVIDIA NGC catalog. Choose PyTorch or TensorFlow and you get a ready JupyterLab instance at http://dl_vm_ip:8888, where dl_vm_ip stands for the address of your VM. Choose NVIDIA RAG and you get a sample chatbot at http://dl_vm_ip:3001/converse that you can point at your own knowledge base. Triton and the DCGM Exporter are available too, the latter giving you Prometheus-ready vGPU metrics without any extra setup.
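To make those endpoints concrete, here is a minimal Python sketch that maps a workload choice to the URL a user would open. The ports and the /converse path come from the paragraph above. The dictionary keys (pytorch, tensorflow, rag) and the function name are hypothetical labels I made up for illustration, not official workload identifiers.

```python
# Ports and paths as quoted in this post. The keys are illustrative
# labels, not official workload identifiers.
WORKLOAD_ENDPOINTS = {
    "pytorch": (8888, ""),        # JupyterLab
    "tensorflow": (8888, ""),     # JupyterLab
    "rag": (3001, "/converse"),   # sample chatbot
}


def workload_url(workload: str, dl_vm_ip: str) -> str:
    """Return the URL to open for a deployed workload on a given DLVM IP."""
    try:
        port, path = WORKLOAD_ENDPOINTS[workload.lower()]
    except KeyError:
        raise ValueError(f"no known endpoint for workload {workload!r}") from None
    return f"http://{dl_vm_ip}:{port}{path}"
```

A helper like this is mostly useful in a catalog post-deploy message or a runbook, so users get the right link for the workload they picked instead of guessing the port.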

Where it sits in the stack

The DLVM is a guest on top of the GPU-accelerated workload domain you built earlier in this series. The host vGPU driver lives in ESXi. The DLVM carries a matching guest driver, and a vGPU profile slices the physical GPU down to what the VM needs. Understanding this placement matters because the DLVM is one of two ways to consume GPUs on the platform. The other is a VKS Kubernetes cluster with GPU worker nodes, which is the path you take toward production. Same hardware, same drivers underneath, very different operating model on top.

Same foundation, two consumption models. Pick the DLVM for people, the VKS cluster for services.

What happens on first boot

Here is the part that trips teams up. The DLVM does not arrive ready to run. The vGPU guest driver and the deep learning workload are installed the first time you start the VM. That first boot reaches out to three places: NVIDIA’s licensing service for a vGPU license, a driver source for the guest driver that matches the host, and the NGC catalog for the workload container. If any one of those is unreachable or misconfigured, the VM still boots cleanly but reports no GPU, which sends people chasing the wrong problem.

First boot is a five-step chain. A break anywhere shows up as "no GPU" inside the guest.
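Because a break in any of those three dependencies looks the same from inside the guest, it can help to test reachability before the first boot rather than after. The following is a rough Python sketch, not a VMware or NVIDIA tool: it only checks that a TCP connection can be opened, which says nothing about tokens or credentials. Every hostname and port in example_dependencies is a placeholder I invented; substitute the endpoints your site actually uses.

```python
import socket
from typing import Callable, Iterable, NamedTuple


class Dependency(NamedTuple):
    name: str
    host: str
    port: int


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port opens without an OS-level error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def unreachable(deps: Iterable[Dependency],
                probe: Callable[[str, int], bool] = tcp_reachable) -> list[str]:
    """Return the names of the dependencies the probe could not reach."""
    return [d.name for d in deps if not probe(d.host, d.port)]


def example_dependencies() -> list[Dependency]:
    """Placeholder endpoints only; these hostnames are hypothetical."""
    return [
        Dependency("vGPU licensing service", "licensing.example.internal", 443),
        Dependency("guest driver source", "drivers.example.internal", 443),
        Dependency("workload container registry", "registry.example.internal", 443),
    ]
```

Run unreachable(your_dependencies) from a machine on the same network segment the DLVM will use. An empty list means the network path is open, which narrows a later "no GPU" report down to licensing or configuration rather than connectivity.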

In a disconnected or air-gapped site, none of that works out of the box. Your administrators have to stand up a local URL for the vGPU drivers, deploy and configure an NVIDIA DLS instance with a client configuration token, and mirror every required image and model into a local Harbor registry. Skip one and the first boot stalls. This is the single most common support call I see on new DLVM rollouts, and it is almost never a GPU fault. It is a missing token or an unreachable registry.
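The three air-gapped prerequisites above lend themselves to a simple completeness check before anyone requests a VM. This is a minimal sketch under my own assumptions: the setting names (driver_url, dls_token, harbor_registry) are hypothetical keys for illustration and do not correspond to actual OVF properties or catalog inputs.

```python
# Hypothetical setting names mapped to the prerequisite each one represents.
REQUIRED_AIRGAP_SETTINGS = {
    "driver_url": "local URL serving the vGPU guest drivers",
    "dls_token": "client configuration token from your NVIDIA DLS instance",
    "harbor_registry": "local Harbor registry mirroring the required images and models",
}


def missing_airgap_settings(settings: dict) -> list[str]:
    """Describe every prerequisite that is absent or blank in settings."""
    return [
        f"{key}: {description}"
        for key, description in REQUIRED_AIRGAP_SETTINGS.items()
        if not (settings.get(key) or "").strip()
    ]
```

A check like this only proves that a value was supplied, not that the token is valid or that the registry holds the right images, so treat it as a first gate rather than a full validation.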

Three ways to deploy one

There is no single deploy button, and the right method depends on who is asking. A data scientist or DevOps engineer self-serves the AI Workstation catalog item in VCF Automation, which is the intended flow and the one worth standardizing on. A VI administrator can deploy a DLVM directly on a vSphere cluster from the vSphere Client, which is handy for a quick template test but does not give your users self-service. A DevOps engineer can deploy through the VM Service in the Supervisor with kubectl, which fits anyone already managing infrastructure as code. My advice: invest in the VCF Automation catalog path and treat the other two as escape hatches. The catalog item is where you can enforce sizing, networking, and a known-good cloud-init instead of letting people hand-roll VM properties.

DLVM or VKS cluster: when to use which

The mistake I see most is teams trying to run production inference on a Deep Learning VM because it was fast to stand up. It works in a demo and then falls over the moment you need to scale, roll a model without downtime, or share a GPU across many requests. The DLVM is a workbench. When the workload graduates from “a person is experimenting” to “a service other systems call,” it belongs on a VKS GPU cluster with NIM, which is where this series goes next.

Dimension	Deep Learning VM	VKS GPU cluster
Primary user	Data scientist, ML engineer	Platform / MLOps team
Best for	Prototyping, fine-tuning, validation, demos	Production inference and serving
Scaling model	One VM, one GPU slice, manual	Horizontal, scheduled across nodes
Interface	JupyterLab, SSH, single container	Kubernetes API, NIM endpoints
Time to value	Minutes, self-service	Longer, needs cluster lifecycle
Resilience	None built in, it is a single VM	Self-healing, rolling updates

If a human is driving it, a DLVM is right. If another system calls it, move to a cluster.
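That rule of thumb is simple enough to write down as code, which is handy if you want an intake form or ticket template to suggest a platform. The sketch below encodes only the signals named in this section; the class and field names are my own invention, and a real decision will weigh more than four booleans.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    called_by_other_systems: bool = False
    needs_horizontal_scaling: bool = False
    needs_zero_downtime_rollout: bool = False
    shares_gpu_across_many_requests: bool = False


def recommend_platform(w: Workload) -> str:
    """People get a DLVM; anything with a service signal gets a cluster."""
    service_signals = (
        w.called_by_other_systems,
        w.needs_horizontal_scaling,
        w.needs_zero_downtime_rollout,
        w.shares_gpu_across_many_requests,
    )
    return "VKS GPU cluster" if any(service_signals) else "Deep Learning VM"
```

Any single service signal is enough to tip the recommendation, which matches the point above: the workload has graduated as soon as something other than a person depends on it.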

The gotchas I flag to clients

Three things are worth knowing before you roll DLVMs out at scale. First, a live one: the key used to sign the published VMware Deep Learning VM Image releases expired on January 3, 2026. If your content library enforces certificate validity through a security policy, newly published images can no longer be synced or uploaded to it. Check your content library policy before you assume a sync failure is a network problem, and watch the image release notes for a re-signed build.

Second, the vGPU guest driver is matched to the host driver automatically, which is exactly what you want, but only if the host driver is the version you think it is. If Part 9 left you on a different host driver than the image expects, the guest install can fail or fall back in ways that are annoying to diagnose. Validate the host driver version first. Third, the DLVM has no resilience on its own. It is a single VM with a single GPU slice. Do not let an experiment quietly become a dependency that the business now relies on, because there is nothing underneath it to catch a failure.
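For the host driver check, one low-tech approach is to record the version the image release notes call for and compare it with what the host reports. The sketch below assumes you have both version strings in hand and that an exact match is the check you want; it does not query ESXi, and the version numbers in the usage note are made-up placeholders.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version such as '123.45.67' into (123, 45, 67).

    Raises ValueError if any dot-separated part is not an integer.
    """
    return tuple(int(part) for part in text.strip().split("."))


def host_driver_ok(found: str, expected: str) -> bool:
    """True when the host driver version equals the expected one exactly."""
    return parse_version(found) == parse_version(expected)


def describe_host_driver(found: str, expected: str) -> str:
    """Produce a one-line result suitable for a pre-deployment checklist."""
    if host_driver_ok(found, expected):
        return f"OK: host driver {found.strip()} matches expected version"
    return (f"MISMATCH: host driver is {found.strip()}, "
            f"expected {expected.strip()}; fix this before deploying DLVMs")
```

For example, describe_host_driver("123.45.67", "123.45.68") returns a MISMATCH line. Comparing parsed tuples rather than raw strings avoids false mismatches from stray whitespace or a trailing newline in copied output.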

For the design context behind all of this, the architecture and components breakdown from Part 2 shows where the DLVM fits among the other moving parts, and the reference architecture and sizing guidance from Part 7 helps you decide how much GPU to hand a single workstation.

What I’d Do

Standardize on the VCF Automation AI Workstation catalog item, build a known-good cloud-init for your two or three common workloads, and validate the first-boot path end to end (driver source, DLS token, NGC or Harbor reachability) before you let a single data scientist near it. Treat the DLVM as what it is: a fast, disposable workbench for people. The moment a model needs to scale or serve, it moves off the VM and onto a cluster. Where do your data scientists hit the wall first, on GPU sizing or on the air-gapped first boot? That will tell you where to spend your hardening time.



Tags: Deep Learning VM, MLOps, NGC, PAIF, Private AI Series, vGPU, VMware Private AI
