- NVIDIA AI is not one product. It is a stack roughly eight layers deep, from the GPU silicon up to agent workflows, and the pieces only pay off when they fit together.
- The layers, bottom up: silicon, system software (driver/CUDA), cluster orchestration (GPU Operator, Run:ai), optimize-and-serve (TensorRT-LLM, Triton, Dynamo), packaging (NIM), build (NeMo), models (Nemotron), and workflows (Blueprints, AI-Q).
- NVIDIA AI Enterprise is the commercial wrapper: it is what makes the open-source pieces supportable, patched and secure, not a separate product layer.
- Everything ends in a standard API. A NIM exposes an OpenAI-compatible endpoint, which is why the stack drops into existing apps without rewrites.
- My take: learn the stack as layers, adopt the one you need, and resist buying all of it on day one. This series walks each layer; Part 1 is the map.
NVIDIA AI is not a product. It is a stack, roughly eight layers deep, from the silicon up to the agent framework, and most teams meet it one piece at a time: a GPU here, a NIM container there, a NeMo fine-tune later. That piecemeal view is how you end up with tools that do not fit together and a support contract that covers half of them. The point of understanding the stack is knowing which layer a problem belongs to before you reach for a tool. This series walks the whole thing, layer by layer, and Part 1 is the map.
The stack in one view
Read the stack bottom to top and each layer exists to make the one above it usable. Silicon is useless without drivers; drivers are useless at scale without orchestration; a trained model is useless in production until something serves it behind an API. The mistake is treating any single layer as the whole platform. A GPU is not an AI platform, and neither is a model.
Layer by layer, and what each one is for
Here is the same stack as a reference, with the job each layer does. The rest of this series takes one or two of these per part and goes deep.
| Layer | Key components | What it does |
|---|---|---|
| Silicon | Hopper, Blackwell, Rubin; NVLink/NVSwitch | Compute and the scale-up interconnect |
| System software | Driver, CUDA, cuDNN, Container Toolkit | Makes the GPU usable to software and containers |
| Orchestration | GPU Operator, Network Operator, Run:ai, Kubernetes | Provisions and schedules GPUs at scale |
| Optimize & serve | TensorRT, TensorRT-LLM, Triton, Dynamo | Compiles and serves models efficiently |
| Package | NIM inference microservices | A model behind a standard API, in a container |
| Build & customize | NeMo: Curator, Customizer, Guardrails, Retriever | Curate data, fine-tune, guard, and retrieve |
| Models | Nemotron family, open and partner models | The starting weights you build on |
| Workflows & agents | Blueprints, AI-Q, Omniverse | Reference apps and agentic workflows |
| Commercial wrapper | NVIDIA AI Enterprise | Support, security and lifecycle across all layers |
The bottom three layers are plumbing
Silicon, system software and orchestration are where infrastructure teams live. The GPU choice (Hopper for availability, Blackwell for density, Rubin on the horizon) sets your ceiling; the driver and CUDA stack set compatibility; the GPU Operator and Run:ai decide whether GPUs are shared fairly or hoarded by the first team to grab them. Most production pain that looks like an AI problem is actually a plumbing problem in these three layers: a driver mismatch, an unscheduled GPU, a saturated fabric.
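A first-pass triage for those three layers can be scripted. The sketch below is a hypothetical helper (`check_gpu_plumbing` is my name, not an NVIDIA tool). It assumes `nvidia-smi`, which ships with the driver, and optionally `kubectl` pointed at a cluster where the GPU Operator's device plugin advertises the `nvidia.com/gpu` resource. It does not check the fabric.

```bash
#!/usr/bin/env bash
# Hypothetical triage helper for the bottom three layers. Missing tools are
# reported rather than treated as fatal, so it runs on any host.
check_gpu_plumbing() {
  # Silicon + system software: nvidia-smi only answers if the driver is loaded.
  if command -v nvidia-smi >/dev/null 2>&1; then
    local info
    info=$(nvidia-smi --query-gpu=name,driver_version --format=csv,noheader 2>/dev/null | head -n 1)
    echo "driver: ${info:-ERROR (nvidia-smi found but returned nothing)}"
  else
    echo "driver: MISSING (nvidia-smi not on PATH)"
  fi

  # Orchestration: with the GPU Operator's device plugin running, nodes
  # advertise an allocatable nvidia.com/gpu resource to the scheduler.
  if command -v kubectl >/dev/null 2>&1; then
    local gpus
    gpus=$(kubectl get nodes --request-timeout=5s \
      -o jsonpath='{.items[*].status.allocatable.nvidia\.com/gpu}' 2>/dev/null)
    echo "k8s: allocatable GPUs per node: ${gpus:-none reported}"
  else
    echo "k8s: SKIPPED (kubectl not on PATH)"
  fi
}

check_gpu_plumbing
```

If the driver line is missing or the node list reports no GPUs, fix that before debugging anything higher in the stack.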
The middle layers are where models become services
Optimize-and-serve and packaging are the layers people underestimate. A model checkpoint is not a service. TensorRT-LLM compiles it for the target GPU, Triton or Dynamo serves it, and a NIM wraps the whole thing in a container with a standard API and sensible defaults. The difference between a research demo and a service you can put on call at 2am is almost entirely in these layers.
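On a single host, getting a NIM running is typically one `docker run`. This is a sketch that follows the general pattern of NVIDIA's NIM for LLMs quickstart as I understand it. The image name and tag, the cache path, the `DRY_RUN` switch and the `launch_nim` function are assumptions or hypothetical, so confirm the exact image in the NGC catalog.

```bash
#!/usr/bin/env bash
# Hypothetical helper: launch a NIM on one host. IMAGE is an assumption;
# confirm the exact name and tag in the NGC catalog before use.
IMAGE="${IMAGE:-nvcr.io/nim/nvidia/llama-3.1-nemotron-70b-instruct:latest}"
LOCAL_NIM_CACHE="${LOCAL_NIM_CACHE:-$HOME/.cache/nim}"

launch_nim() {
  # --shm-size: shared memory for multi-GPU serving
  # -e NGC_API_KEY: forwarded from the host so the container can pull weights
  # -v ...:/opt/nim/.cache: keep downloaded weights across restarts
  # -p 8000:8000: the OpenAI-compatible API port
  local -a cmd=(docker run --rm --gpus all --shm-size=16GB -e NGC_API_KEY
                -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" -p 8000:8000 "$IMAGE")
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '%q ' "${cmd[@]}"   # print the command instead of running it
    echo
  else
    mkdir -p "$LOCAL_NIM_CACHE"
    "${cmd[@]}"
  fi
}
```

Run `DRY_RUN=1 launch_nim` to see the command, then `launch_nim` to start it. The point is how little is left to the operator. Engine selection and serving are the NIM's job, not yours.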
How the pieces actually connect
The layers are not just a stack; they are a pipeline. You start from a model, optionally customize it with NeMo, compile and serve it, package it as a NIM, and consume it from an app or agent. The payoff of the whole chain is the last step: a standard endpoint your application already knows how to call.
That last step matters more than it looks. A NIM exposes an OpenAI-compatible API, so the application calling it does not need to know anything about GPUs, TensorRT or the model format underneath.
```bash
# A NIM is the whole stack behind one standard endpoint
curl -s http://nim.local:8000/v1/chat/completions \
  -H "Authorization: Bearer $NGC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"nvidia/llama-3.1-nemotron-70b-instruct","messages":[{"role":"user","content":"hello"}]}'
```
Expected result: a standard chat-completions JSON response, identical in shape to what your app would get from a public LLM API, served entirely on your own GPUs. That compatibility is the quiet reason the NVIDIA stack is easy to adopt: the top of the stack looks like the API your developers already use.
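That request only succeeds once the NIM has finished starting, and on first launch startup includes pulling weights. Below is a minimal readiness sketch. It assumes the NIM exposes a `/v1/health/ready` route alongside the OpenAI-compatible `/v1/models` route. `wait_for_nim` and its default timeout are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll a self-hosted NIM until it reports ready, then
# print the model IDs it serves (the valid values for the "model" field).
wait_for_nim() {
  local base_url="${1:-http://nim.local:8000}"
  local timeout_s="${2:-600}"   # first start can take minutes while weights download
  local deadline=$(( SECONDS + timeout_s ))

  until curl -sf --max-time 5 "$base_url/v1/health/ready" >/dev/null; do
    if (( SECONDS >= deadline )); then
      echo "NIM at $base_url not ready after ${timeout_s}s" >&2
      return 1
    fi
    sleep 2
  done

  # OpenAI-compatible model listing; use an ID from here in chat requests.
  curl -sf "$base_url/v1/models"
}
```

Checking `/v1/models` before the first chat request catches the most common integration error, which is a `model` value in the request that does not match what the endpoint serves.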
NVIDIA AI Enterprise: the wrapper, not a layer
Most of the stack is available as open source. NVIDIA AI Enterprise is the commercial subscription that wraps it with enterprise support, security patching, validated versions and a defined lifecycle, plus orchestration (Run:ai) and the packaged NIM and NeMo microservices. Think of it as the difference between pulling a community container and running a vendor-supported, CVE-patched build with someone to call. Part 2 of this series digs into exactly what the subscription includes and when it is worth it; for now, place it correctly in your mental model: it spans every layer rather than sitting on top as one more box.
Where to enter the stack
You do not start at the bottom and climb. You start from what you are trying to do and pull in only the layers that goal touches. Four common entry points cover most teams.
How this series is organized
The 30 parts follow the stack: foundations and the GPU lineup, then GPU infrastructure (partitioning, NVLink, InfiniBand vs Spectrum-X, storage, power), the software platform (drivers, operators, NGC), inference (NIM, TensorRT-LLM, Triton, Dynamo, economics), customization and training (NeMo), models and agents (Nemotron, RAG, Blueprints), and finally operations and the verdict. Where a topic meets VMware, this series links to the VMware Private AI series rather than repeating it: this series covers the NVIDIA stack itself; that one covers how to run it on VMware Cloud Foundation (VCF).
Two journeys: run a model, or build one
Almost every team on this stack is on one of two journeys, and confusing them is the most common planning error I see. The run journey is short: take an existing model, serve it through a NIM, and point an app or agent at the endpoint. It touches the bottom layers plus packaging, it is mostly an infrastructure and operations exercise, and most enterprises should start here because it delivers a working capability in weeks without a data-science team.
The build journey is longer and pulls in the NeMo layer: curate data, fine-tune or align a model, evaluate it, then serve it. It is a data and ML engineering exercise as much as an infrastructure one, and it pays off only when an off-the-shelf model genuinely cannot do the job, which is rarer than most teams assume. The honest test before committing to the build path: have you proven that a well-prompted, retrieval-grounded stock model fails the task? If you have not, you are on the run journey, and that is good news for your timeline and budget. Knowing which journey you are on tells you which half of this series to read first.
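The run-or-build test is simple enough to write down. This sketch encodes this article's rule of thumb: run touches the bottom three layers plus packaging, and build adds the NeMo layer. It is not an NVIDIA taxonomy, and `pick_journey` and `layers_for` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Hypothetical helpers encoding the run-vs-build test from the text.

# Usage: pick_journey yes|no
#   $1: have you PROVEN that a well-prompted, retrieval-grounded stock model fails the task?
pick_journey() {
  if [[ "$1" == "yes" ]]; then echo "build"; else echo "run"; fi
}

# Usage: layers_for run|build   (prints one layer per line, bottom up)
layers_for() {
  # Run journey: the bottom three layers plus packaging (a NIM).
  local -a layers=("Silicon" "System software" "Orchestration" "Package")
  if [[ "$1" == "build" ]]; then
    # Build journey: adds the NeMo layer to curate, fine-tune and evaluate.
    layers+=("Build & customize")
  fi
  printf '%s\n' "${layers[@]}"
}
```

For example, `layers_for "$(pick_journey no)"` prints the four layers a run-journey team needs to stand up. It leaves NeMo off the list until the stock-model test has actually failed.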
The bottom line
Treat NVIDIA AI as a stack, learn it as layers, and adopt only the layers your goal touches. The reason is cost and focus: the stack is deep and expensive, and the teams that struggle are the ones that bought the whole thing before they knew which layer their problem lived in. My recommendation for anyone starting: pin down the goal, map it to layers using the table above, stand up the bottom three (silicon, system software, orchestration) properly because everything rests on them, and enter the upper stack at exactly one point. When would I go broader on day one? Only for a dedicated AI platform team chartered to serve many use cases, where breadth is the job. For everyone else, narrow and deep beats wide and shallow. Next up, Part 2: what NVIDIA AI Enterprise actually includes and whether the subscription is worth it. Which layer does your current project actually live in?
References
- NVIDIA AI Enterprise (NVIDIA)
- NVIDIA AI Enterprise documentation (NVIDIA Docs)
- NVIDIA NIM for developers (NVIDIA Developer)
- NVIDIA NeMo (NVIDIA)