Guardrails and Responsible AI on VMware Private AI: What NeMo Guardrails Actually Stops (Private AI Series, Part 28)

Private does not mean safe. Here is how NeMo Guardrails wraps your models on VMware Private AI, the five rail types, and an honest line on what guardrails catch and what they do not.

by

Dr. Pranay Jha

June 17, 2026

No comments

4 minutes

Read Time

VMware Private AI Series · Part 28 of 30

There is a comfortable assumption baked into the phrase Private AI: that because the model runs on your own GPUs behind your own firewall, it is safe. Private and safe are different properties. A model running entirely on-premises will still cheerfully follow a jailbreak prompt, leak the contents of a document it retrieved, or answer a question about a topic you never wanted it discussing with customers. Privacy is about where the data goes. Safety is about what the model says. This post is about the second one, and the service that handles it on Private AI: NeMo Guardrails.

On-premises protects your data location. It does nothing for model behavior. Those need different controls.

Where guardrails sit

NeMo Guardrails is not part of the model. It is a microservice that sits in front of it, inspecting and potentially blocking traffic in both directions. On Private AI it deploys as a custom resource through the NIM Operator, the same way your serving does, and version 25.10 added OpenTelemetry tracing so you can actually see which rail fired and why. That observability matters more than it sounds: a guardrail that silently blocks legitimate queries is worse than no guardrail, because nobody knows why the assistant suddenly went useless.

Guardrails inspect both directions. The model never sees a blocked prompt; the user never sees a blocked answer.

The five rail types

Guardrails are organized as rails, and there are five kinds. Most teams switch on two and ignore the rest, which is fine if it is a deliberate choice and not ignorance of what the other three do.

Rail	Fires	Catches
Input	Before the model sees the prompt	Jailbreaks, prompt injection, off-topic requests
Dialog	During the conversation flow	Steering the conversation, enforcing scripted paths
Retrieval	On retrieved RAG chunks	Sensitive documents the user should not see
Execution	When the model calls a tool	Unsafe actions, unauthorized tool use
Output	Before the answer reaches the user	Toxicity, PII leakage, hallucinated claims

Rails fire in sequence. A block at any stage short-circuits to a safe canned response.

The two rails worth turning on for almost any internal deployment are input and output. Input rails with a content-safety and jailbreak-detection NIM stop the obvious attacks. Output rails catch the model leaking PII or producing toxic text. Retrieval rails matter the moment you have a RAG system over documents with mixed sensitivity, because that is exactly where a model will helpfully summarize a file the asker had no right to read. If you took my advice in the fine-tuning post and kept knowledge in retrieval rather than weights, the retrieval rail is your access-control enforcement point.

Disclaimer: guardrails reduce risk, they do not eliminate it. Test rails against an adversarial prompt set before relying on them, monitor for false positives that frustrate real users, version your rail configs alongside the model, and never treat a content-safety rail as a substitute for proper access control on the underlying data.

My take

Guardrails are necessary and oversold in the same breath. They are necessary because an unguarded internal assistant is a genuine liability, the prompt-injection and data-leak risks are real even behind your firewall. They are oversold because vendors imply they make a model safe, and they do not. They make it safer. A determined adversary with enough attempts will still find gaps, which is why guardrails belong in a layered design with access control, audit logging, and human review for high-stakes outputs, not as a single magic filter. Turn on input and output rails as a baseline for every production assistant, add retrieval rails the moment sensitive documents enter the picture, and instrument the whole thing with the OpenTelemetry tracing so you can prove what fired. Treat a guardrail config like firewall policy: reviewed, versioned, and never edited live without a test.

Which rail would have saved you the most trouble so far, input or retrieval? In my experience it is almost always retrieval that gets overlooked.

References

VMware Private AI Series · Part 28 of 30
« Previous: Part 27 | VMware Private AI Complete Guide | Next: Part 29 »

About The Author

Dr. Pranay Jha

Dr. Pranay Jha is a Cloud and AI Consultant with 18+ years of experience in hybrid cloud, virtualization, and enterprise infrastructure transformation. He specializes in VMware technologies, multi-cloud strategy, and Generative AI solutions. He holds a PhD in Computer Applications with research focused on Cloud and AI, has published multiple research papers, and has been a VMware vExpert since 2016 and a VMUG Community Leader.

See author's posts