Dr. Pranay Jha

VMware • Cloud • AI • Enterprise Architecture

FORMERLY
VMware Insight & Cloud Pathshala
What began over a decade ago as a passion for sharing knowledge has evolved into a unified platform for Enterprise AI, VMware, Cloud Architecture, Research, and Modern Infrastructure.
, ,

Guardrails and Responsible AI on VMware Private AI: What NeMo Guardrails Actually Stops (Private AI Series, Part 28)

Private does not mean safe. Here is how NeMo Guardrails wraps your models on VMware Private AI, the five rail types, and an honest line on what guardrails catch and what they do not.

VMware Private AI Series · Part 28 of 30

There is a comfortable assumption baked into the phrase Private AI: that because the model runs on your own GPUs behind your own firewall, it is safe. Private and safe are different properties. A model running entirely on-premises will still cheerfully follow a jailbreak prompt, leak the contents of a document it retrieved, or answer a question about a topic you never wanted it discussing with customers. Privacy is about where the data goes. Safety is about what the model says. This post is about the second one, and the service that handles it on Private AI: NeMo Guardrails.

Private is not the same as safe The myth It runs on our GPUs, so it is secure. No data leaves, nothing to worry about. Guardrails are for public chatbots. The reality The model still obeys jailbreaks. It can leak retrieved documents. Internal users do prompt-inject too. Safety is a separate control layer.
On-premises protects your data location. It does nothing for model behavior. Those need different controls.

Where guardrails sit

NeMo Guardrails is not part of the model. It is a microservice that sits in front of it, inspecting and potentially blocking traffic in both directions. On Private AI it deploys as a custom resource through the NIM Operator, the same way your serving does, and version 25.10 added OpenTelemetry tracing so you can actually see which rail fired and why. That observability matters more than it sounds: a guardrail that silently blocks legitimate queries is worse than no guardrail, because nobody knows why the assistant suddenly went useless.

The guardrail is a checkpoint, both ways User NeMo Guardrails input rails on the way in, output rails on the way back LLM NIM prompt cleared prompt raw answer checked answer Nothing reaches the model, or the user, without passing the relevant rail.
Guardrails inspect both directions. The model never sees a blocked prompt; the user never sees a blocked answer.

The five rail types

Guardrails are organized as rails, and there are five kinds. Most teams switch on two and ignore the rest, which is fine if it is a deliberate choice and not ignorance of what the other three do.

RailFiresCatches
InputBefore the model sees the promptJailbreaks, prompt injection, off-topic requests
DialogDuring the conversation flowSteering the conversation, enforcing scripted paths
RetrievalOn retrieved RAG chunksSensitive documents the user should not see
ExecutionWhen the model calls a toolUnsafe actions, unauthorized tool use
OutputBefore the answer reaches the userToxicity, PII leakage, hallucinated claims
One request, rails in order Input rail Retrieval rail Execution rail Output rail User Any rail can stop the request and return a safe fallback instead of continuing.
Rails fire in sequence. A block at any stage short-circuits to a safe canned response.

The two rails worth turning on for almost any internal deployment are input and output. Input rails with a content-safety and jailbreak-detection NIM stop the obvious attacks. Output rails catch the model leaking PII or producing toxic text. Retrieval rails matter the moment you have a RAG system over documents with mixed sensitivity, because that is exactly where a model will helpfully summarize a file the asker had no right to read. If you took my advice in the fine-tuning post and kept knowledge in retrieval rather than weights, the retrieval rail is your access-control enforcement point.

Disclaimer: guardrails reduce risk, they do not eliminate it. Test rails against an adversarial prompt set before relying on them, monitor for false positives that frustrate real users, version your rail configs alongside the model, and never treat a content-safety rail as a substitute for proper access control on the underlying data.

My take

Guardrails are necessary and oversold in the same breath. They are necessary because an unguarded internal assistant is a genuine liability, the prompt-injection and data-leak risks are real even behind your firewall. They are oversold because vendors imply they make a model safe, and they do not. They make it safer. A determined adversary with enough attempts will still find gaps, which is why guardrails belong in a layered design with access control, audit logging, and human review for high-stakes outputs, not as a single magic filter. Turn on input and output rails as a baseline for every production assistant, add retrieval rails the moment sensitive documents enter the picture, and instrument the whole thing with the OpenTelemetry tracing so you can prove what fired. Treat a guardrail config like firewall policy: reviewed, versioned, and never edited live without a test.

Which rail would have saved you the most trouble so far, input or retrieval? In my experience it is almost always retrieval that gets overlooked.

References

VMware Private AI Series · Part 28 of 30
« Previous: Part 27  |  VMware Private AI Complete Guide  |  Next: Part 29 »

About The Author


Discover more from Dr. Pranay Jha

Subscribe to get the latest posts sent to your email.

Leave a Reply

Your email address will not be published. Required fields are marked *

Architect’s Toolkit

About the Author

Dr. Pranay Jha is a Cloud and AI Consultant with 18+ years of experience in hybrid cloud, virtualization, and enterprise infrastructure transformation. He specializes in VMware technologies, multi-cloud strategy, and Generative AI solutions. He holds a PhD in Computer Applications with research focused on Cloud and AI, has published multiple research papers, and has been a VMware vExpert since 2016 and a VMUG Community Leader.

Discover more from Dr. Pranay Jha

Subscribe now to keep reading and get access to the full archive.

Continue reading