There is a comfortable assumption baked into the phrase Private AI: that because the model runs on your own GPUs behind your own firewall, it is safe. Private and safe are different properties. A model running entirely on-premises will still cheerfully follow a jailbreak prompt, leak the contents of a document it retrieved, or answer a question about a topic you never wanted it discussing with customers. Privacy is about where the data goes. Safety is about what the model says. This post is about the second one, and the service that handles it on Private AI: NeMo Guardrails.
Where guardrails sit
NeMo Guardrails is not part of the model. It is a microservice that sits in front of it, inspecting and potentially blocking traffic in both directions. On Private AI it deploys as a custom resource through the NIM Operator, the same way your serving does, and version 25.10 added OpenTelemetry tracing so you can actually see which rail fired and why. That observability matters more than it sounds: a guardrail that silently blocks legitimate queries is worse than no guardrail, because nobody knows why the assistant suddenly went useless.
The five rail types
Guardrails are organized as rails, and there are five kinds. Most teams switch on two and ignore the rest, which is fine if it is a deliberate choice and not ignorance of what the other three do.
| Rail | Fires | Catches |
|---|---|---|
| Input | Before the model sees the prompt | Jailbreaks, prompt injection, off-topic requests |
| Dialog | During the conversation flow | Steering the conversation, enforcing scripted paths |
| Retrieval | On retrieved RAG chunks | Sensitive documents the user should not see |
| Execution | When the model calls a tool | Unsafe actions, unauthorized tool use |
| Output | Before the answer reaches the user | Toxicity, PII leakage, hallucinated claims |
The two rails worth turning on for almost any internal deployment are input and output. Input rails with a content-safety and jailbreak-detection NIM stop the obvious attacks. Output rails catch the model leaking PII or producing toxic text. Retrieval rails matter the moment you have a RAG system over documents with mixed sensitivity, because that is exactly where a model will helpfully summarize a file the asker had no right to read. If you took my advice in the fine-tuning post and kept knowledge in retrieval rather than weights, the retrieval rail is your access-control enforcement point.
My take
Guardrails are necessary and oversold in the same breath. They are necessary because an unguarded internal assistant is a genuine liability, the prompt-injection and data-leak risks are real even behind your firewall. They are oversold because vendors imply they make a model safe, and they do not. They make it safer. A determined adversary with enough attempts will still find gaps, which is why guardrails belong in a layered design with access control, audit logging, and human review for high-stakes outputs, not as a single magic filter. Turn on input and output rails as a baseline for every production assistant, add retrieval rails the moment sensitive documents enter the picture, and instrument the whole thing with the OpenTelemetry tracing so you can prove what fired. Treat a guardrail config like firewall policy: reviewed, versioned, and never edited live without a test.
Which rail would have saved you the most trouble so far, input or retrieval? In my experience it is almost always retrieval that gets overlooked.
References
- NeMo Guardrails on the NIM Operator
- NIM Operator release notes (Guardrails v25.10, OTel tracing)
- NVIDIA NeMo Guardrails documentation
« Previous: Part 27 | VMware Private AI Complete Guide | Next: Part 29 »








