Call something an "agent" in 2026 and half the room pictures software that books its own travel and rewrites its own goals. That is not what you get when you open Agent Builder in VMware Private AI Services. What you get is more useful and a lot less dramatic: one place to wire a model endpoint to a knowledge base, pin a prompt, and ship a grounded chat backend your users can actually trust.
What Agent Builder actually is
Agent Builder is the composition layer of Private AI Services. Think of it as a router with a save button. An incoming prompt goes first to a knowledge base for retrieval, then to a completions model endpoint that generates the answer from what was retrieved. You supply three ingredients and the service wires them together.
Those three ingredients are a completions model endpoint (served by the Model Runtime and reached through the ML API gateway), an indexed knowledge base (built by the Data Indexing and Retrieval Service and stored in pgvector on Data Services Manager), and your prompt instructions plus a handful of advanced retrieval settings. A playground in the UI gives you a fast loop to swap models, adjust the prompt, and watch the answers change. When the configuration behaves, you save it, and that saved agent becomes the backend your application calls. If you have followed the RAG pipeline and Model Store and Model Runtime parts of this series, Agent Builder is where those pieces finally meet.
Where it sits in the Private AI Services stack
Agent Builder is the last mile, not the whole platform. Underneath it sits the rest of Private AI Services, and every layer has to be in place before the agent on top is worth anything. Skip the indexing work and you get a confident model that knows nothing about your business.
Model Gallery and Model Governance handle importing, validating and sharing approved models. Model Runtime and the ML API gateway run those models as a scalable service so separate teams broker requests against shared GPU capacity. The Data Indexing and Retrieval Service pulls from sources like Confluence, SharePoint, Google Drive and S3, chunks and embeds the documents, and refreshes them on a schedule into a pgvector knowledge base on Data Services Manager. Agent Builder then composes a model and a knowledge base into an agent, and your front end (Open WebUI or your own application) consumes it.
In VCF 9.1 this whole flow moved into the VCF Private AI Services UI inside a namespace. Organization administrators enable Private AI Services through VCF Automation, and end users do the document upload, the indexing jobs, and the agent creation from one console. That matters operationally: the people building agents are no longer waiting on a platform team for every step.
How a query flows through an agent
Here is what happens on a single question:
- The user sends a prompt to the saved agent endpoint.
- Agent Builder routes it to the attached knowledge base, where an embedding model and a similarity search in pgvector return the most relevant chunks.
- Those chunks are injected as context alongside your prompt instructions.
- The completions model generates an answer grounded in that retrieved context.
- The response goes back to your application.
The grounding is the entire point. The model answers from your indexed corpus, not from whatever it absorbed during training. That is what makes the output defensible in an enterprise setting, and it is why the knowledge base, not the model, is usually where these projects quietly succeed or fail.
Myth versus reality: this is not autonomous agentic AI
Time for the blunt part. Agent Builder today is a RAG composition and serving tool. It is not an autonomous, multi-step agent framework, and the word "agent" is doing a lot of marketing work across the industry right now.
What it does: route a prompt through retrieval and a completions model, with instructions and access controls attached. What it does not do out of the box: run long tool-calling loops, plan multi-step tasks, or take actions across your systems on its own the way frameworks such as LangGraph or CrewAI advertise. Richer tools and a deeper playground are on the roadmap, and that direction is real, but if you walk in expecting self-directed agents that go off and do things, you will be let down.
| Dimension | Agent Builder | DIY agent framework |
|---|---|---|
| Primary job | Compose RAG chat backends | Orchestrate multi-step agents |
| Multi-step tool use | Limited today, roadmap item | Core feature |
| Governance and access | Built in, per knowledge base | You build it yourself |
| Data grounding | Native, via pgvector KBs | Wire it up yourself |
| Best fit | Governed chat over private data | Complex autonomous workflows |
My take after enough of these conversations: that gap is fine, and arguably a feature. The value most enterprises can actually capture this year is governed, grounded chat over private data, with each business unit pinned to its own knowledge base and its own access scope. That is a solved, shippable problem. Autonomous agents acting across production systems are mostly still a demo, and a governance headache you do not want to inherit yet.
Where it fits, and where it doesn’t
Recommended when you want governed chat backends grounded in private knowledge bases, with the model and the data staying inside your VCF estate, and when you want different teams composing their own agents over their own slices of the corpus without data bleeding across domains. That last point, fine-grained knowledge-base access per agent, is the quietly valuable part and the reason to prefer Agent Builder over a bolted-on open-source stack.
Not the right tool when you need complex multi-agent orchestration, long tool-calling chains, or framework-level customization today. In that case, serve the model through NIM or the Model Runtime and build the agent layer yourself, then point it at the same knowledge bases. You keep the platform benefits without forcing a framework problem into a tool that was not built for it.
Validate three things before you promise anything to a business unit. First, that your indexing and refresh policy actually keep the knowledge base current, because a stale corpus is the silent failure mode here and nobody notices until an agent confidently cites last quarter’s policy. Second, that the completions endpoint is sized for real concurrency, not a single demo user. Third, that access controls map cleanly to your tenancy model so one team’s agent cannot read another team’s documents.
My Take
Agent Builder is the least flashy and most practical part of Private AI Services. It will not hand you an autonomous workforce. It will let a platform team stand up grounded, access-controlled AI chat over private data in an afternoon, which is what most organizations actually need first. Treat it as the composition and serving layer it is, keep the agentic ambitions on a separate roadmap, and you will ship something useful instead of something impressive in a slide. Are you building agents on Private AI Services yet, or still stuck arguing about what "agent" even means in your shop?
References
- Building your GenAI Agents on VCF with Private AI Services (VCF Blog)
- Private AI Services: New in VMware Private AI Foundation with NVIDIA in VCF 9.0
- VMware Private AI Foundation with NVIDIA Guide (Broadcom TechDocs)
« Previous: Part 14 | VMware Private AI Complete Guide | Next: Part 16 »








