Dr. Pranay Jha

VMware • Cloud • AI • Enterprise Architecture


VMware Private AI Agent Builder: Composing Models, Knowledge Bases and Prompts (Private AI Series, Part 15)

Agent Builder in VMware Private AI Services lets you compose a model endpoint, a knowledge base and prompt instructions into a grounded agent. Here is what it actually does, where it sits, and where the agentic hype gets ahead of reality.

VMware Private AI Series · Part 15 of 24

Call something an "agent" in 2026 and half the room pictures software that books its own travel and rewrites its own goals. That is not what you get when you open Agent Builder in VMware Private AI Services. What you get is more useful and a lot less dramatic: one place to wire a model endpoint to a knowledge base, pin a prompt, and ship a grounded chat backend your users can actually trust.

What Agent Builder actually is

Agent Builder is the composition layer of Private AI Services. Think of it as a router with a save button. An incoming prompt goes first to a knowledge base for retrieval, then to a completions model endpoint that generates the answer from what was retrieved. You supply three ingredients and the service wires them together.

Those three ingredients are a completions model endpoint (served by the Model Runtime and reached through the ML API gateway), an indexed knowledge base (built by the Data Indexing and Retrieval Service and stored in pgvector on Data Services Manager), and your prompt instructions plus a handful of advanced retrieval settings. A playground in the UI gives you a fast loop to swap models, adjust the prompt, and watch the answers change. When the configuration behaves, you save it, and that saved agent becomes the backend your application calls. If you have followed the RAG pipeline and Model Store and Model Runtime parts of this series, Agent Builder is where those pieces finally meet.
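To make the composition concrete, here is a minimal sketch of what a saved agent amounts to as data. It is not the Private AI Services schema: the class names, field names, and the `top_k` and `min_similarity` settings are all hypothetical, chosen only to mirror the three ingredients described above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RetrievalSettings:
    # Hypothetical stand-ins for the "advanced retrieval settings".
    top_k: int = 4
    min_similarity: float = 0.0


@dataclass(frozen=True)
class AgentConfig:
    # Hypothetical shape of a saved agent: one model, one knowledge base, one prompt.
    name: str
    model_endpoint: str
    knowledge_base: str
    prompt_instructions: str
    retrieval: RetrievalSettings = field(default_factory=RetrievalSettings)

    def validate(self) -> list:
        """Return a list of human-readable problems; empty means ready to save."""
        problems = []
        if not self.model_endpoint.strip():
            problems.append("model_endpoint is empty")
        if not self.knowledge_base.strip():
            problems.append("knowledge_base is empty")
        if not self.prompt_instructions.strip():
            problems.append("prompt_instructions is empty")
        if self.retrieval.top_k < 1:
            problems.append("retrieval.top_k must be at least 1")
        return problems


# Example values are illustrative only.
agent = AgentConfig(
    name="hr-assistant",
    model_endpoint="example-completions-endpoint",
    knowledge_base="hr-policies",
    prompt_instructions="Answer only from the provided context.",
)
print(agent.validate())
```

The point of the sketch is that an agent here is configuration, not code: swapping the model or the prompt in the playground changes one field, and saving freezes the combination.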

[Figure: What an agent is here. Three inputs, composed and saved as one grounded backend: a model endpoint (completions, via ML API gateway), a knowledge base (indexed, pgvector on DSM) and prompt instructions (plus advanced settings) feed Agent Builder (route, then save), which serves your app (chat UI or API consumer).]
Agent Builder composes a model, a knowledge base and a prompt into one saved backend.

Where it sits in the Private AI Services stack

Agent Builder is the last mile, not the whole platform. Underneath it sits the rest of Private AI Services, and every layer has to be in place before the agent on top is worth anything. Skip the indexing work and you get a confident model that knows nothing about your business.

Model Gallery and Model Governance handle importing, validating and sharing approved models. Model Runtime and the ML API gateway run those models as a scalable service, brokering requests from separate teams against shared GPU capacity. The Data Indexing and Retrieval Service pulls from sources like Confluence, SharePoint, Google Drive and S3, chunks and embeds the documents, and refreshes them on a schedule into a pgvector knowledge base on Data Services Manager. Agent Builder then composes a model and a knowledge base into an agent, and your front end (Open WebUI or your own application) consumes it.
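For readers who have not seen the chunking step up close, here is a rough sketch of one common approach: fixed-size word windows with overlap. How the Data Indexing and Retrieval Service actually chunks documents is not covered here, so treat the function name, the word-based splitting, and the `size` and `overlap` defaults as assumptions for illustration.

```python
def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list:
    """Split text into overlapping windows of `size` words (illustrative only)."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        # Stop once the current window already reaches the end of the text.
        if start + size >= len(words):
            break
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(sample, size=4, overlap=1):
    print(chunk)
```

The overlap keeps a sentence that straddles a boundary retrievable from either neighbouring chunk, at the cost of a slightly larger index.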

In VCF 9.1 this whole flow moved into the VCF Private AI Services UI inside a namespace. Organization administrators enable Private AI Services through VCF Automation, and end users do the document upload, the indexing jobs, and the agent creation from one console. That matters operationally: the people building agents are no longer waiting on a platform team for every step.

[Figure: Where Agent Builder sits. Consumption stack, top to bottom: front end (Open WebUI or your own application); Agent Builder (compose model + knowledge base + prompt); knowledge bases (Data Indexing and Retrieval + pgvector on DSM); Model Runtime and ML API gateway (models as a service, shared GPU capacity); Model Gallery and Governance (import, validate, share approved models).]
Agent Builder is the top of the stack; everything below it has to exist first.

How a query flows through an agent

Here is what happens on a single question:

  1. The user sends a prompt to the saved agent endpoint.
  2. Agent Builder routes it to the attached knowledge base, where an embedding model and a similarity search in pgvector return the most relevant chunks.
  3. Those chunks are injected as context alongside your prompt instructions.
  4. The completions model generates an answer grounded in that retrieved context.
  5. The response goes back to your application.

The grounding is the entire point. The model answers from your indexed corpus, not from whatever it absorbed during training. That is what makes the output defensible in an enterprise setting, and it is why the knowledge base, not the model, is usually where these projects quietly succeed or fail.
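The five steps above can be sketched in a few lines. This is a toy under stated assumptions, not the service's implementation: a bag-of-words counter stands in for the embedding model, an in-memory list stands in for pgvector, and `generate` is a placeholder callable where the completions endpoint would go. All function names are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list, top_k: int = 2) -> list:
    """Step 2: rank stored chunks by similarity to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_k]


def build_prompt(instructions: str, context: list, question: str) -> str:
    """Step 3: inject retrieved chunks alongside the prompt instructions."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"{instructions}\n\nContext:\n{joined}\n\nQuestion: {question}"


def answer(question: str, chunks: list, instructions: str, generate, top_k: int = 2) -> str:
    """Steps 1 to 5: retrieve first, then hand the grounded prompt to the model."""
    context = retrieve(question, chunks, top_k)
    return generate(build_prompt(instructions, context, question))


corpus = [
    "vacation policy allows 25 days per year",
    "the cafeteria opens at 8 am",
    "expense reports are due monthly",
]
# `generate` here just echoes the prompt so the grounding is visible.
print(answer("how many vacation days per year", corpus, "Answer from context only.", lambda p: p, top_k=1))
```

Note the ordering: retrieval runs before generation, so a weak or stale corpus degrades the answer no matter how good the model is.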

[Figure: How a single question flows. (1) User prompt, (2) Agent Builder, (3) knowledge base: pgvector retrieval, (4) completions: grounded answer, (5) answer. Retrieval happens before generation, so the model answers from your corpus.]
Prompt to retrieval to generation: the path every question takes.

Myth versus reality: this is not autonomous agentic AI

Time for the blunt part. Agent Builder today is a RAG composition and serving tool. It is not an autonomous, multi-step agent framework, and the word "agent" is doing a lot of marketing work across the industry right now.

What it does: route a prompt through retrieval and a completions model, with instructions and access controls attached. What it does not do out of the box: run long tool-calling loops, plan multi-step tasks, or take actions across your systems on its own the way frameworks such as LangGraph or CrewAI advertise. Richer tools and a deeper playground are on the roadmap, and that direction is real, but if you walk in expecting self-directed agents that go off and do things, you will be let down.

[Figure: Myth versus reality. The hype: agents that set their own goals; long tool-calling loops across systems; self-directed multi-step execution; acts on production without a human. Agent Builder today: routes a prompt to a KB, then a model; grounded answers from your corpus; per-agent knowledge-base access control; a playground and a save button.]
The gap between the agentic pitch and what ships in the box.
| Dimension             | Agent Builder                    | DIY agent framework           |
|-----------------------|----------------------------------|-------------------------------|
| Primary job           | Compose RAG chat backends        | Orchestrate multi-step agents |
| Multi-step tool use   | Limited today, roadmap item      | Core feature                  |
| Governance and access | Built in, per knowledge base     | You build it yourself         |
| Data grounding        | Native, via pgvector KBs         | Wire it up yourself           |
| Best fit              | Governed chat over private data  | Complex autonomous workflows  |

My take after enough of these conversations: that gap is fine, and arguably a feature. The value most enterprises can actually capture this year is governed, grounded chat over private data, with each business unit pinned to its own knowledge base and its own access scope. That is a solved, shippable problem. Autonomous agents acting across production systems are mostly still a demo, and a governance headache you do not want to inherit yet.

Where it fits, and where it doesn’t

Agent Builder is the right tool when you want governed chat backends grounded in private knowledge bases, with the model and the data staying inside your VCF estate, and when you want different teams composing their own agents over their own slices of the corpus without data bleeding across domains. That last point, fine-grained knowledge-base access per agent, is the quietly valuable part and the reason to prefer Agent Builder over a bolted-on open-source stack.
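The per-agent scoping idea is easy to reason about as a small lookup. The sketch below is illustrative only: the scope map, the agent and knowledge-base names, and both function names are hypothetical, and the real service enforces access through its own controls rather than anything like this code.

```python
class KnowledgeBaseAccessError(PermissionError):
    """Raised when an agent asks for a knowledge base outside its scope."""


def check_access(agent_name: str, knowledge_base: str, scopes: dict) -> None:
    """Allow the request only if the knowledge base is in the agent's scope."""
    allowed = scopes.get(agent_name, set())
    if knowledge_base not in allowed:
        raise KnowledgeBaseAccessError(
            f"agent {agent_name!r} may not read knowledge base {knowledge_base!r}"
        )


def cross_team_overlaps(scopes: dict) -> list:
    """List every pair of agents that share a knowledge base, for review."""
    names = sorted(scopes)
    overlaps = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            shared = scopes[first] & scopes[second]
            if shared:
                overlaps.append((first, second, sorted(shared)))
    return overlaps


# Hypothetical scope map: each business unit pinned to its own knowledge base.
scopes = {
    "hr-assistant": {"hr-policies"},
    "finance-assistant": {"finance-reports"},
}
check_access("hr-assistant", "hr-policies", scopes)  # passes silently
print(cross_team_overlaps(scopes))
```

An empty overlap list is the "no data bleeding across domains" property stated above, expressed as something you can check rather than assume.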

It is not the right tool when you need complex multi-agent orchestration, long tool-calling chains, or framework-level customization today. In that case, serve the model through NIM or the Model Runtime and build the agent layer yourself, then point it at the same knowledge bases. You keep the platform benefits without forcing a framework problem into a tool that was not built for it.

Validate three things before you promise anything to a business unit. First, that your indexing and refresh policies actually keep the knowledge base current, because a stale corpus is the silent failure mode here and nobody notices until an agent confidently cites last quarter’s policy. Second, that the completions endpoint is sized for real concurrency, not a single demo user. Third, that access controls map cleanly to your tenancy model so one team’s agent cannot read another team’s documents.
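The first check, corpus freshness, is the easiest to automate. Here is a minimal sketch, assuming you can obtain a last-indexed timestamp per source from wherever your indexing jobs report it; the function name, the source names and the seven-day threshold are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def stale_sources(last_indexed: dict, max_age: timedelta, now: Optional[datetime] = None) -> list:
    """Return the names of sources whose last index run is older than max_age."""
    now = now or datetime.now(timezone.utc)
    return sorted(name for name, ts in last_indexed.items() if now - ts > max_age)


# Illustrative timestamps only; real values would come from your indexing job history.
reference = datetime(2026, 1, 31, tzinfo=timezone.utc)
history = {
    "confluence-hr": datetime(2026, 1, 30, tzinfo=timezone.utc),
    "sharepoint-policies": datetime(2025, 10, 1, tzinfo=timezone.utc),
}
print(stale_sources(history, max_age=timedelta(days=7), now=reference))
```

Running something like this on a schedule turns the silent failure mode into an alert, which is the whole point of the first validation.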

My Take

Agent Builder is the least flashy and most practical part of Private AI Services. It will not hand you an autonomous workforce. It will let a platform team stand up grounded, access-controlled AI chat over private data in an afternoon, which is what most organizations actually need first. Treat it as the composition and serving layer it is, keep the agentic ambitions on a separate roadmap, and you will ship something useful instead of something impressive in a slide. Are you building agents on Private AI Services yet, or still stuck arguing about what "agent" even means in your shop?



About the Author

Dr. Pranay Jha is a Cloud and AI Consultant with 18+ years of experience in hybrid cloud, virtualization, and enterprise infrastructure transformation. He specializes in VMware technologies, multi-cloud strategy, and Generative AI solutions. He holds a PhD in Computer Applications with research focused on Cloud and AI, has published multiple research papers, and has been a VMware vExpert since 2016 and a VMUG Community Leader.
