Fine-Tuning Models on VMware Private AI with NeMo Customizer: LoRA, Full SFT and When to Bother (Private AI Series, Part 27)

RAG is not always the answer. Here is how NeMo Customizer fine-tunes models on VMware Private AI, the difference between LoRA and full SFT, and an honest take on when customization beats retrieval.

by

Dr. Pranay Jha

June 17, 2026


VMware Private AI Series · Part 27 of 30

TL;DR · Key Takeaways

NeMo Customizer is the fine-tuning microservice in the Private AI stack, deployed as a CRD by the NIM Operator alongside Data Store, Entity Store and Evaluator.
It supports LoRA, full SFT, DPO and GRPO. LoRA is the right default for almost everyone.
Reach for full SFT only on small models (1B to 8B) or when you must inject genuinely new knowledge or change fundamental behavior.
Most teams that ask for fine-tuning actually need RAG. Fine-tuning teaches style and format; retrieval supplies facts.
The customization workflow is a loop: dataset in Data Store, job in Customizer, score in Evaluator, register in Model Store, serve via NIM.

The most expensive mistake I see on Private AI is a team spending six weeks and a rack's worth of H100 hours on a full fine-tune of a 70B model to answer questions about their product catalog, when a RAG pipeline would have done it in an afternoon and stayed current automatically. So before the how-to, the most important section: when not to do this at all.

Fine-tune or retrieve? Decide this first

Here is the rule that has never failed me. Fine-tuning changes how a model behaves: its tone, its output format, its willingness to follow a niche instruction style, its grasp of a specialized vocabulary. Retrieval changes what a model knows at query time: your documents, your current prices, this week’s policy. If the requirement is facts that change, that is RAG, full stop. If the requirement is a consistent voice, a structured output, or a domain dialect the base model fumbles, that is fine-tuning. Most real projects need a bit of both, and the order matters: get RAG working first, then fine-tune only the behavior gaps that remain.
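The rule above can be written down as a small triage helper. This is only a sketch of the reasoning: the `Requirement` fields and the labels `recommend` returns are hypothetical names chosen for illustration, not part of NeMo Customizer or any Private AI API.

```python
from dataclasses import dataclass


@dataclass
class Requirement:
    """One project requirement, tagged by what it asks of the model (hypothetical schema)."""
    description: str
    needs_changing_facts: bool   # documents, prices, policies that go stale
    needs_behavior_change: bool  # tone, output format, domain dialect


def recommend(req: Requirement) -> str:
    """Apply the rule: facts that change -> RAG, behavior -> fine-tuning."""
    if req.needs_changing_facts and req.needs_behavior_change:
        # Order matters: get retrieval working first, then tune the remaining gaps.
        return "rag-then-fine-tune"
    if req.needs_changing_facts:
        return "rag"
    if req.needs_behavior_change:
        return "fine-tune"
    # Nothing in this requirement calls for either technique.
    return "base-model"


if __name__ == "__main__":
    catalog = Requirement("answer questions about this week's prices", True, False)
    voice = Requirement("reply in the support team's house style", False, True)
    print(recommend(catalog))  # rag
    print(recommend(voice))    # fine-tune
```

Tagging each requirement separately, rather than the project as a whole, makes it obvious when only a small slice of the work needs fine-tuning at all.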

Figure: the fine-tune-or-retrieve decision ladder. Work down the ladder; most requirements are satisfied by step 2 or 3.

The four techniques, and which to pick

NeMo Customizer 25.8 supports four post-training methods. You do not need to master all of them. You need to know which one your problem maps to and skip the rest.

Technique	What it changes	GPU cost	Reach for it when
LoRA	Small adapter, base weights frozen	Low	Almost always, especially 70B+ models
Full SFT	Every parameter	Very high	Small models, deep behavior or knowledge change
DPO	Preference alignment from chosen/rejected pairs	Medium	You have human preference data and want to align tone
GRPO	Reinforcement-style optimization to a reward	High	Reasoning or task-reward tuning, advanced cases
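To see why the table rates LoRA's GPU cost as low, it helps to count trainable parameters for a single weight matrix. The sketch below uses the standard LoRA formulation (a frozen weight plus a low-rank product of two small matrices). It assumes that Customizer's `adapter_dim` corresponds to the LoRA rank, and the 4096 hidden size is an illustrative figure, not a value read from any particular model config.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters LoRA adds to one d_out x d_in weight matrix.

    LoRA keeps W frozen and learns B @ A, where B is (d_out, rank)
    and A is (rank, d_in), so it trains rank * (d_in + d_out) values.
    """
    return rank * (d_in + d_out)


def full_trainable_params(d_in: int, d_out: int) -> int:
    """Parameters full SFT updates for the same matrix: every entry of W."""
    return d_in * d_out


if __name__ == "__main__":
    hidden = 4096  # illustrative projection size (assumption)
    rank = 16      # assumed to match "adapter_dim": 16
    adapter = lora_trainable_params(hidden, hidden, rank)
    full = full_trainable_params(hidden, hidden)
    print(f"{adapter:,} vs {full:,} ({adapter / full:.2%})")
    # prints: 131,072 vs 16,777,216 (0.78%)
```

Under these assumptions the adapter trains well under one percent of the parameters in each matrix it touches, which is where the lower memory and compute bill comes from.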

The customization workflow on Private AI

On Private AI the NeMo microservices are deployed by the NIM Operator as custom resources, so the whole fine-tuning loop runs inside the same declarative platform as your serving. The pieces fit together as a cycle, not a one-shot job.

Evaluator is the gate. A customization job that does not beat the baseline never reaches the Model Store.
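The gate itself is a simple comparison. The sketch below assumes a single higher-is-better metric scored on the same held-out set for both models; `should_promote` and its `min_gain` margin are hypothetical names for illustration, not Evaluator API calls.

```python
def should_promote(baseline_score: float, candidate_score: float,
                   min_gain: float = 0.0) -> bool:
    """Return True only if the fine-tuned candidate beats the baseline.

    Both scores must come from the same held-out set and the same metric
    (higher is better). min_gain demands a margin so that noise-level
    improvements do not reach the Model Store.
    """
    return candidate_score - baseline_score > min_gain


if __name__ == "__main__":
    print(should_promote(0.68, 0.68))                 # False: a tie is not a win
    print(should_promote(0.68, 0.71, min_gain=0.02))  # True
    print(should_promote(0.68, 0.69, min_gain=0.02))  # False: gain below margin
```

Requiring a margin rather than a bare improvement is a design choice; how large it should be depends on how noisy the metric is on your held-out set.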

A LoRA job is a single declarative call against the Customizer API. You point it at a customization target, a dataset in the Data Store, and your hyperparameters. The output is a small adapter, often tens of megabytes, that a NIM serves on top of the frozen base model. That is the operational beauty of LoRA on this platform: you can host one base model and many adapters, swapping behaviors without reloading the base weights, which for a 70B model at 16-bit precision come to roughly 140 GB.

# Launch a LoRA customization job against the Customizer API
curl -X POST http://nemo-customizer/v1/customization/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "config": "meta/llama-3.1-8b-instruct",
    "dataset": {"name": "support-tone-v3"},
    "hyperparameters": {
      "training_type": "sft",
      "finetuning_type": "lora",
      "epochs": 3,
      "lora": {"adapter_dim": 16}
    }
  }'

# Track it, then evaluate before promoting.
# JOB_ID holds the job ID from the POST response.
curl "http://nemo-customizer/v1/customization/jobs/${JOB_ID}/status"
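For automation, the same two calls can be wrapped in a small Python client using only the standard library. Treat this as a sketch under assumptions: the host, paths and payload fields are copied from the curl example, while the `"status"` response field and the set of terminal status values are assumptions that should be checked against the Customizer API reference for your release.

```python
import json
import time
import urllib.request
from typing import Callable

CUSTOMIZER_URL = "http://nemo-customizer"  # host taken from the curl example

# Assumed terminal states; verify against your Customizer version.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def build_job_request(config: str, dataset: str, epochs: int = 3,
                      adapter_dim: int = 16) -> urllib.request.Request:
    """Build (but do not send) the POST that launches a LoRA job."""
    payload = {
        "config": config,
        "dataset": {"name": dataset},
        "hyperparameters": {
            "training_type": "sft",
            "finetuning_type": "lora",
            "epochs": epochs,
            "lora": {"adapter_dim": adapter_dim},
        },
    }
    return urllib.request.Request(
        f"{CUSTOMIZER_URL}/v1/customization/jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_status(job_id: str) -> str:
    """GET the job status; the 'status' key in the response is an assumption."""
    url = f"{CUSTOMIZER_URL}/v1/customization/jobs/{job_id}/status"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["status"]


def wait_for_job(job_id: str, get_status: Callable[[str], str] = fetch_status,
                 poll_seconds: float = 30.0, max_polls: int = 240) -> str:
    """Poll until the job reaches a terminal state, then return that state."""
    for _ in range(max_polls):
        status = get_status(job_id)
        if status in TERMINAL_STATES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} not finished after {max_polls} polls")
```

Sending the request is then `urllib.request.urlopen(build_job_request("meta/llama-3.1-8b-instruct", "support-tone-v3"))`. Passing `get_status` as a parameter keeps the polling loop testable without a live cluster.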

Disclaimer: fine-tuning runs are GPU-heavy and can starve serving workloads on a shared cluster. Run customization jobs in a separate namespace with its own quota, validate dataset quality before training, always gate promotion on an Evaluator score against a held-out set, and keep the base model and adapter versions pinned together so you can roll back.

My take

Fine-tuning is the most over-requested and under-needed capability in enterprise AI. NeMo Customizer makes it genuinely accessible on Private AI, which is exactly why you should put guardrails around who gets to launch a full SFT job. My standing advice to clients: make LoRA the only self-service option, keep full SFT behind a review gate, and require an Evaluator score on every promotion. Data quality beats technique every time: a clean thousand-example dataset will out-train a noisy hundred-thousand-example one. And tie this back to the lifecycle discipline from the MLOps post, because an untracked fine-tuned model becomes a liability the day someone asks what data went into it.

What is pushing you toward fine-tuning, behavior or knowledge? If you cannot answer that cleanly, you are probably not ready to start.


