TL;DR · Key Takeaways
- NeMo Customizer is the fine-tuning microservice in the Private AI stack, deployed as a CRD by the NIM Operator alongside Data Store, Entity Store and Evaluator.
- It supports LoRA, full SFT, DPO and GRPO. LoRA is the right default for almost everyone.
- Reach for full SFT only on small models (1B to 8B) or when you must inject genuinely new knowledge or change fundamental behavior.
- Most teams who ask for fine-tuning actually need RAG. Fine-tuning teaches style and format; retrieval supplies facts.
- The customization workflow is a loop: dataset in Data Store, job in Customizer, score in Evaluator, register in Model Store, serve via NIM.
The most expensive mistake I see on Private AI is a team spending six weeks and a rack's worth of H100 hours fully fine-tuning a 70B model to answer questions about their product catalog, when a RAG pipeline would have done the job in an afternoon and stayed current automatically. So before the how-to, the most important section: when not to do this at all.
Fine-tune or retrieve? Decide this first
Here is the rule that has never failed me. Fine-tuning changes how a model behaves: its tone, its output format, its willingness to follow a niche instruction style, its grasp of a specialized vocabulary. Retrieval changes what a model knows at query time: your documents, your current prices, this week’s policy. If the requirement is facts that change, that is RAG, full stop. If the requirement is a consistent voice, a structured output, or a domain dialect the base model fumbles, that is fine-tuning. Most real projects need a bit of both, and the order matters: get RAG working first, then fine-tune only the behavior gaps that remain.
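As a quick illustration, the rule above can be written down as a tiny decision helper. This is only a sketch: the function name `plan_approach` and its two boolean inputs are hypothetical and not part of any NeMo API.

```python
def plan_approach(facts_change: bool, behavior_gap: bool) -> list[str]:
    """Return the suggested steps, in order, for a given requirement.

    facts_change: the answers depend on documents, prices, or policies that change.
    behavior_gap: the base model misses on tone, output format, or domain dialect.
    """
    steps = []
    if facts_change:
        steps.append("rag")        # retrieval first: it supplies the facts
    if behavior_gap:
        steps.append("fine-tune")  # then close only the remaining behavior gaps
    return steps or ["base model as-is"]


print(plan_approach(facts_change=True, behavior_gap=True))
```

The ordering of the returned list is the point: when both flags are set, retrieval comes before any fine-tuning.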
The four techniques, and which to pick
NeMo Customizer 25.8 supports four post-training methods. You do not need to master all of them. You need to know which one your problem maps to and skip the rest.
| Technique | What it changes | GPU cost | Reach for it when |
|---|---|---|---|
| LoRA | Small adapter, base weights frozen | Low | Almost always, especially 70B+ models |
| Full SFT | Every parameter | Very high | Small models, deep behavior or knowledge change |
| DPO | Preference alignment from chosen/rejected pairs | Medium | You have human preference data and want to align tone |
| GRPO | Reinforcement-style optimization to a reward | High | Reasoning or task-reward tuning, advanced cases |
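One way to read the table is as a short chain of checks that falls through to LoRA. The sketch below is an assumption-laden illustration: the function name, the flag names, and the order in which the checks run are mine, not something Customizer defines.

```python
def pick_technique(has_reward_signal: bool = False,
                   has_preference_pairs: bool = False,
                   needs_deep_change: bool = False) -> str:
    """Map a problem description to one of the four techniques in the table."""
    if has_reward_signal:
        return "grpo"      # reasoning or task-reward tuning, advanced cases
    if has_preference_pairs:
        return "dpo"       # chosen/rejected pairs available for alignment
    if needs_deep_change:
        return "full_sft"  # new knowledge or fundamental behavior change
    return "lora"          # the default for almost everyone


print(pick_technique())                        # default case
print(pick_technique(needs_deep_change=True))  # the expensive path
```

The priority between the three non-default branches is a judgment call; what matters is that LoRA is what you get when nothing specific argues for something else.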
The customization workflow on Private AI
On Private AI the NeMo microservices are deployed by the NIM Operator as custom resources, so the whole fine-tuning loop runs inside the same declarative platform as your serving. The pieces fit together as a cycle, not a one-shot job.
A LoRA job is a single declarative call against the Customizer API. You point it at a customization target, a dataset in the Data Store, and your hyperparameters. The output is a small adapter, often tens of megabytes, that a NIM serves on top of the frozen base model. That is the operational beauty of LoRA on this platform: you can host one base model and many adapters, swapping behaviors without reloading the roughly 140 GB of weights that a 70B model carries at 16-bit precision.
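The size gap between an adapter and a base model is easy to check with back-of-envelope arithmetic. The sketch below assumes a rank-16 adapter (treating `adapter_dim` as the LoRA rank), 16-bit weights, and, purely for illustration, two 4096x4096 projection matrices adapted in each of 32 layers. Which matrices Customizer actually adapts is not stated here, so treat the shapes as hypothetical.

```python
def lora_param_count(shapes, rank):
    """LoRA adds two low-rank factors per adapted matrix of shape (d_out, d_in):
    one of size rank x d_in and one of size d_out x rank."""
    return sum(rank * (d_in + d_out) for d_out, d_in in shapes)


def size_bytes(param_count, bytes_per_param=2):
    """Storage needed at 16-bit precision (2 bytes per parameter)."""
    return param_count * bytes_per_param


# Assumed for illustration: 32 layers, two 4096x4096 matrices adapted per layer.
adapted_shapes = [(4096, 4096), (4096, 4096)] * 32

adapter_params = lora_param_count(adapted_shapes, rank=16)
adapter_mb = size_bytes(adapter_params) / 1e6
base_gb = size_bytes(70e9) / 1e9  # a 70B-parameter base model

print(f"adapter: {adapter_params:,} params, about {adapter_mb:.1f} MB")
print(f"70B base model: about {base_gb:.0f} GB")
```

Under these assumptions the adapter is about 8.4 million parameters and roughly 17 MB, against 140 GB for the 70B base. Adapting more matrices or raising the rank grows the adapter linearly, which is how you land in the tens of megabytes.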
# Launch a LoRA customization job against the Customizer API
curl -X POST http://nemo-customizer/v1/customization/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "config": "meta/llama-3.1-8b-instruct",
    "dataset": {"name": "support-tone-v3"},
    "hyperparameters": {
      "training_type": "sft",
      "finetuning_type": "lora",
      "epochs": 3,
      "lora": {"adapter_dim": 16}
    }
  }'
# Track it, then evaluate before promoting
# (set JOB_ID to the job identifier from the create response)
curl "http://nemo-customizer/v1/customization/jobs/${JOB_ID}/status"
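If you would rather script the same two calls than paste curl commands, a minimal Python sketch using only the standard library could look like the following. The host, endpoint paths, and payload fields are copied from the curl example above. Everything about the responses is an assumption on my part: that the status body carries a `status` field, and that `completed`, `failed`, and `cancelled` are the terminal values. Check both against the Customizer API reference for your version.

```python
import json
import time
import urllib.request

BASE_URL = "http://nemo-customizer"  # host taken from the curl example


def build_job_payload(config, dataset_name, epochs=3, adapter_dim=16):
    """Build the same JSON body the curl example posts."""
    return {
        "config": config,
        "dataset": {"name": dataset_name},
        "hyperparameters": {
            "training_type": "sft",
            "finetuning_type": "lora",
            "epochs": epochs,
            "lora": {"adapter_dim": adapter_dim},
        },
    }


def http_json(method, url, body=None):
    """Send a JSON request and decode the JSON response."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_job(job_id, fetch=http_json, interval_s=30.0, max_polls=120,
                 terminal=("completed", "failed", "cancelled")):
    """Poll the status endpoint until the job reaches a terminal state.

    ASSUMPTION: the response has a "status" field and the values listed in
    `terminal` are the terminal ones; neither is confirmed by this post.
    """
    url = f"{BASE_URL}/v1/customization/jobs/{job_id}/status"
    for _ in range(max_polls):
        status = fetch("GET", url)
        if status.get("status") in terminal:
            return status
        time.sleep(interval_s)
    raise TimeoutError(f"job {job_id} still running after {max_polls} polls")
```

To submit a job, pass the payload to `http_json("POST", f"{BASE_URL}/v1/customization/jobs", payload)`, read the job identifier out of the response, and hand it to `wait_for_job`. The `fetch` parameter exists so the polling logic can be exercised against a stub without a live cluster.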
My take
Fine-tuning is the most over-requested and under-needed capability in enterprise AI. NeMo Customizer makes it genuinely accessible on Private AI, which is exactly why you should put guardrails around who gets to launch a full SFT job. My standing advice to clients: make LoRA the only self-service option, keep full SFT behind a review gate, and require an Evaluator score on every promotion. Data quality beats technique every time: a clean thousand-example dataset will out-train a noisy hundred-thousand-example one. And tie this back to the lifecycle discipline from the MLOps post, because an untracked fine-tuned model is a liability the day someone asks what data went into it.
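Those guardrails are simple enough to encode as a pre-promotion check. The sketch below is hypothetical throughout: the `PromotionRequest` fields, the `promotion_blockers` function, and the 0.8 score threshold are illustrative names and values, not part of Customizer or Evaluator.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PromotionRequest:
    finetuning_type: str               # e.g. "lora" or "full_sft"
    evaluator_score: Optional[float]   # None if no evaluation was run
    dataset_id: Optional[str] = None   # which dataset trained the model
    review_approved: bool = False      # sign-off from the review gate


def promotion_blockers(req: PromotionRequest, min_score: float = 0.8) -> list[str]:
    """Return the reasons a model may not be promoted; empty means go ahead."""
    blockers = []
    if req.finetuning_type != "lora" and not req.review_approved:
        blockers.append("non-LoRA job needs review approval")
    if req.evaluator_score is None:
        blockers.append("missing Evaluator score")
    elif req.evaluator_score < min_score:
        blockers.append("Evaluator score below threshold")
    if not req.dataset_id:
        blockers.append("training dataset not recorded")
    return blockers


print(promotion_blockers(PromotionRequest("full_sft", evaluator_score=None)))
```

Returning a list of reasons rather than a bare boolean means whoever is blocked sees everything they need to fix in one pass.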
What is pushing you toward fine-tuning, behavior or knowledge? If you cannot answer that cleanly, you are probably not ready to start.
References
- NeMo Customizer customization concepts
- LoRA model customization job tutorial
- NVIDIA NIM Operator release notes (NeMo microservices versions)