Tag: GenAI Series
-
“Looks Good” Isn’t Enough: Evaluating GenAI Output (GenAI Series, Part 18)
Fluent is not the same as correct. How to evaluate GenAI output properly: build a golden set, choose human, automatic or model-graded scoring, and run it as a harness.
-
AI Agents: What Actually Works, and What’s Hype (GenAI Series, Part 16)
An AI agent is a model in a loop that plans, calls tools, and observes results. What agents genuinely do well today, and why reliability, not intelligence, is the real bottleneck.
-
Fine-Tuning vs RAG vs Prompting: Which One, and When (GenAI Series, Part 15)
Prompting steers, RAG adds facts, fine-tuning changes behaviour. The one question that decides which to use, a side-by-side comparison, and why to escalate in order of cost.
-
RAG: How to Stop Your AI Making Things Up (GenAI Series, Part 13)
Retrieval-augmented generation lets a model answer from your own documents by fetching the relevant passages at question time. How RAG works, and why it beats fine-tuning for facts.
-
Prompt Engineering That Actually Works (GenAI Series, Part 12)
Prompt engineering is not secret incantations; it is clear communication. The four moves that do most of the work, system vs user prompts, and the anti-patterns that waste tokens.
-
Why AI Models Make Things Up (and What Temperature Does) (GenAI Series, Part 11)
AI models generate by sampling likely words from a probability distribution. Why that produces confident hallucinations, what the temperature setting really does, and how to reduce hallucinations.
-
The Context Window, and Why Models Forget (GenAI Series, Part 10)
The context window is everything an AI can see at once. Why models have no memory between turns, why longer prompts cost more, and why details get lost in the middle.
-
Training vs Inference: Why Using AI Is the Real Cost (GenAI Series, Part 9)
Training builds a model once in three stages; inference runs it on every request, forever. Why the recurring inference bill, not the headline training cost, decides AI economics.
-
Attention, the Idea That Made Modern AI Work (GenAI Series, Part 8)
How attention lets every word in a sentence weigh every other word, why it replaced slow left-to-right models, and why running in parallel is what let AI scale.
About the Author

Dr. Pranay Jha
Dr. Pranay Jha is a Cloud and AI Consultant with 18+ years of experience in hybrid cloud, virtualization, and enterprise infrastructure transformation. He specializes in VMware technologies, multi-cloud strategy, and Generative AI solutions. He holds a PhD in Computer Applications with research focused on Cloud and AI, has published multiple research papers, and has been a VMware vExpert since 2016 and a VMUG Community Leader.
