Dr. Pranay Jha

VMware Cloud Foundation 9: Lessons From the Whole Series and My Verdict (VCF 9 Series, Part 36)

The capstone to the 36-part VCF 9 series: the through-line, the five decisions that decide every program, what 9.1 fixed, where the platform still bites, and a straight verdict on who should adopt it now.

VCF 9 Series · Part 36 of 36

TL;DR · Key Takeaways

  • VCF 9 is an operating-model change, not a vSphere refresh. Treat it as one platform with one lifecycle and one API, or it fights you.
  • Five decisions decide the program before any host powers on: adoption path, management-domain sizing, network design, licensing math, and operations model.
  • 9.1 (GA May 2026) closed real gaps: a redesigned installer, NVMe memory tiering, vCenter quick patch, and a unified API-first consumption layer.
  • The honest friction is commercial. The 16-core-per-CPU subscription floor punishes small and edge estates, so model the run-rate before you size.
  • This page is the hub for the whole series: a linked index of all 36 parts, the recurring mistakes, a cheat sheet, and a scenario-based verdict.

Thirty-five posts ago this series opened with one claim: VMware Cloud Foundation 9 is not vSphere with extra parts bolted on; it is a different operating model for the private cloud. After walking through planning, bring-up, brownfield conversion, migration, day-2 operations, security, and the AI stack, that claim has held. This final part is the one I wish existed at the start. It pulls the whole journey together: the single idea that ties it together, the handful of decisions that actually decide whether a VCF 9 program succeeds or stalls, a guided index back into every part of the series, the mistakes that kept reappearing across real engagements, and an honest read on where the platform is still rough in mid-2026. Read it top to bottom as a capstone, or jump to the index and use it as a map.

The through-line: one platform, one lifecycle, one API

If you read only one idea out of 36 posts, read this one. Older VCF releases still felt like a bundle: you ran vSphere, vSAN, and NSX as separate products, each with its own version, its own patch cadence, and its own console, and SDDC Manager tried to keep the set roughly in step. VCF 9 throws that model out. It organizes everything as a fleet of instances, each instance built from a management domain and one or more workload domains, and it puts a single shared lifecycle over the whole thing. The components stop being products you assemble and become layers of one platform that move together.

VCF 9.1, which reached general availability in May 2026, pushed the idea harder with an API-first consumption layer. The fragmented automation that used to live in separate SDDC Manager, vCenter, NSX, and vSAN endpoints now sits behind a single set of API contracts. That matters less because it is new and more because of what it forces. The platform assumes you will drive it as one system: declare the state you want, let the lifecycle reconcile it, and stop hand-patching individual components out of band. The teams that struggled this year were almost always the ones that deployed VCF 9 and then operated it like vSphere 7 with a new logo, clicking through four consoles and wondering why the lifecycle kept flagging drift.

So the through-line is not a feature. It is a posture. Commit to the unified model and the platform rewards you with coordinated upgrades, consistent state, and a real operations plane. Resist it, keep your old per-component habits, and you inherit all the complexity of the bundle with none of the payoff. Every part of this series was, in some form, a consequence of taking that one idea seriously.

The five decisions that decide the outcome

Across every engagement pattern this series covered, the same five choices separated the smooth programs from the ones that bled time and budget. None of them are about which button to click. They are commitments you make before the first host powers on, and each one is hard to reverse later. For each, here is why it matters, when the obvious answer is wrong, and what to validate before you lock it in.

[Figure: The five decisions that decide the outcome. Commitments you make before the first host powers on, and hard to reverse later. 1. Adoption path: converge, import, or greenfield. 2. Management-domain sizing: size for what the platform becomes. 3. Network design: VLANs, MTU, BGP validated first. 4. Licensing model: core math as a design input. 5. Operations model: commit to VCF Operations on day one.]
Get these five right up front and almost everything downstream gets easier.

1. The adoption path

Converge an existing vSphere estate into VCF, import brownfield vSAN and NSX, or build greenfield. This is the single highest-impact call in the program because it sets the constraints for everything downstream. Greenfield is cleanest and lets you design to the reference architecture, but it needs new hardware and a migration plan for the workloads. Converge is attractive when you have a healthy vSphere estate you want to keep, yet it carries forward whatever design debt that estate already had. Import looks cheap and is the one that bites hardest, because brownfield vSAN and NSX have to meet specific prerequisites before VCF will adopt them. Validate the current versions, the network topology, and the storage policy posture against the import requirements first. Get this wrong and you carry the mismatch as technical debt for years.

2. Management-domain sizing

The most common failure I see is an under-sized management domain. Teams size it for the handful of appliances they stand up on day one, then watch the headroom evaporate the moment VCF Operations, VCF Automation, and the AI services land on it. The management domain is not a small administrative cluster anymore; it is where the platform’s brain runs, and that brain grows. Size for what the platform becomes over the next two years, not the four appliances you start with. Validate the resource reservations for every management component you plan to run, include the ones you expect to add later, and leave failure-domain headroom on top. It is far cheaper to provision one extra host now than to restack a live management domain.
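As a rough illustration of that sizing rule, the sketch below totals reservations for both the day-one appliances and the ones that arrive later, then adds spare hosts for failure-domain headroom. Everything specific in it is a hypothetical placeholder rather than a published VCF sizing value: the component names, the vCPU and memory figures, the host shape, the 80 percent utilisation ceiling, and the assumption of one vCPU per physical core. Substitute the reservations from your own bill of materials.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    vcpu: int        # reserved vCPUs (placeholder value)
    memory_gb: int   # reserved memory in GB (placeholder value)
    day_one: bool    # True if deployed at bring-up, False if added later


def hosts_needed(components, host_cores, host_memory_gb,
                 spare_hosts=1, max_utilisation=0.8):
    """Hosts required to carry the reservations, plus spare hosts.

    Assumes one vCPU per physical core (no overcommit) and caps usable
    capacity per host at max_utilisation. Both are illustrative choices.
    """
    total_vcpu = sum(c.vcpu for c in components)
    total_mem = sum(c.memory_gb for c in components)
    by_cpu = math.ceil(total_vcpu / (host_cores * max_utilisation))
    by_mem = math.ceil(total_mem / (host_memory_gb * max_utilisation))
    return max(by_cpu, by_mem) + spare_hosts


# Hypothetical component list: two day-one appliances, three later arrivals.
PLANNED = [
    Component("day-one appliance A", 16, 64, True),
    Component("day-one appliance B", 8, 32, True),
    Component("operations (later)", 24, 96, False),
    Component("automation (later)", 24, 96, False),
    Component("AI services (later)", 32, 128, False),
]

day_one_only = [c for c in PLANNED if c.day_one]
day_one_hosts = hosts_needed(day_one_only, host_cores=32, host_memory_gb=512)
full_platform_hosts = hosts_needed(PLANNED, host_cores=32, host_memory_gb=512)

print(f"sized for day one:       {day_one_hosts} hosts")
print(f"sized for full platform: {full_platform_hosts} hosts")
```

With these placeholder numbers the day-one view asks for 2 hosts and the full-platform view asks for 6, which is the gap the paragraph above warns about. The real exercise is the same arithmetic with validated reservations.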

3. Network design

Bring-up does not usually fail on storage or compute. It fails on a VLAN that was never trunked, an MTU mismatch on the overlay, or a BGP peering that nobody validated end to end. Network design is the least glamorous decision and the one that pays back the most careful preparation. VCF 9 has firm expectations about your underlay, your uplinks, and your routing, and the installer will stop hard when reality does not match the workbook. Validate MTU consistency across every hop, confirm the uplink design (and whether you are committing to LACP), and test routing before bring-up rather than during it. The 9.1 installer is more forgiving of complex topologies, but it cannot fix a fabric that was never built to spec.
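A minimal sketch of the kind of pre-bring-up check described above, assuming you have already collected the configured MTU of every hop and the VLANs trunked on each uplink by whatever means your fabric allows. The hop names, uplink names, VLAN IDs, and the required MTU of 1700 are all hypothetical; use the values from your own design workbook. This is not part of any VCF tooling, only a way to make the comparison explicit.

```python
def undersized_hops(hop_mtus: dict[str, int], required_mtu: int) -> dict[str, int]:
    """Return every hop whose configured MTU is below the required value."""
    return {hop: mtu for hop, mtu in hop_mtus.items() if mtu < required_mtu}


def missing_vlans(required: set[int],
                  trunked_by_uplink: dict[str, set[int]]) -> dict[str, set[int]]:
    """Return, per uplink, the required VLANs that are not trunked on it."""
    return {
        uplink: required - trunked
        for uplink, trunked in trunked_by_uplink.items()
        if required - trunked
    }


# Hypothetical inventory gathered from switches and hosts.
hop_mtus = {"host-vmnic": 9000, "tor-a": 9000, "spine-1": 1500, "tor-b": 9000}
trunked = {"uplink-1": {101, 102, 103}, "uplink-2": {101, 103}}

mtu_findings = undersized_hops(hop_mtus, required_mtu=1700)
vlan_findings = missing_vlans({101, 102, 103}, trunked)

print("MTU below requirement:", mtu_findings)   # {'spine-1': 1500}
print("VLANs not trunked:", vlan_findings)      # {'uplink-2': {102}}
```

A static comparison like this only catches configuration mismatches. It does not replace an end-to-end test of the overlay path or of BGP peering, which still has to be exercised on the live fabric.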

4. The licensing model

Core counting and the vSAN capacity entitlement decide your run-rate for the life of the contract, which makes licensing a design input rather than a procurement afterthought. The subscription model counts cores, with a per-CPU minimum, and bundles components you may or may not fully use. The shape of your hardware (how many sockets, how many cores per socket, how dense the hosts are) changes the bill more than almost any other choice. Validate the core math against your actual processor SKUs, model the vSAN capacity you are entitled to versus what you will consume, and do it before you finalize the hardware. The cheapest licensing optimization is choosing the right host before you buy it.
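The core math can be sketched in a few lines. The only rule modelled here is the per-CPU minimum: each socket is billed for at least the floor, taken as 16 cores as stated in this series. The host shapes are hypothetical, and price, term, and the vSAN capacity entitlement are deliberately left out, so treat this as a way to compare hardware options rather than as a quote.

```python
def billable_cores(sockets: int, cores_per_socket: int, floor: int = 16) -> int:
    """Cores billed for one host: each socket counts for at least `floor`."""
    return sockets * max(cores_per_socket, floor)


def estate_summary(hosts, floor: int = 16):
    """Summarise an estate given as (host_count, sockets, cores_per_socket).

    Returns (physical_cores, billable_cores, padding_share), where
    padding_share is the fraction of billed cores that do not exist.
    """
    physical = sum(n * s * c for n, s, c in hosts)
    billable = sum(n * billable_cores(s, c, floor) for n, s, c in hosts)
    padding_share = (billable - physical) / billable
    return physical, billable, padding_share


# Hypothetical estates: twenty small edge hosts versus six dense hosts.
edge = [(20, 1, 8)]    # 20 hosts, 1 socket, 8 cores per socket
dense = [(6, 2, 32)]   # 6 hosts, 2 sockets, 32 cores per socket

edge_summary = estate_summary(edge)
dense_summary = estate_summary(dense)

print("edge: ", edge_summary)    # (160, 320, 0.5)
print("dense:", dense_summary)   # (384, 384, 0.0)
```

In this example the edge estate is billed for twice the cores it physically has, while the dense estate pays for no padding at all. Running the same comparison against real processor SKUs before the hardware order is the optimization the paragraph above describes.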

5. The operations model

Adopting VCF Operations as the single pane, rather than bolting old tooling onto a new platform, is what makes the unified lifecycle actually deliver. This is a decision because it is tempting to skip. The team knows its existing monitoring and runbooks, the new operations plane is unfamiliar, and the pressure on day one is to get workloads running. But if you keep operating each component through its own console and its own scripts, you keep the silos VCF 9 was built to remove, and the platform’s coordinated lifecycle becomes a thing you fight rather than use. Commit to VCF Operations from the start, retire the overlapping tools deliberately, and validate that your alerting, capacity, and cost workflows map onto it before go-live.

[Figure: The VCF 9 journey, in five stages. What the 36-part series walked through, end to end. 1. Plan: licensing, sizing, network, reference architecture. 2. Build: management and workload domains, NSX, load balancer. 3. Migrate: converge, import, HCX, cutover. 4. Operate: VCF Operations, fleet lifecycle, security. 5. Optimize: cost, AI stack, tuning, BCDR.]
The arc the series followed: plan the platform, build the domains, migrate the workloads, operate as one fleet, then optimize.

The whole series, in one map

Here is the entire journey as an annotated index. Each entry is the one-line lesson that part exists to teach, with a link to the full article. Use it to find the deep dive you need, or to send a colleague straight to the right place.

Plan and design (Parts 1 to 7)

  • Part 1, VCF 9 Explained: what the unified private cloud platform actually is, and why it is a model change, not a version bump.
  • Part 2, Architecture: fleet, instances, and domains, and how they fit together.
  • Part 3, Licensing: core counting, the vSAN capacity entitlement, and the costly mistakes to avoid.
  • Part 4, Planning and prerequisites: the readiness checklist that prevents a stalled bring-up.
  • Part 5, Network design: the seven mistakes that break a deployment.
  • Part 6, Storage design: vSAN ESA versus OSA and when to choose which.
  • Part 7, Reference architecture: sizing, topology, and the design trade-offs that matter.

Build (Parts 8 to 11)

Adopt and migrate (Parts 12 to 18)

Operate (Parts 19 to 23)

Deep dives (Parts 24 to 35)


The mistakes that showed up again and again

Patterns repeat. Across the planning, migration, and day-2 posts, the same handful of mistakes kept surfacing in different costumes. If you internalize nothing else operationally, internalize these.

Treating VCF 9 as a hypervisor upgrade. This is the root cause behind most of the others. Teams budget for an ESXi refresh and discover a platform that wants a new operating model, new skills, and a real operations practice. The fix is to scope it as a program with a design phase, not a maintenance window.

Skipping the bill-of-materials and interoperability check. Almost every stalled bring-up and failed upgrade traced back to a component, driver, or firmware level that was never validated against the target BOM. The platform is opinionated about supported combinations. Run the prechecks, confirm the HCL, and verify interoperability before you touch production.
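To make the check concrete, here is a minimal sketch that compares what is installed against a target bill of materials. It assumes you have both as simple name-to-version mappings, and it uses exact string matching. The component keys and version strings are placeholders, and a real interoperability matrix usually allows ranges of versions, so this is a simplification, not a substitute for the platform's own prechecks.

```python
def bom_mismatches(target_bom: dict[str, str],
                   installed: dict[str, str]) -> dict[str, tuple]:
    """Return {component: (expected, actual)} for every deviation.

    `actual` is None when the component is missing from the inventory.
    """
    problems = {}
    for component, expected in target_bom.items():
        actual = installed.get(component)
        if actual != expected:
            problems[component] = (expected, actual)
    return problems


# Placeholder versions: these are not real build numbers.
target_bom = {
    "hypervisor": "A.1",
    "nic-driver": "B.4",
    "nic-firmware": "C.2",
    "storage-controller-firmware": "D.7",
}
installed = {
    "hypervisor": "A.1",
    "nic-driver": "B.3",       # one release behind the target
    "nic-firmware": "C.2",
    # storage-controller-firmware was never inventoried
}

findings = bom_mismatches(target_bom, installed)
for component, (expected, actual) in findings.items():
    print(f"{component}: expected {expected}, found {actual}")
```

Including the missing entries matters as much as the wrong ones: a component nobody inventoried is exactly the one that surfaces halfway through an upgrade.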

Under-provisioning the management domain. It appears in the planning posts and again in the operations posts, because the symptom shows up late. Size for the platform you will be running in two years, including Operations, Automation, and any AI services.

Designing the network last. Storage and compute are forgiving; the fabric is not. MTU, VLANs, and routing should be designed and tested first, because they are the most common reason bring-up stops dead.

Leaving the licensing math until procurement. By then the hardware is chosen and the core count is fixed. The run-rate is a design output, and the time to influence it is while you are still picking sockets and host density.

Keeping the old tooling. Running VCF 9 through legacy per-component consoles preserves exactly the silos the platform exists to remove, and quietly defeats the coordinated lifecycle you paid for.

[Figure: The mistakes that showed up again and again. The same handful of errors, in different costumes: treating VCF 9 as a hypervisor upgrade; skipping the BOM / interop check; under-provisioning the management domain; designing the network last; leaving licensing math to procurement; keeping the old per-component tooling.]
Treating VCF 9 as a hypervisor upgrade is the root cause behind most of the others.

What VCF 9 gets right

Credit where it is due. The 9.1 release closed several of the gaps that made 9.0 feel like an early platform. The redesigned installer handles complex Day 0 topologies that previously needed manual workarounds, including IPv4/IPv6 dual stack and LACP-based uplink designs, with fewer inputs than the old bring-up flow. That alone removes a class of network-design pain the reference architecture in Part 7 had to design around.

Two more land as real operational wins. Enhanced NVMe memory tiering offloads roughly 20 to 25 percent of memory accesses to a high-performance NVMe tier, which raises VM density and lowers per-host cost without a visible hit to application responsiveness. And vCenter quick patch now applies only the RPMs that actually changed in a patch payload, which cuts vCenter patching to minimal and sometimes zero downtime. For a platform whose biggest day-2 complaint has always been lifecycle pain, that is a meaningful change, not a slide-deck feature.

The deeper win is the API-first consumption layer. When provisioning, lifecycle, and operations all speak one set of contracts, automation stops being a pile of brittle per-component scripts and becomes something you can actually maintain. Combined with the platform’s positioning as a home for enterprise AI, where the Private AI Foundation and NVIDIA stack run as a first-class workload rather than a bolt-on, VCF 9.1 is a coherent product in a way 9.0 only promised to be.

Where it still bites

I will not pretend the year was smooth. The honest friction is mostly commercial, and it is real. The subscription model replaced perpetual licensing with per-core pricing that carries a minimum of 16 cores per CPU, which makes small hosts uneconomic on a price-performance basis: you pay for cores a socket may not even have. Benchmarks across renewals through 2024 and 2025 showed annual costs rising sharply, with many teams reporting 3x to 6x increases and the removal of smaller SKUs forcing NSX and vSAN into a bundle some buyers do not fully use. If your estate is a handful of dense hosts, VCF 9 is a strong fit. If it is many small clusters at the edge, run the math hard before you commit, because the licensing floor can dominate the business case.

The other friction is conceptual weight. The full stack assumes you want VCF Operations, Automation, VKS, and the AI services as one platform. For an organization that genuinely needs a private cloud, that is the point. For one that wanted a hypervisor refresh, it is a lot of platform to stand up and operate, and pretending otherwise sets the wrong expectation with the team that has to run it. The skills gap is real too: the operating model rewards automation and platform thinking, and a team that has only ever clicked through vCenter will need time and training to catch up.


The VCF 9 cheat sheet

One screen to keep next to a design session. Each row is the rule of thumb the series landed on, plus the thing that most often catches teams out. Treat the rules of thumb as defaults to validate against your own bill of materials, not as gospel.

| Area | Rule of thumb | Watch out for |
|---|---|---|
| Licensing | Subscription, per core, with a 16-core-per-CPU minimum. Model the run-rate first. | Small and edge hosts are uneconomic; you pay for cores a socket may not have. |
| Adoption path | Pick converge, import, or greenfield before anything else. | The wrong path becomes technical debt you carry for years. |
| Management domain | Size for the full platform, not the four appliances you start with. | Headroom vanishes when Operations, Automation, and AI services land. |
| Network | Validate VLANs, MTU, and BGP peering before bring-up. | Most bring-up failures are network, not storage or compute. |
| Storage | vSAN ESA is the default on supported hardware. | Confirm the HCL; fall back to OSA only where ESA prerequisites are not met. |
| Operations | Adopt VCF Operations as the single pane from day one. | Bolting on old tooling keeps the silos the platform was built to remove. |
| Upgrades | Move 9.0 to 9.1 as a full-stack, sequenced operation; use vCenter quick patch for interim fixes. | Check BOM interoperability and run prechecks before every step. |
| AI stack | Run Private AI Foundation with NVIDIA on a dedicated workload domain. | GPU and vGPU sizing plus driver and BOM alignment decide whether it works. |

What I’d Do

Here is the verdict after 36 parts, broken down by the situation you are actually in, because the right answer is not the same for everyone.

Large enterprise with a private-cloud mandate. Adopt it, and adopt it fully. VCF 9 in its 9.1 form is the strongest version VMware has shipped, and for a dense estate that needs self-service, multi-tenancy, Kubernetes, and AI on one platform, it is the right place to standardize. Lock the adoption path and the licensing math first, size the management domain for what the platform becomes, and commit to VCF Operations from day one.

Mid-size estate weighing the move. The platform fits, but the business case is tighter, so do the licensing model before anything else and be ruthless about host density. Phase the adoption: stand up the management domain and one workload domain, prove the operating model on a real workload, then expand. Do not try to light up every capability at once.

Many small clusters or edge sites. Be honest. If you only wanted vSphere, do not let the platform narrative push you into a footprint the licensing floor will punish. Model the cost against your real core counts, look hard at whether vSphere Foundation or a partner-hosted option fits better, and only commit to full VCF where the private-cloud capabilities genuinely earn their keep.

AI-driven build. If the AI stack is the reason you are here, design around it from the start: dedicate a workload domain, get the GPU, vGPU, and driver alignment right early, and treat the Private AI Foundation with NVIDIA reference design as a hard requirement rather than a later add-on. The platform is genuinely good at this, but only if the hardware and BOM are planned for it up front.

That closes the series. If you have read along from Part 1, thank you for staying with it. Which part of your own VCF 9 journey turned out hardest in practice, the migration or the day-2 operating model? Tell me in the comments, because that is where the next round of writing should go.



About the Author

Dr. Pranay Jha is a Cloud and AI Consultant with 18+ years of experience in hybrid cloud, virtualization, and enterprise infrastructure transformation. He specializes in VMware technologies, multi-cloud strategy, and Generative AI solutions. He holds a PhD in Computer Applications with research focused on Cloud and AI, has published multiple research papers, and has been a VMware vExpert since 2016 and a VMUG Community Leader.
