TL;DR · Key Takeaways
- VCF 9 is an operating-model change, not a vSphere refresh. Treat it as one platform with one lifecycle and one API, or it fights you.
- Five decisions decide the program before any host powers on: adoption path, management-domain sizing, network design, licensing math, and operations model.
- 9.1 (GA May 2026) closed real gaps: a redesigned installer, NVMe memory tiering, vCenter quick patch, and a unified API-first consumption layer.
- The honest friction is commercial. The 16-core-per-CPU subscription floor punishes small and edge estates, so model the run-rate before you size.
- This page is the hub for the whole series: a linked index of the 35 earlier parts, the recurring mistakes, a cheat sheet, and a scenario-based verdict.
Thirty-five posts ago this series opened with one claim: VMware Cloud Foundation 9 is not vSphere with extra parts bolted on; it is a different operating model for the private cloud. After walking through planning, bring-up, brownfield conversion, migration, day-2 operations, security, and the AI stack, that claim has held. This final part is the one I wish existed at the start. It pulls the whole journey together: the single idea that runs through it, the handful of decisions that actually decide whether a VCF 9 program succeeds or stalls, a guided index back into every part of the series, the mistakes that kept reappearing across real engagements, and an honest read on where the platform is still rough in mid-2026. Read it top to bottom as a capstone, or jump to the index and use it as a map.
The through-line: one platform, one lifecycle, one API
If you read only one idea out of 36 posts, read this one. Older VCF releases still felt like a bundle: you ran vSphere, vSAN, and NSX as separate products, each with its own version, its own patch cadence, and its own console, and SDDC Manager tried to keep the set roughly in step. VCF 9 throws that model out. It organizes everything as a fleet of instances, each instance built from a management domain and one or more workload domains, and it puts a single shared lifecycle over the whole thing. The components stop being products you assemble and become layers of one platform that move together.
VCF 9.1, which reached general availability in May 2026, pushed the idea harder with an API-first consumption layer. The fragmented automation that used to live in separate SDDC Manager, vCenter, NSX, and vSAN endpoints now sits behind a single set of API contracts. That matters less because it is new and more because of what it forces. The platform assumes you will drive it as one system: declare the state you want, let the lifecycle reconcile it, and stop hand-patching individual components out of band. The teams that struggled this year were almost always the ones that deployed VCF 9 and then operated it like vSphere 7 with a new logo, clicking through four consoles and wondering why the lifecycle kept flagging drift.
So the through-line is not a feature. It is a posture. Commit to the unified model and the platform rewards you with coordinated upgrades, consistent state, and a real operations plane. Resist it, keep your old per-component habits, and you inherit all the complexity of the bundle with none of the payoff. Every part of this series was, in some form, a consequence of taking that one idea seriously.
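To make the "declare the state, let the lifecycle reconcile it" posture concrete, here is a minimal sketch of the comparison that sits underneath any drift report. It is plain Python over dictionaries, not the VCF API: the function name, the component keys, and the version strings are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations


def find_drift(
    desired: dict[str, str], observed: dict[str, str]
) -> dict[str, tuple[str | None, str | None]]:
    """Return {component: (desired, observed)} for every component that differs."""
    drift = {}
    for name in sorted(desired.keys() | observed.keys()):
        want = desired.get(name)
        have = observed.get(name)
        if want != have:
            drift[name] = (want, have)
    return drift


# Hypothetical version strings, for illustration only.
desired = {"vcenter": "9.1.0", "esx": "9.1.0", "nsx": "9.1.0"}
observed = {"vcenter": "9.1.0", "esx": "9.1.0", "nsx": "9.0.1"}  # changed out of band

for component, (want, have) in find_drift(desired, observed).items():
    print(f"{component}: desired {want}, observed {have}")
```

The point of the sketch is the shape of the workflow: one declared state for the whole stack, one comparison, and a report that names the component someone touched out of band.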
The five decisions that decide the outcome
Across every engagement pattern this series covered, the same five choices separated the smooth programs from the ones that bled time and budget. None of them are about which button to click. They are commitments you make before the first host powers on, and each one is hard to reverse later. For each, here is why it matters, when the obvious answer is wrong, and what to validate before you lock it in.
1. The adoption path
You have three options: converge an existing vSphere estate into VCF, import brownfield vSAN and NSX, or build greenfield. This is the single highest-impact call in the program because it sets the constraints for everything downstream. Greenfield is cleanest and lets you design to the reference architecture, but it needs new hardware and a migration plan for the workloads. Converge is attractive when you have a healthy vSphere estate you want to keep, yet it carries forward whatever design debt that estate already had. Import looks cheap and is the one that bites hardest, because brownfield vSAN and NSX have to meet specific prerequisites before VCF will adopt them. Validate the current versions, the network topology, and the storage policy posture against the import requirements first. Get this wrong and you carry the mismatch as technical debt for years.
2. Management-domain sizing
The most common failure I see is an under-sized management domain. Teams size it for the handful of appliances they stand up on day one, then watch the headroom evaporate the moment VCF Operations, VCF Automation, and the AI services land on it. The management domain is no longer a small administrative cluster; it is where the platform’s brain runs, and that brain grows. Size for what the platform becomes over the next two years, not the four appliances you start with. Validate the resource reservations for every management component you plan to run, add the ones you will add later, and leave failure-domain headroom on top. It is far cheaper to provision one extra host now than to restack a live management domain.
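The sizing argument above is simple arithmetic, and a rough sketch shows how quickly the host count moves. Everything numeric here is a placeholder, not a published sizing figure: the per-appliance vCPU and memory values, the host shape, the 70 percent utilization ceiling, and the one-vCPU-per-core assumption are all mine. Substitute the reservations from your own planning workbook.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Appliance:
    name: str
    vcpu: int
    memory_gib: int


def hosts_needed(appliances, host_cores, host_memory_gib,
                 spare_hosts=1, max_utilization=0.7):
    """Hosts required to run the appliances, plus spare hosts for failure headroom.

    Assumes one vCPU per physical core (no overcommit) and caps usable
    capacity per host at max_utilization.
    """
    total_vcpu = sum(a.vcpu for a in appliances)
    total_mem = sum(a.memory_gib for a in appliances)
    by_cpu = math.ceil(total_vcpu / (host_cores * max_utilization))
    by_mem = math.ceil(total_mem / (host_memory_gib * max_utilization))
    return max(by_cpu, by_mem) + spare_hosts


# Placeholder reservations, for illustration only.
day_one = [
    Appliance("vcenter", 8, 32),
    Appliance("nsx-managers", 18, 72),
    Appliance("sddc-manager", 4, 16),
    Appliance("vcf-operations", 16, 64),
]
added_later = [
    Appliance("vcf-automation", 24, 96),
    Appliance("ai-services", 32, 128),
]

print("day one:", hosts_needed(day_one, host_cores=32, host_memory_gib=512))
print("two years on:", hosts_needed(day_one + added_later, host_cores=32, host_memory_gib=512))
```

With these placeholder inputs the day-one footprint fits in four hosts and the two-year footprint needs six, which is the gap the paragraph above warns about.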
3. Network design
Bring-up does not usually fail on storage or compute. It fails on a VLAN that was never trunked, an MTU mismatch on the overlay, or a BGP peering that nobody validated end to end. Network design is the least glamorous decision and the one that pays back the most careful preparation. VCF 9 has firm expectations about your underlay, your uplinks, and your routing, and the installer will stop hard when reality does not match the workbook. Validate MTU consistency across every hop, confirm the uplink design (and whether you are committing to LACP), and test routing before bring-up rather than during it. The 9.1 installer is more forgiving of complex topologies, but it cannot fix a fabric that was never built to spec.
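The MTU check is easy to script before bring-up. The sketch below takes the MTU you recorded at each hop on the overlay path and reports anything below a required value, plus any inconsistency between hops. The hop names are hypothetical, and the 1700 threshold is a placeholder: take the real requirement from your planning workbook and the release documentation.

```python
def mtu_problems(path_mtus: dict, required_mtu: int) -> list:
    """List every hop below required_mtu, then flag mixed MTU values on the path."""
    problems = [
        f"{hop}: MTU {mtu} is below the required {required_mtu}"
        for hop, mtu in path_mtus.items()
        if mtu < required_mtu
    ]
    distinct = sorted(set(path_mtus.values()))
    if len(distinct) > 1:
        problems.append(f"MTU differs across hops: {distinct}")
    return problems


REQUIRED_OVERLAY_MTU = 1700  # placeholder, not an authoritative value

# Hypothetical hops and recorded values, for illustration only.
overlay_path = {
    "host-vds": 9000,
    "tor-a": 9000,
    "spine-1": 1500,  # the hop nobody checked
    "tor-b": 9000,
}

for problem in mtu_problems(overlay_path, REQUIRED_OVERLAY_MTU):
    print(problem)
```

This only validates what you wrote down. It does not replace an end-to-end test on the fabric itself, which is still the check that catches the hop missing from the diagram.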
4. The licensing model
Core counting and the vSAN capacity entitlement decide your run-rate for the life of the contract, which makes licensing a design input rather than a procurement afterthought. The subscription model counts cores, with a per-CPU minimum, and bundles components you may or may not fully use. The shape of your hardware (how many sockets, how many cores per socket, how dense the hosts are) changes the bill more than almost any other choice. Validate the core math against your actual processor SKUs, model the vSAN capacity you are entitled to versus what you will consume, and do it before you finalize the hardware. The cheapest licensing optimization is choosing the right host before you buy it.
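The core math is worth sketching, because the per-CPU floor is where small hosts get expensive. The only rule encoded below is the one this article describes: each CPU is billed for its physical cores or 16, whichever is larger. The two estates are invented for illustration, and price per core is left out on purpose because it depends on your contract.

```python
MIN_CORES_PER_CPU = 16  # the per-CPU floor described in this article


def billable_cores(sockets: int, cores_per_socket: int,
                   floor: int = MIN_CORES_PER_CPU) -> int:
    """Cores billed for one host: every socket counts for at least `floor` cores."""
    return sockets * max(cores_per_socket, floor)


def estate_totals(hosts: int, sockets: int, cores_per_socket: int):
    """Return (physical_cores, billable_cores) for a set of identical hosts."""
    physical = hosts * sockets * cores_per_socket
    billable = hosts * billable_cores(sockets, cores_per_socket)
    return physical, billable


# Hypothetical estates, for illustration only.
edge = estate_totals(hosts=20, sockets=1, cores_per_socket=8)
dense = estate_totals(hosts=4, sockets=2, cores_per_socket=32)

for label, (physical, billable) in (("edge", edge), ("dense", dense)):
    print(f"{label}: {physical} physical cores, {billable} billable cores")
```

In this example the edge estate has 160 physical cores but is billed for 320, while the dense estate is billed for exactly the 256 cores it has. That is the comparison to run against your real processor SKUs before the hardware order goes out.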
5. The operations model
Adopting VCF Operations as the single pane, rather than bolting old tooling onto a new platform, is what makes the unified lifecycle actually deliver. This is a decision because it is tempting to skip. The team knows its existing monitoring and runbooks, the new operations plane is unfamiliar, and the pressure on day one is to get workloads running. But if you keep operating each component through its own console and its own scripts, you keep the silos VCF 9 was built to remove, and the platform’s coordinated lifecycle becomes a thing you fight rather than use. Commit to VCF Operations from the start, retire the overlapping tools deliberately, and validate that your alerting, capacity, and cost workflows map onto it before go-live.
The whole series, in one map
Here is the entire journey as an annotated index. Each entry is the one-line lesson that part exists to teach, with a link to the full article. Use it to find the deep dive you need, or to send a colleague straight to the right place.
Plan and design (Parts 1 to 7)
- Part 1, VCF 9 Explained: what the unified private cloud platform actually is, and why it is a model change, not a version bump.
- Part 2, Architecture: fleet, instances, and domains, and how they fit together.
- Part 3, Licensing: core counting, the vSAN capacity entitlement, and the costly mistakes to avoid.
- Part 4, Planning and prerequisites: the readiness checklist that prevents a stalled bring-up.
- Part 5, Network design: the seven mistakes that break a deployment.
- Part 6, Storage design: vSAN ESA versus OSA and when to choose which.
- Part 7, Reference architecture: sizing, topology, and the design trade-offs that matter.
Build (Parts 8 to 11)
- Part 8, Management domain bring-up: standing up the platform with the VCF Installer.
- Part 9, Workload domain deploy: adding a VI workload domain with VCF Operations.
- Part 10, NSX in VCF 9: transit gateways, VPCs, and the distributed firewall.
- Part 11, Load balancing: Avi versus NSX native load balancing, and which to choose.
Adopt and migrate (Parts 12 to 18)
- Part 12, Adoption paths: when to converge, import, or start fresh.
- Part 13, Converge vSphere: converting an existing estate with the converge workflow.
- Part 14, Brownfield import: the requirements to bring existing vSAN and NSX under VCF.
- Part 15, Migrate older VCF: the seven things that bite during the upgrade.
- Part 16, Workload migration: moving workloads in with HCX and vMotion.
- Part 17, VCF-to-VCF migration: the parallel-instance reference architecture.
- Part 18, Cutover and decommission: the teardown failures that actually bite.
Operate (Parts 19 to 23)
- Part 19, Monitoring and observability: VCF Operations as the day-2 nerve center.
- Part 20, Capacity and cost: a practical runbook for keeping spend honest.
- Part 21, Fleet lifecycle: the reference architecture for coordinated upgrades.
- Part 22, Certs, identity, backup: the failures here and how to fix them.
- Part 23, Upgrades and patching: the full-stack runbook to reach 9.1.
Deep dives (Parts 24 to 35)
- Part 24, vSphere Supervisor and VKS: the reference design for modern apps.
- Part 25, VCF Automation: provider, organizations, and self-service cloud.
- Part 26, Private AI Foundation with NVIDIA: deploying the AI stack on VCF 9.
- Part 27, Zero trust security: identity, segmentation, and platform integrity.
- Part 28, Stretched cluster vs site recovery: choosing your DR architecture.
- Part 29, Hardware compatibility traps: the HCL gaps that stall bring-up.
- Part 30, Stretched and multi-site: topology, witness, and the latency budget.
- Part 31, Multi-instance fleet: one operations plane across many instances.
- Part 32, Automation and the API-first stack: a practical runbook.
- Part 33, Hybrid cloud: extending the private cloud to the hyperscalers.
- Part 34, Troubleshooting: the stuck workflows, locks, and log trails that bite.
- Part 35, Performance vs cost: where to spend your tuning effort.
The mistakes that showed up again and again
Patterns repeat. Across the planning, migration, and day-2 posts, the same handful of mistakes kept surfacing in different costumes. If you internalize nothing else operationally, internalize these.
Treating VCF 9 as a hypervisor upgrade. This is the root cause behind most of the others. Teams budget for an ESXi refresh and discover a platform that wants a new operating model, new skills, and a real operations practice. The fix is to scope it as a program with a design phase, not a maintenance window.
Skipping the bill-of-materials and interoperability check. Almost every stalled bring-up and failed upgrade traced back to a component, driver, or firmware level that was never validated against the target BOM. The platform is opinionated about supported combinations. Run the prechecks, confirm the HCL, and verify interoperability before you touch production.
Under-provisioning the management domain. It appears in the planning posts and again in the operations posts, because the symptom shows up late. Size for the platform you will be running in two years, including Operations, Automation, and any AI services.
Designing the network last. Storage and compute are forgiving; the fabric is not. MTU, VLANs, and routing should be designed and tested first, because they are the most common reason bring-up stops dead.
Leaving the licensing math until procurement. By then the hardware is chosen and the core count is fixed. The run-rate is a design output, and the time to influence it is while you are still picking sockets and host density.
Keeping the old tooling. Running VCF 9 through legacy per-component consoles preserves exactly the silos the platform exists to remove, and quietly defeats the coordinated lifecycle you paid for.
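The bill-of-materials check in particular lends itself to a small script you run before every change window. This is a sketch under stated assumptions, not a replacement for the platform's own prechecks: the component names, the version strings, and the shape of the inventory are all hypothetical, and the real target BOM comes from the release documentation and the HCL.

```python
from __future__ import annotations


def bom_violations(
    target_bom: dict[str, set[str]], inventory: dict[str, dict[str, str]]
) -> list[str]:
    """Compare each host's reported versions against the allowed set per component."""
    violations = []
    for host, components in sorted(inventory.items()):
        for component, allowed in sorted(target_bom.items()):
            found = components.get(component)
            if found is None:
                violations.append(f"{host}: {component} not reported")
            elif found not in allowed:
                violations.append(f"{host}: {component} {found} not in target BOM")
    return violations


# Hypothetical BOM and inventory, for illustration only.
target_bom = {
    "esx": {"9.1.0"},
    "nic-driver": {"4.2.1", "4.2.2"},
    "nic-firmware": {"22.31"},
}
inventory = {
    "host-01": {"esx": "9.1.0", "nic-driver": "4.2.2", "nic-firmware": "22.31"},
    "host-02": {"esx": "9.1.0", "nic-driver": "4.1.0"},
}

for violation in bom_violations(target_bom, inventory):
    print(violation)
```

Treating a missing value as a violation is deliberate: a component nobody inventoried is exactly the one that stalls bring-up.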
What VCF 9 gets right
Credit where it is due. The 9.1 release closed several of the gaps that made 9.0 feel like an early platform. The redesigned installer handles complex Day 0 topologies that previously needed manual workarounds, including IPv4/IPv6 dual stack and LACP-based uplink designs, with fewer inputs than the old bring-up flow. That alone removes a class of network-design pain the reference architecture in Part 7 had to design around.
Two more land as real operational wins. Enhanced NVMe memory tiering offloads roughly 20 to 25 percent of memory accesses to a high-performance NVMe tier, which raises VM density and lowers per-host cost without a visible hit to application responsiveness. And vCenter quick patch now applies only the RPMs that actually changed in a patch payload, which cuts vCenter patching to minimal and sometimes zero downtime. For a platform whose biggest day-2 complaint has always been lifecycle pain, that is a meaningful change, not a slide-deck feature.
The deeper win is the API-first consumption layer. When provisioning, lifecycle, and operations all speak one set of contracts, automation stops being a pile of brittle per-component scripts and becomes something you can actually maintain. Combined with the platform’s positioning as a home for enterprise AI, where the Private AI Foundation and NVIDIA stack run as a first-class workload rather than a bolt-on, VCF 9.1 is a coherent product in a way 9.0 only promised to be.
Where it still bites
I will not pretend the year was smooth. The honest friction is mostly commercial, and it is real. The subscription model replaced perpetual licensing with per-core pricing and a minimum of 16 cores per CPU, which makes small hosts uneconomic on a price-performance basis: you pay for cores a socket may not even have. Benchmarks across renewals through 2024 and 2025 showed annual costs rising sharply, with many teams reporting 3x to 6x increases. The removal of smaller SKUs also forced NSX and vSAN into a bundle some buyers do not fully use. If your estate is a handful of dense hosts, VCF 9 is a strong fit. If it is many small clusters at the edge, run the math hard before you commit, because the licensing floor can dominate the business case.
The other friction is conceptual weight. The full stack assumes you want VCF Operations, Automation, VKS, and the AI services as one platform. For an organization that genuinely needs a private cloud, that is the point. For one that wanted a hypervisor refresh, it is a lot of platform to stand up and operate, and pretending otherwise sets the wrong expectation with the team that has to run it. The skills gap is real too: the operating model rewards automation and platform thinking, and a team that has only ever clicked through vCenter will need time and training to catch up.
The VCF 9 cheat sheet
One screen to keep next to a design session. Each row is the rule of thumb the series landed on, plus the thing that most often catches teams out. Treat the rules of thumb as defaults to validate against your own bill of materials, not as gospel.
| Area | Rule of thumb | Watch out for |
|---|---|---|
| Licensing | Subscription, per core, with a 16-core-per-CPU minimum. Model the run-rate first. | Small and edge hosts are uneconomic; you pay for cores a socket may not have. |
| Adoption path | Pick converge, import, or greenfield before anything else. | The wrong path becomes technical debt you carry for years. |
| Management domain | Size for the full platform, not the four appliances you start with. | Headroom vanishes when Operations, Automation, and AI services land. |
| Network | Validate VLANs, MTU, and BGP peering before bring-up. | Most bring-up failures are network, not storage or compute. |
| Storage | vSAN ESA is the default on supported hardware. | Confirm the HCL; fall back to OSA only where ESA prerequisites are not met. |
| Operations | Adopt VCF Operations as the single pane from day one. | Bolting on old tooling keeps the silos the platform was built to remove. |
| Upgrades | Move 9.0 to 9.1 as a full-stack, sequenced operation; use vCenter quick patch for interim fixes. | Check BOM interoperability and run prechecks before every step. |
| AI stack | Run Private AI Foundation with NVIDIA on a dedicated workload domain. | GPU and vGPU sizing plus driver and BOM alignment decide whether it works. |
What I’d Do
Here is the verdict after 36 parts, broken down by the situation you are actually in, because the right answer is not the same for everyone.
Large enterprise with a private-cloud mandate. Adopt it, and adopt it fully. VCF 9 in its 9.1 form is the strongest version VMware has shipped, and for a dense estate that needs self-service, multi-tenancy, Kubernetes, and AI on one platform, it is the right place to standardize. Lock the adoption path and the licensing math first, size the management domain for what the platform becomes, and commit to VCF Operations from day one.
Mid-size estate weighing the move. The platform fits, but the business case is tighter, so do the licensing model before anything else and be ruthless about host density. Phase the adoption: stand up the management domain and one workload domain, prove the operating model on a real workload, then expand. Do not try to light up every capability at once.
Many small clusters or edge sites. Be honest. If you only wanted vSphere, do not let the platform narrative push you into a footprint the licensing floor will punish. Model the cost against your real core counts, look hard at whether vSphere Foundation or a partner-hosted option fits better, and only commit to full VCF where the private-cloud capabilities genuinely earn their keep.
AI-driven build. If the AI stack is the reason you are here, design around it from the start: dedicate a workload domain, get the GPU, vGPU, and driver alignment right early, and treat the Private AI Foundation with NVIDIA reference design as a hard requirement rather than a later add-on. The platform is genuinely good at this, but only if the hardware and BOM are planned for it up front.
That closes the series. If you have read along from Part 1, thank you for staying with it. Which part of your own VCF 9 journey turned out hardest in practice, the migration or the day-2 operating model? Tell me in the comments, because that is where the next round of writing should go.
References
- Announcing VCF 9.1: Modern Private Cloud Built for Efficiency and Resilience (VMware Cloud Foundation Blog)
- VMware Cloud Foundation 9.1 Release Notes (Broadcom TechDocs)
- VCF 9.1 Licensing: Programmatic, Centralized, and Built to Scale (VMware Cloud Foundation Blog)
- What’s New with vSphere in VMware Cloud Foundation 9.1 (VMware Cloud Foundation Blog)



