Dr. Pranay Jha

VMware • Cloud • AI • Enterprise Architecture

VCF 9 Reference Architecture: Sizing, Topology and Design Trade-offs (VCF 9 Series, Part 7)

How to lay out VMware Cloud Foundation 9 for production: fleet and instance topology, management and workload domain sizing, appliance models, and the design trade-offs that matter.

VCF 9 Series · Part 7 of 36

TL;DR · Key Takeaways

  • VCF 9 reorganises the platform into a three-tier model: Fleet (the management envelope), Instances (a vCenter plus its domains), and Domains (management or workload).
  • A new management domain needs a minimum of 4 hosts on vSAN, NFS, or VMFS on FC, with vSAN ready nodes recommended.
  • Pick an appliance model up front: Simple deploys about 7 appliances; High Availability deploys about 13 (including 3x NSX, 3x VCF Operations, 3x VCF Automation) and is the production choice.
  • VCF Operations is now the single console for the whole fleet, and SDDC Manager is on a path to deprecation.
  • Design the network and storage layers first: they constrain everything the reference architecture sits on top of.
Who this is for: Cloud architects and senior VI admins sizing a production VCF 9 platform.

Prerequisites: Familiarity with vSphere, vSAN, and NSX concepts, plus the VCF 9 architecture basics from earlier parts of this series.

Knowing what VMware Cloud Foundation 9 is, and knowing how to lay it out so it survives three years of growth, are two different skills. This part of the series moves from concepts to a reference architecture: the topology, the host and appliance counts, and the trade-offs you weigh before the first ESX host is ever racked. The goal is a design that scales predictably, stays supportable, and does not paint you into a corner on day 400.

Start with the building blocks: Fleet, Instance, Domain

VCF 9 introduced new structural constructs, and your reference architecture has to be expressed in their terms. A VCF Fleet is the outer envelope: it is the set of VCF deployments managed together by a common VCF Operations and VCF Automation instance. Inside the fleet sit one or more VCF Instances, each of which is a vCenter Server with its associated NSX, plus the domains it owns. Within an instance you have Domains, which are either the management domain or one or more workload domains. If you internalised the older Consolidated versus Standard labels, let them go: VCF 9 retires that terminology. You now choose, per domain, whether management and workload share a vSphere cluster or run on dedicated clusters. For the reasoning behind these constructs, see our VCF 9 architecture breakdown.
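To make the hierarchy concrete, here is a minimal sketch of Fleet, Instance, and Domain as plain Python dataclasses. The class and field names are illustrative assumptions for this article, not a VCF API, and the example fleet is made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Domain:
    name: str
    kind: str  # "management" or "workload"


@dataclass
class Instance:
    # A VCF Instance owns one or more domains.
    name: str
    domains: List[Domain] = field(default_factory=list)

    def management_domains(self) -> List[Domain]:
        return [d for d in self.domains if d.kind == "management"]


@dataclass
class Fleet:
    # The outer envelope: instances managed together by a common
    # VCF Operations and VCF Automation instance.
    name: str
    instances: List[Instance] = field(default_factory=list)

    def domain_count(self) -> int:
        return sum(len(i.domains) for i in self.instances)


# Hypothetical example fleet: two instances, three domains in total.
fleet = Fleet(
    name="example-fleet",
    instances=[
        Instance("instance-a", [Domain("mgmt-a", "management"),
                                Domain("wld-a1", "workload")]),
        Instance("instance-b", [Domain("mgmt-b", "management")]),
    ],
)
print(fleet.domain_count())  # 3
```

Writing the design down in this shape, even on paper, forces you to name every domain and decide which instance owns it before any hardware is ordered.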

Sizing the management domain

The management domain is the first thing deployed and the foundation everything else depends on, so size it deliberately. A fresh VCF 9 deployment requires a minimum of 4 hosts in the management cluster, backed by vSAN, NFS, or VMFS on Fibre Channel. Broadcom recommends vSAN ready nodes for this tier, and vSAN Express Storage Architecture (ESA) is the modern default. Four hosts is the floor for availability and lifecycle headroom, not a target: it lets the cluster tolerate a host failure while still running maintenance and upgrades without starving the management appliances.

If you are converging an existing vCenter cluster rather than building new, the minimums shift: 3 vSAN ready nodes, or 2 hosts on external storage (NFS or VMFS on FC). Treat those as conversion minimums, not production design targets. For anything that will carry real fleet management load, plan the management domain at 4 hosts or more from the outset.
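The host minimums above can be captured in a small pre-design check. This is a sketch only: the function name, the path and storage keys, and the result strings are hypothetical, and the numbers are the ones quoted in this article, so confirm them against current Broadcom documentation before relying on them.

```python
# Host floors as stated in this article (assumed, verify per release).
MIN_HOSTS = {
    ("new", "vsan"): 4,
    ("new", "nfs"): 4,
    ("new", "vmfs_fc"): 4,
    ("converge", "vsan"): 3,
    ("converge", "nfs"): 2,
    ("converge", "vmfs_fc"): 2,
}

# The article's recommended floor for a management domain under real load.
PRODUCTION_TARGET = 4


def check_management_hosts(path: str, storage: str, hosts: int) -> str:
    """Classify a proposed management cluster host count."""
    floor = MIN_HOSTS.get((path, storage))
    if floor is None:
        raise ValueError(f"unknown combination: {path}/{storage}")
    if hosts < floor:
        return "below-minimum"
    if hosts < PRODUCTION_TARGET:
        # Meets the conversion minimum but not the production guidance.
        return "conversion-minimum-only"
    return "ok"


print(check_management_hosts("new", "vsan", 3))
print(check_management_hosts("converge", "nfs", 2))
print(check_management_hosts("new", "vsan", 4))
```

The middle result is the useful one: it flags a design that the converge path would accept but that falls short of the 4-host guidance for a management domain carrying real fleet load.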

Appliance models: Simple versus High Availability

One of the biggest early decisions is the appliance model, because it sets your management footprint and your future patching effort. The VCF Installer (which replaces the old Cloud Builder appliance and the Deployment Parameters Workbook) offers two shapes. The Simple model deploys roughly 7 appliances at minimum: single instances of vCenter Server, SDDC Manager, and NSX Manager, plus VCF Operations Manager, a Fleet Management appliance, a VCF Operations collector, and a VCF Automation appliance. The High Availability model, recommended for production, deploys roughly 13 appliances at minimum. It clusters the critical control planes into threes (3x NSX Manager nodes, 3x VCF Operations nodes, and 3x VCF Automation nodes), with additional appliances for VCF Operations for Logs and VKS.

Those three-node clusters are not only about surviving a hardware failure. They also let you patch and upgrade one node at a time, which keeps the management plane available during lifecycle operations. Choosing Simple to save resources on a platform that will later carry production workloads is a false economy: moving from Simple to HA after the fact is far more disruptive than provisioning HA on day one.

# Management appliance footprint (minimums)
Simple model        ~7 appliances   (lab / small, non-HA)
High Availability   ~13 appliances  (production)

# HA control-plane clustering
NSX Manager         3 nodes
VCF Operations      3 nodes
VCF Automation      3 nodes
VCF Ops for Logs    3 nodes  (+ VKS as required)

# Management cluster host floor
New deployment      4 hosts (vSAN / NFS / VMFS-on-FC)
Converge existing   3 vSAN ready nodes OR 2 hosts external storage
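The appliance arithmetic behind the two models can be sketched as follows. The component names follow the list in this article. Treating the jump from 7 to 13 as exactly "three components go from one node to three" is an assumption made for illustration, and VCF Operations for Logs and VKS are left out as extras on top.

```python
# Single-node appliances in the Simple model, per the article's list.
SIMPLE_APPLIANCES = {
    "vCenter Server": 1,
    "SDDC Manager": 1,
    "NSX Manager": 1,
    "VCF Operations": 1,
    "Fleet Management": 1,
    "VCF Operations collector": 1,
    "VCF Automation": 1,
}

# Control planes that become three-node clusters in the HA model.
CLUSTERED_IN_HA = ("NSX Manager", "VCF Operations", "VCF Automation")


def appliance_counts(model: str) -> dict:
    """Return per-component appliance counts for 'simple' or 'ha'."""
    if model not in ("simple", "ha"):
        raise ValueError(f"unknown model: {model}")
    counts = dict(SIMPLE_APPLIANCES)
    if model == "ha":
        for name in CLUSTERED_IN_HA:
            counts[name] = 3
    return counts


print(sum(appliance_counts("simple").values()))  # 7
print(sum(appliance_counts("ha").values()))      # 13
```

Six extra appliances is the visible cost of HA; the benefit is that each clustered control plane can lose or patch one node at a time.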

Workload domain topology and cluster models

With the management domain anchored, the reference architecture branches into workload domains. The core question is separation: do you keep workloads on dedicated workload domains, or do you let management and workload share a cluster? For most production estates, a dedicated management domain plus one or more dedicated workload domains is the cleaner answer. It isolates the management plane from noisy or untrusted workloads, gives you independent lifecycle control, and keeps capacity planning honest, because management overhead does not hide inside the workload pool.

Within each workload domain you then select a vSphere cluster model and a storage model that match the availability you need. VCF 9 also makes it straightforward to import existing vSphere, vSAN, and NSX clusters into an instance as workload domains, which lets a reference design absorb brownfield estates rather than forcing a rebuild. Plan domains around failure boundaries and lifecycle cadence, not just raw capacity: a domain is the unit you patch, scale, and reason about together.

The network and storage layers underneath

A reference architecture is only as solid as the network and storage design it rests on, and both the management domain and every workload domain are configured for NSX out of the box. Get the distributed switch model, uplink design, and VLAN or overlay layout right before you finalise host counts, because rework here is expensive once domains are live. The same applies to storage: your choice of vSAN ESA versus external storage, and your policy design, dictate usable capacity and performance ceilings. These two areas generate the most avoidable production incidents, so it is worth slowing down on them. We cover the common traps in detail in our VCF 9 network design guide and our vSAN ESA storage design walkthrough.
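To illustrate how policy design drives usable capacity, here is a rough back-of-the-envelope sketch. The function, the 25% operational reserve, and the overhead multipliers are assumptions chosen for illustration (2.0 for a two-copy mirror, 1.5 as an example erasure-coding overhead); take real values from your storage policy and the vendor's sizing guidance.

```python
def usable_capacity_tib(hosts: int,
                        raw_tib_per_host: float,
                        policy_overhead: float,
                        reserve_fraction: float = 0.25) -> float:
    """Estimate usable capacity after reserve and policy overhead.

    policy_overhead: raw capacity consumed per unit of data stored
                     (e.g. 2.0 when every object is stored twice).
    reserve_fraction: share of raw capacity held back for rebuilds
                      and operations (assumed value, not a VCF rule).
    """
    if policy_overhead < 1.0:
        raise ValueError("policy_overhead must be >= 1.0")
    if not 0.0 <= reserve_fraction < 1.0:
        raise ValueError("reserve_fraction must be in [0, 1)")
    raw = hosts * raw_tib_per_host
    return raw * (1.0 - reserve_fraction) / policy_overhead


# Hypothetical 4-host cluster with 20 TiB raw per host.
print(usable_capacity_tib(4, 20, 2.0))  # 30.0
print(usable_capacity_tib(4, 20, 1.5))  # 40.0
```

The same hardware yields a third more usable space under the lighter policy in this example, which is why the policy decision belongs before host counts are finalised.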

Disclaimer: Host and appliance minimums change between releases. Before you commit a design, validate the target Bill of Materials against the Broadcom Compatibility Guide and the VCF Configuration Maximums, confirm interoperability across vSphere, vSAN, and NSX versions, and size for failure and lifecycle headroom rather than the bare minimum. Always test a reference design in a non-production instance first.

Design trade-offs worth weighing

  • Shared versus dedicated clusters: sharing management and workload saves hardware in small sites, but couples their lifecycles and blast radius. Dedicate them once the platform matters.
  • Simple versus HA appliances: Simple is fine for a lab or a proof of concept; production wants the triple-node control planes for availability and non-disruptive patching.
  • Greenfield versus brownfield: the converge and import pathways let you reuse existing vCenter, vSAN, and NSX estates, trading a pristine layout for a faster, cheaper path to VCF.
  • Fleet centralisation: one VCF Operations and VCF Automation instance per fleet simplifies operations but makes that fleet management tier a dependency you must protect and scale carefully.
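The first three trade-offs above can be written down as a simple decision helper. This is a deliberately crude sketch: the function and its inputs are hypothetical, and it only encodes the rules of thumb from this article rather than any official sizing logic.

```python
def recommend(production: bool, small_site: bool, brownfield: bool) -> dict:
    """Map three yes/no design inputs to the article's rules of thumb."""
    return {
        # Production wants the three-node control planes.
        "appliance_model": "high-availability" if production else "simple",
        # Sharing clusters only makes sense for small, non-critical sites.
        "clusters": "shared" if (small_site and not production) else "dedicated",
        # Existing estates can be converged or imported instead of rebuilt.
        "deployment_path": "converge/import" if brownfield else "new deployment",
    }


print(recommend(production=True, small_site=True, brownfield=False))
print(recommend(production=False, small_site=True, brownfield=True))
```

A real design review weighs more than three booleans, but forcing each answer into the open early makes it harder for a lab shortcut to slip into a production build.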

Final thoughts

A good VCF 9 reference architecture is mostly a set of deliberate choices made early: the fleet and instance topology, a management domain sized with headroom, an appliance model that matches your availability needs, and network and storage layers designed before host counts are locked. Get those right and the platform scales by addition rather than by rework. In the next parts of this series we move from design into bring-up, starting with standing up the management domain itself.

About the Author

Dr. Pranay Jha is a Cloud and AI Consultant with 18+ years of experience in hybrid cloud, virtualization, and enterprise infrastructure transformation. He specializes in VMware technologies, multi-cloud strategy, and Generative AI solutions. He holds a PhD in Computer Applications with research focused on Cloud and AI, has published multiple research papers, and has been a VMware vExpert since 2016 and a VMUG Community Leader.
