Dr. Pranay Jha


VCF 9 Multi-Instance Fleet Management Explained: One Operations Plane, Many Instances (VCF 9 Series, Part 31)

What a VCF 9 fleet really is, how one VCF Operations and VCF Automation plane governs many VCF instances, how to size it by objects and metrics rather than host counts, and when to split into more than one fleet.

VCF 9 Series · Part 31 of 36

A VCF fleet is not a bigger SDDC Manager, and treating it like one is the quickest way to mis-design a VCF 9 estate. In VCF 9 the fleet is a real management plane that sits above your instances: a single deployment of VCF Operations and VCF Automation that governs identity, certificates, licensing, lifecycle, configuration drift and tags across every instance underneath it. If you came from VCF 4.x or 5.x, where each SDDC Manager was its own island, this is the structural change that matters most. Get the mental model right and the rest of your VCF 9 design falls into place. Get it wrong and you will either build one fleet that is too big to fail safely, or a dozen tiny ones that defeat the point.

Fleet, instance, domain: getting the three terms straight

The vocabulary trips people up, so pin it down before anything else. A VCF instance is one self-contained VCF deployment: exactly one management domain, plus zero or more VI workload domains. The management domain carries the core SDDC components (vCenter Server, ESXi hosts, vSAN or other supported principal storage, NSX Manager) and adds a single SDDC Manager and a VCF Operations collector. A VCF fleet is a collection of one or more of those instances, managed centrally by one instance of VCF Operations and one instance of VCF Automation. A domain is the unit inside an instance: management or workload.

So the nesting is fleet contains instances, instance contains domains, domain contains clusters and hosts. The fleet components are deployed once and shared; the SDDC Manager and Operations collector are per instance. If you want the full architectural breakdown of how these pieces interlock, that is covered in VMware Cloud Foundation 9 Architecture: Fleet, Instances and Domains Explained.
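The nesting is easier to reason about when written down as a data structure. What follows is a minimal sketch in Python, not anything VCF itself exposes: the class names, field names and the `validate` helpers are hypothetical, chosen only to mirror the definitions above (one management domain per instance, one or more instances per fleet).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cluster:
    name: str
    hosts: List[str] = field(default_factory=list)


@dataclass
class Domain:
    name: str
    kind: str  # "management" or "workload" (illustrative labels)
    clusters: List[Cluster] = field(default_factory=list)


@dataclass
class Instance:
    name: str
    domains: List[Domain] = field(default_factory=list)

    def validate(self) -> None:
        # An instance has exactly one management domain, plus zero or more workload domains.
        mgmt = [d for d in self.domains if d.kind == "management"]
        if len(mgmt) != 1:
            raise ValueError(
                f"{self.name}: expected exactly 1 management domain, found {len(mgmt)}"
            )


@dataclass
class Fleet:
    name: str
    instances: List[Instance] = field(default_factory=list)

    def validate(self) -> None:
        # A fleet is a collection of one or more instances.
        if not self.instances:
            raise ValueError(f"{self.name}: a fleet needs at least one instance")
        for instance in self.instances:
            instance.validate()

    def host_count(self) -> int:
        # Walk fleet -> instance -> domain -> cluster -> host.
        return sum(
            len(cluster.hosts)
            for instance in self.instances
            for domain in instance.domains
            for cluster in domain.clusters
        )


# Hypothetical example estate: one instance with a management and a workload domain.
example = Fleet(
    "fleet-emea",
    [
        Instance(
            "instance-a",
            [
                Domain("mgmt", "management", [Cluster("mgmt-cl01", ["esx01", "esx02", "esx03", "esx04"])]),
                Domain("wld01", "workload", [Cluster("wld-cl01", ["esx05", "esx06", "esx07"])]),
            ],
        )
    ],
)
example.validate()
print(example.host_count())
```

The model deliberately leaves out the fleet-level components and the per-instance SDDC Manager and collector; it only captures the containment rules, which is the part people most often get wrong on a whiteboard.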

[Figure: VCF Fleet: one management plane, many instances. A fleet management plane (VCF Operations with Fleet Manager, plus VCF Automation) sits above VCF Instances A, B and C. Each instance has a management domain (vCenter, NSX, SDDC Manager, Operations collector) and workload domains (vSphere, vSAN, NSX).]
A VCF 9 fleet: shared fleet-management components on top, per-instance SDDC Manager and collector below.

What the fleet plane actually centralizes

VCF Operations is the console for the fleet, and fleet management is the set of things it does once on behalf of every instance. The list is longer than most people expect, and each item used to be a per-instance chore.

  • Identity and access: single-source SSO configuration applied across components and geolocations, so you stop re-creating identity sources on every vCenter.
  • Certificate management: unified, non-disruptive TLS handling with auto-renewal for VMCA, MSCA and OpenSSL CAs, plus import of externally signed certificates.
  • Licensing: one license file per VCF Operations instance, with connected and disconnected modes for air-gapped sites.
  • Lifecycle: Fleet Manager, integrated into VCF Operations, downloads bundles, runs prechecks and orchestrates upgrades for the platform.
  • Configuration management: scheduled drift detection for vCenter and cluster objects, Git-backed template versioning, and vSphere Configuration Profile status across activated clusters.
  • Tag and password management: push tags to many vCenters at once, resolve tag conflicts, and get a consolidated view of account password expiry across the fleet.

The point is consistency at scale. The same identity source, the same certificate policy, the same desired configuration, enforced once and observed everywhere. The deep treatment of how lifecycle specifically flows through Fleet Manager lives in VCF 9 Fleet Lifecycle Management: The Reference Architecture, so I will not repeat it here.
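To make "enforced once and observed everywhere" concrete, here is a conceptual sketch of drift detection: one desired configuration compared against what each instance actually reports. It is an illustration of the idea only, not how VCF Operations implements it. The function names and the setting keys (`ntp_server`, `syslog_target`) are assumptions made up for the example.

```python
from typing import Any, Dict, List, Tuple

Drift = Tuple[str, Any, Any]  # (setting, desired value, observed value)


def find_drift(desired: Dict[str, Any], observed: Dict[str, Any]) -> List[Drift]:
    """Return every setting whose observed value differs from the desired one."""
    drift: List[Drift] = []
    for key, want in sorted(desired.items()):
        have = observed.get(key)  # None when the setting is missing entirely
        if have != want:
            drift.append((key, want, have))
    return drift


def fleet_drift(
    desired: Dict[str, Any], observed_by_instance: Dict[str, Dict[str, Any]]
) -> Dict[str, List[Drift]]:
    """Apply one desired configuration to every instance and keep only the offenders."""
    report: Dict[str, List[Drift]] = {}
    for instance, observed in observed_by_instance.items():
        drift = find_drift(desired, observed)
        if drift:
            report[instance] = drift
    return report


# Hypothetical settings and instance names, for illustration only.
desired_config = {"ntp_server": "ntp.corp.example", "syslog_target": "logs.corp.example"}
observed = {
    "instance-a": {"ntp_server": "ntp.corp.example", "syslog_target": "logs.corp.example"},
    "instance-b": {"ntp_server": "pool.ntp.org"},
}
print(fleet_drift(desired_config, observed))
```

The design point is that `desired_config` exists once, at the fleet level. The per-instance model is the opposite: one desired state per instance, each free to diverge.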


[Figure: What the fleet plane centralizes: identity and access, certificate management, licensing, lifecycle (Fleet Manager), configuration and drift, tags and passwords. Each item used to be a per-instance chore; now it is done once.]
Same identity, certificates and desired config, enforced once and observed everywhere.

Sizing a fleet: it is objects and metrics, not host counts

Here is where careful architects still go wrong. There is no fixed maximum number of VCF instances per fleet. Broadcom does not publish a clean number like "eight instances per fleet" because that is not how the limit works. VCF Operations is sized by objects and metrics, the same currency the old Aria Operations used, not by the count of vCenters, clusters or hosts you are used to from the configuration maximums page.

Concrete numbers help. With the Extra Large deployment size of VCF Operations, a single node supports up to roughly 100,000 objects and 20 million metrics, and a 16-node cluster scales to around 1 million objects and 126 million metrics. Whether a given fleet lands at five instances or fifty depends on how many objects each instance contributes, which is a function of cluster count, host count, VM density, the adapters you enable and how aggressively you collect. Two fleets with the same instance count can have wildly different object totals.
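A rough way to run that arithmetic is sketched below. The two limits are the approximate Extra Large figures quoted above and should be re-checked against the current configuration maximums; the per-instance object and metric counts are invented for illustration. Note that the quoted cluster figures are not a simple multiple of the single-node figures, so the sketch treats each as its own limit rather than multiplying.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Limit:
    objects: int
    metrics: int


# Approximate Extra Large figures quoted in this article; verify before relying on them.
SINGLE_NODE = Limit(objects=100_000, metrics=20_000_000)
CLUSTER_16 = Limit(objects=1_000_000, metrics=126_000_000)


@dataclass(frozen=True)
class InstanceEstimate:
    name: str
    objects: int
    metrics: int


def utilization(estimates: List[InstanceEstimate], limit: Limit) -> Tuple[float, float]:
    """Return (object utilization, metric utilization) as fractions of the limit."""
    total_objects = sum(e.objects for e in estimates)
    total_metrics = sum(e.metrics for e in estimates)
    return total_objects / limit.objects, total_metrics / limit.metrics


def binding_dimension(estimates: List[InstanceEstimate], limit: Limit) -> str:
    """Name the dimension that runs out first for this fleet."""
    obj_util, met_util = utilization(estimates, limit)
    return "objects" if obj_util >= met_util else "metrics"


# Two hypothetical fleets with the same instance count but very different totals.
lean_fleet = [InstanceEstimate(f"lean-{i}", 20_000, 2_000_000) for i in range(3)]
dense_fleet = [InstanceEstimate(f"dense-{i}", 150_000, 25_000_000) for i in range(3)]

for label, fleet in (("lean", lean_fleet), ("dense", dense_fleet)):
    for limit_name, limit in (("single node", SINGLE_NODE), ("16-node cluster", CLUSTER_16)):
        obj_util, met_util = utilization(fleet, limit)
        print(f"{label} vs {limit_name}: objects {obj_util:.0%}, metrics {met_util:.0%}")
```

Both example fleets have three instances, yet one sits comfortably on a single node and the other needs a cluster. Either dimension can be the binding one, which is why estimating only objects, or only hosts, is not enough.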

My take

Size the fleet from a real object and metric estimate, then leave headroom, because object counts grow every time someone stands up a new workload domain or turns on another management pack. If you anchor your design on "how many instances" you will eventually wall yourself into a VCF Operations cluster that needs an unplanned scale-out. Check the live figures on the VCF Operations configuration maximums page before you commit, since these values move between releases.
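Headroom is easier to defend when it is expressed as time. The sketch below estimates how many months of compound growth fit before an object count crosses a planning threshold. The 20 percent headroom, the 5 percent monthly growth rate and the function name are all assumptions for illustration, not Broadcom guidance.

```python
from typing import Optional


def months_until_threshold(
    current_objects: int,
    monthly_growth: float,
    limit: int,
    headroom: float = 0.2,
    max_months: int = 120,
) -> Optional[int]:
    """Months until objects reach limit * (1 - headroom), or None if never within max_months."""
    threshold = limit * (1.0 - headroom)
    objects = float(current_objects)
    for month in range(max_months + 1):
        if objects >= threshold:
            return month
        objects *= 1.0 + monthly_growth  # compound growth, e.g. new domains or management packs
    return None


# Hypothetical fleet: 60,000 objects today, growing 5% a month, against a 100,000-object limit.
print(months_until_threshold(60_000, 0.05, 100_000))
```

If the answer comes back as a handful of months, the scale-out is not an unplanned event anymore; it is a line item you can schedule.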

When one fleet is the wrong answer

The instinct after reading the marketing is to put everything in a single fleet for that one-pane-of-glass feeling. Resist it. A fleet is a shared management plane, and a shared plane is a shared blast radius. The cleaner design splits fleets along boundaries that already exist in your organization.

Latency and bandwidth are the first hard boundary. The fleet components talk constantly to every instance they manage, so stretching one fleet across sites with poor or expensive interconnects degrades collection and lifecycle operations. Availability is the second: if the single VCF Operations and VCF Automation pair is your only management plane, an outage or a botched upgrade of that plane hits the management of every instance at once, which is exactly the scenario your DR design for the management estate has to account for. Regulatory and data-sovereignty separation is the third. If one set of instances is in scope for a compliance regime and another is not, a hard fleet boundary is far easier to defend in an audit than RBAC carve-outs inside one fleet.

So the rule I give clients is simple: do not split a fleet because you fear an instance-count ceiling, split it because a failure domain, a network boundary or a compliance line says you should. Multi-site designs in particular force this question early, and the trade-offs overlap heavily with the topology decisions in VCF 9 Stretched Clusters and Multi-Site Design.
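That rule can be turned into a first-pass grouping exercise. The sketch below groups instances by the three boundaries discussed above and proposes one fleet per distinct combination. The attribute names and sample values are hypothetical, and the output is a starting point for discussion, not a design: strict grouping gives the largest number of fleets you might need, and you merge groups wherever a boundary turns out to be soft.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

BoundaryKey = Tuple[str, str, str]


@dataclass(frozen=True)
class InstanceProfile:
    name: str
    failure_domain: str    # e.g. a site or region that must fail independently
    network_zone: str      # instances sharing a low-latency, well-provisioned interconnect
    compliance_scope: str  # e.g. "pci" or "none"


def propose_fleets(instances: List[InstanceProfile]) -> Dict[BoundaryKey, List[str]]:
    """Group instances that share every boundary; each group is one candidate fleet."""
    fleets: Dict[BoundaryKey, List[str]] = defaultdict(list)
    for inst in instances:
        key = (inst.failure_domain, inst.network_zone, inst.compliance_scope)
        fleets[key].append(inst.name)
    return dict(fleets)


# Hypothetical estate.
estate = [
    InstanceProfile("instance-a", "dc1", "lan-east", "pci"),
    InstanceProfile("instance-b", "dc1", "lan-east", "pci"),
    InstanceProfile("instance-c", "dc2", "lan-west", "none"),
    InstanceProfile("instance-d", "dc1", "lan-east", "none"),
]
for boundary, members in propose_fleets(estate).items():
    print(boundary, members)
```

Instance count never appears as an input, which is the point: the number of fleets falls out of the boundaries, not the other way around.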

[Figure: When one fleet is the wrong answer. Three split boundaries: latency and bandwidth (the fleet talks to every instance constantly), availability (a shared plane is a shared blast radius), and compliance (data-sovereignty and audit boundaries).]
There is no fixed instance ceiling; split fleets where a real boundary already exists.

SDDC Manager is still here, but plan for its exit

SDDC Manager has not disappeared. Every VCF 9 instance still installs one, and it still owns lifecycle for the core SDDC components inside that instance. What changed is its status: VCF Operations is now the central console, and the SDDC Manager user interface is deprecated and slated for removal in a future release. Broadcom has been explicit that the management plane is converging on VCF Operations.

The practical implication for anyone building automation: do not write new tooling, runbooks or training against the SDDC Manager UI. Point operators at VCF Operations now, and where you need programmatic control, target the fleet APIs rather than per-instance SDDC Manager workflows you will have to retire. Treating SDDC Manager as a transitional component today saves a painful rework later.

What I’d Do

Design the fleet boundary first, before you ever open the installer. Start with how many management planes your failure, latency and compliance domains actually require, then size each VCF Operations deployment from a real object and metric estimate with headroom for growth. Standardize identity, certificates and configuration at the fleet level from day one so you never inherit per-instance drift, and treat SDDC Manager as a component on its way out rather than a console to build on. Done that way, the fleet earns its single-pane promise instead of becoming a single point of failure.

How many fleets are you planning for your estate, and is that number coming from a real failure-domain analysis or just from instinct? That is the conversation worth having before bring-up.


About the Author

Dr. Pranay Jha is a Cloud and AI Consultant with 18+ years of experience in hybrid cloud, virtualization, and enterprise infrastructure transformation. He specializes in VMware technologies, multi-cloud strategy, and Generative AI solutions. He holds a PhD in Computer Applications with research focused on Cloud and AI, has published multiple research papers, and has been a VMware vExpert since 2016 and a VMUG Community Leader.
