Dr. Pranay Jha


VCF Operations in VCF 9: Monitoring and Observability Explained (VCF 9 Series, Part 19)

A field guide to VCF Operations in VCF 9: what replaced the Aria suite, the architecture you actually deploy, and the observability gotchas that bite during the move.

VCF 9 Series · Part 19 of 36

If you still picture monitoring as a separate Aria Operations stack you stand up after bring-up, VCF 9 will catch you out. The thing you used to bolt on is now the management plane itself. VCF Operations is where you build workload domains, hold your licenses, rotate certificates, and watch the fleet, and observability is one tab inside that, not a product you buy and wire in later. That reframing changes how you size it, where you place it, and what you decommission. This post walks through what VCF Operations actually is in 9, the topology you really deploy, and the considerations that come up when teams treat it like the old vRealize suite with a new badge.

What VCF Operations actually is now

VCF Operations is the product formerly known as VMware Aria Operations, but the rename undersells the change. Broadcom collapsed the old vRealize and Aria operations tooling into a single console that owns eight functional areas: fleet management, operations management, workload monitoring and observability, performance monitoring, FinOps and capacity, workload mobility, security and compliance, and the lifecycle and build workflows that used to live in SDDC Manager. You provision a VI workload domain from here. You manage licenses, identity, certificates, passwords, and tags from here. Observability is the part most admins came for, but it sits inside a much larger management surface.

One practical detail worth knowing on day zero: a fresh VCF Operations instance runs in evaluation mode for 90 days. To license it you either register with Broadcom and add a VCF or vSphere Foundation license, or attach an already-licensed vCenter. Plan that before the clock starts, because an unlicensed Operations instance is the management plane for everything else you just deployed. For the bigger picture of how Operations fits the fleet and instance model, see the VCF 9 architecture breakdown earlier in this series.
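To make the 90-day clock concrete, here is a minimal Python sketch that works out when evaluation mode would lapse. It assumes the clock starts on the deployment date, and the function names are illustrative, not part of any VCF tooling.

```python
from datetime import date, timedelta

EVALUATION_DAYS = 90  # evaluation period stated in this post


def evaluation_deadline(deployed_on: date) -> date:
    """Date the evaluation period ends, assuming it starts on deployed_on."""
    return deployed_on + timedelta(days=EVALUATION_DAYS)


def days_remaining(deployed_on: date, today: date) -> int:
    """Days left in evaluation mode (negative once it has lapsed)."""
    return (evaluation_deadline(deployed_on) - today).days


if __name__ == "__main__":
    deployed = date(2025, 1, 1)
    print(evaluation_deadline(deployed))               # 2025-04-01
    print(days_remaining(deployed, date(2025, 3, 2)))  # 30
```

Putting the resulting date in the project plan next to the bring-up milestone is usually enough to stop licensing from being discovered late.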

[Figure: VCF Operations: eight functional areas. Observability is one job; it is the operations plane for the whole fleet. Areas shown: Fleet management, Operations management, Workload observability, Performance monitoring, FinOps and capacity, Workload mobility, Security and compliance, Lifecycle and build.]
VCF Operations absorbed the old vRealize/Aria tooling and the SDDC Manager lifecycle workflows.

The architecture you actually deploy

The marketing line is “single, unified interface.” The interface is unified. The deployment is not a single VM, and assuming it is causes sizing pain. At minimum you run the VCF Operations Analytics cluster (primary, replica, and data nodes, fronted by a load balancer once you scale out to multiple nodes). You add Cloud Proxies as collection gateways, typically one per physical data center, which gather metrics and logs locally and push them to the cluster. The integrated log solution adds its own footprint, and if you want network flow visibility you deploy a separate VCF Operations for Networks appliance and attach it. Four moving parts, not one.
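One way to get that topology on paper is to model it as data during design. The sketch below is a hypothetical planning helper, not a Broadcom tool: the class, its field names, and the default of three Analytics nodes (primary, replica, one data node) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class OpsTopology:
    """Hypothetical planning model for a VCF Operations deployment."""
    data_centers: list[str]
    analytics_nodes: int = 3       # assumption: primary + replica + one data node
    log_nodes: int = 1             # assumption: size this separately
    networks_appliance: bool = False

    def cloud_proxies(self) -> int:
        # Rule of thumb from the post: typically one Cloud Proxy per data center.
        return len(self.data_centers)

    def needs_load_balancer(self) -> bool:
        # A load balancer fronts the Analytics cluster once it is multinode.
        return self.analytics_nodes > 1

    def appliance_count(self) -> int:
        # Total VMs to place, counting the optional Networks appliance if used.
        optional = 1 if self.networks_appliance else 0
        return self.analytics_nodes + self.cloud_proxies() + self.log_nodes + optional


if __name__ == "__main__":
    plan = OpsTopology(data_centers=["site-a", "site-b"], networks_appliance=True)
    print(plan.cloud_proxies(), plan.needs_load_balancer(), plan.appliance_count())
```

Even a toy model like this makes the point to stakeholders: a two-site design with flow visibility is seven appliances under these assumptions, not one.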

[Figure: VCF Operations: what you actually deploy. One console, several appliances; place the Analytics cluster and logs in the management domain. Components: Analytics Cluster (primary, replica, and data nodes; a load balancer fronts multinode), Cloud Proxy at Site A and Site B (metrics and logs), Logs (RFC 5424, 6 TB per node), Ops for Networks (optional, for flows). Data sources: vCenter, ESX hosts, NSX, vSAN, Supervisor and VKS clusters, guest VMs via Telegraf. vCenter linking groups up to 15 instances for a single-pane inventory and alarm view.]
VCF Operations is one console over several appliances. Size the Analytics cluster and log nodes deliberately.
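For the log nodes, a first-pass estimate can come from simple arithmetic on the 6 TB per-node figure. This is a rough sketch only: the `headroom` margin is a hypothetical parameter, and the calculation ignores compression, indexing overhead, and any availability requirements, so treat the result as a floor to check against Broadcom's sizing guidance.

```python
import math

NODE_CAPACITY_TB = 6.0  # per-node log storage figure quoted in this post


def log_nodes_needed(ingest_gb_per_day: float, retention_days: int,
                     headroom: float = 0.2) -> int:
    """Estimate log node count from raw daily ingest and retention.

    headroom is a hypothetical safety margin (0.2 = 20 percent).
    """
    raw_tb = ingest_gb_per_day * retention_days / 1000.0
    return max(1, math.ceil(raw_tb * (1 + headroom) / NODE_CAPACITY_TB))


if __name__ == "__main__":
    # 200 GB/day kept for 30 days is 6 TB raw, 7.2 TB with headroom.
    print(log_nodes_needed(200, 30))  # 2
```

The useful part is less the number than the inputs: it forces an ingest rate and a retention period to be agreed before the cluster is built.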

Placement matters. Broadcom’s guidance, which matches what holds up in production, is to deploy the log solution in the same management domain as the Analytics cluster, use FQDNs rather than VIPs when you point log sources at it, and install a CA-signed certificate so sources trust the collector without manual steps. Get the topology on paper during design, the same way you would for the rest of the VCF 9 platform, rather than discovering the node count after the cluster is undersized.
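The FQDN guidance is easy to check in a design review. The following sketch flags log destinations written as bare IP addresses or short hostnames. It is a lint under stated assumptions: the function names are made up for this example, and it cannot tell whether a name resolves to a VIP, only whether the target is a name at all.

```python
import ipaddress


def is_ip_literal(target: str) -> bool:
    """True if target is a bare IPv4/IPv6 address rather than a hostname."""
    try:
        ipaddress.ip_address(target.strip("[]"))
        return True
    except ValueError:
        return False


def check_log_target(target: str) -> list[str]:
    """Return design-review warnings for a proposed log destination."""
    warnings = []
    if is_ip_literal(target):
        warnings.append("uses an IP address; point sources at an FQDN instead")
    elif "." not in target:
        warnings.append("short hostname; use a fully qualified name")
    return warnings


if __name__ == "__main__":
    # Hostnames below are placeholders, not real endpoints.
    for candidate in ["10.0.0.15", "logs01", "logs.mgmt.example.com"]:
        print(candidate, check_log_target(candidate))
```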

Four layers of infrastructure observability

Inside the console, infrastructure observability splits into four areas that used to be separate tools.

VCF Health and Diagnostics is the proactive layer. VCF Health continuously watches component state and flags the operational rot that causes outages: expired certificates, NTP drift, DNS misconfiguration, configuration errors. Diagnostics detects known issues and VMSA security exposures and gives you remediation steps. If you ran Skyline, note that this is where it went. Diagnostics Findings now carries parity with the old Skyline Health Diagnostics and Skyline Advisor signatures, and Log Assist replaces the manual support-bundle dance when you open a case with Broadcom.
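To illustrate the kind of check VCF Health runs for you, here is a small sketch that classifies a certificate by its expiry date. It is not how the product implements the check; the 30-day warning window and the function name are assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone


def cert_status(not_after: datetime, now: datetime, warn_days: int = 30) -> str:
    """Classify a certificate as 'expired', 'expiring', or 'ok'.

    warn_days is a hypothetical threshold, not a VCF Operations default.
    """
    if not_after <= now:
        return "expired"
    if not_after - now <= timedelta(days=warn_days):
        return "expiring"
    return "ok"


if __name__ == "__main__":
    now = datetime(2025, 6, 1, tzinfo=timezone.utc)
    print(cert_status(datetime(2025, 6, 15, tzinfo=timezone.utc), now))  # expiring
```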

Logs is the area with the biggest change in 9.0. VCF Operations now has an integrated log solution, so you explore logs, build log-based alerts and symptoms, and create log dashboards directly in the console instead of pivoting to a separate product. Logs from VCF Operations, VCF Operations for Networks, and the VCF Identity Broker are standardized to RFC 5424 format, the log nodes scale to 6 TB of storage each, and a unified Cloud Proxy now handles log collection.
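If RFC 5424 is unfamiliar, the header layout is fixed: a priority value, a version, a timestamp, then hostname, application name, process ID, message ID, structured data, and the message. The sketch below builds one such line in Python. The hostname and application name in the example are placeholders, not values VCF Operations emits, and structured data is left as the nil value to keep the example short.

```python
from datetime import datetime, timezone


def rfc5424_line(facility: int, severity: int, timestamp: datetime,
                 hostname: str, app_name: str, msg: str,
                 procid: str = "-", msgid: str = "-") -> str:
    """Build a syslog line using the RFC 5424 header layout.

    <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA MSG
    """
    if not (0 <= facility <= 23 and 0 <= severity <= 7):
        raise ValueError("facility must be 0-23 and severity 0-7")
    if timestamp.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    pri = facility * 8 + severity  # PRI value as defined by the RFC
    # UTC timestamp with millisecond precision and a trailing 'Z'.
    ts = timestamp.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
    structured_data = "-"  # NILVALUE: no structured data in this sketch
    return f"<{pri}>1 {ts} {hostname} {app_name} {procid} {msgid} {structured_data} {msg}"


if __name__ == "__main__":
    when = datetime(2025, 6, 1, 12, 0, 0, tzinfo=timezone.utc)
    # facility 16 (local0), severity 6 (informational) gives PRI 134
    print(rfc5424_line(16, 6, when, "ops01.example.com", "vcf-ops", "Collection completed"))
```

Knowing the layout helps when writing parsers or alert rules against forwarded logs, since every standardized source should present the same header fields in the same order.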

Network Operations gives you NSX and vSphere networking inventory, alert trends, and NSX health, with the deeper capabilities (VPC monitoring, flow-based application discovery, 24-hour traffic summaries) gated behind the separate VCF Operations for Networks appliance. A useful detail in 9.0: flow data can now be collected through the NSX switch IPFIX mechanism even when the distributed firewall is deactivated or unlicensed, and edge transport node metrics are collected at 20-second granularity.

Storage Operations rounds it out with a single-pane vSAN view, space-efficiency and deduplication dashboards for both ESA and OSA, and vSAN Performance Diagnostics that runs benchmark-driven tests for throughput, IOPS, or latency targets.
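The split between what the core console shows and what needs the Networks appliance is worth encoding in the design checklist. Here is a minimal sketch of that decision. The capability strings paraphrase this post and are not product identifiers, and the function is a hypothetical helper.

```python
# Capability names paraphrase this post; they are not product identifiers.
CORE_CONSOLE = {"network inventory", "alert trends", "nsx health"}
NETWORKS_APPLIANCE = {
    "vpc monitoring",
    "flow-based application discovery",
    "24-hour traffic summaries",
}


def needs_networks_appliance(required: set[str]) -> bool:
    """True if any required capability is gated behind Ops for Networks."""
    unknown = required - CORE_CONSOLE - NETWORKS_APPLIANCE
    if unknown:
        # Fail loudly rather than silently treating a new requirement as covered.
        raise ValueError(f"unclassified capabilities: {sorted(unknown)}")
    return bool(required & NETWORKS_APPLIANCE)


if __name__ == "__main__":
    print(needs_networks_appliance({"nsx health", "alert trends"}))    # False
    print(needs_networks_appliance({"nsx health", "vpc monitoring"}))  # True
```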

[Figure: Four layers of infrastructure observability: four former tools, now areas inside one console. VCF Health and Diagnostics (proactive: certs, NTP, DNS, VMSA; Skyline parity); Logs (integrated, RFC 5424, 6 TB/node, in-console alerts); Network Operations (NSX/vSphere health; flows need Ops for Networks); Storage Operations (single-pane vSAN, dedup, perf diagnostics).]
Health, Logs, Network and Storage operations all live in the same inventory and login.

Workload and application observability

Above the infrastructure, VCF Operations watches the workloads. It collects guest OS and application metrics through the managed Telegraf agent and discovers running services with the Service Discovery adapter. The more interesting addition in 9 is native, automatic monitoring of modern apps: vSphere Supervisor and VKS clusters are discovered and monitored without bolting on a separate Kubernetes tool, with Telegraf feeding cluster, node, and Kubernetes object metrics straight into Operations. There is also vGPU monitoring for VMs, which matters if you run Private AI Foundation on the same estate. If you stood up a workload domain earlier in this series, the VI workload domain deployment walkthrough already had you in this same console.


Key considerations during the move

This is where field experience diverges from the datasheet. A few things consistently trip up teams migrating off the Aria suite.

  • Cloud Proxy log forwarding breaks on upgrade. From 9.0 the unified Cloud Proxy owns log collection, and Cloud Proxies that were previously enabled with the old Log Forwarding feature stop working after the upgrade. Plan a reconfiguration window. Do not assume log flow survives the jump.
  • The integrated log solution is not full parity yet. The in-console logs experience covers exploration, alerts, and dashboards, but for legacy log functions you still use the standalone VCF Operations for Logs UI. Do not decommission the standalone appliance assuming the embedded experience replaces all of it.
  • Network Operations looks half-empty out of the box. NSX flow visibility, VPC dashboards, and application discovery need the separate Operations for Networks appliance deployed and attached. Teams routinely promise flow-based dependency mapping in a design, then find the core console only shows inventory and health until that appliance lands.
  • Skyline runbooks are stale. If your operational runbooks still send people to Skyline Advisor or Skyline Health Diagnostics, repoint them at Diagnostics Findings. The signatures moved; the old entry points did not survive.
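The runbook cleanup in the last item lends itself to a quick scan. The sketch below searches runbook text for the old names mentioned in this post and suggests where to point instead. The term list is an assumption built from this article, so extend it with whatever your own runbooks use.

```python
import re

# Old names mentioned in this post, mapped to where the post says to look now.
STALE_TERMS = {
    "Skyline Advisor": "Diagnostics Findings",
    "Skyline Health Diagnostics": "Diagnostics Findings",
    "Aria Operations": "VCF Operations",
}


def find_stale_references(text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, old_term, suggested_replacement) for each hit."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for old, new in STALE_TERMS.items():
            if re.search(re.escape(old), line, flags=re.IGNORECASE):
                hits.append((lineno, old, new))
    return hits


if __name__ == "__main__":
    runbook = "Step 1: open Skyline Advisor\nStep 2: check VCF Operations"
    for lineno, old, new in find_stale_references(runbook):
        print(f"line {lineno}: '{old}' -> '{new}'")
```

Running something like this across a runbook repository before cutover gives you a concrete list to fix, rather than finding dead links during an incident.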

My take

VCF Operations in 9 is a real improvement over the loosely coupled Aria suite it replaces, mostly because licensing, identity, lifecycle, and observability finally share one inventory and one login. But the “single appliance” framing is the part to push back on with stakeholders. You are still designing an Analytics cluster, Cloud Proxies, a logs footprint, and optionally a networks appliance. Treat it as a topology decision during planning, not a checkbox during bring-up, and most of the pain above never shows up.

The Bottom Line

VCF Operations is no longer a monitoring tool you add to VCF. It is the operations plane VCF is built on, and observability is one of its jobs. Design the cluster, proxies, and log nodes up front, decide early whether you need the Networks appliance, and update anything that still references the Aria or Skyline names. Do that and Part 19 of the platform takes care of itself. How are you handling the cutover from a standalone Aria Operations deployment: lifting the data or starting clean? That decision drives the rest of your operations design.


About the Author

Dr. Pranay Jha is a Cloud and AI Consultant with 18+ years of experience in hybrid cloud, virtualization, and enterprise infrastructure transformation. He specializes in VMware technologies, multi-cloud strategy, and Generative AI solutions. He holds a PhD in Computer Applications with research focused on Cloud and AI, has published multiple research papers, and has been a VMware vExpert since 2016 and a VMUG Community Leader.
