Dr. Pranay Jha

VMware • Cloud • AI • Enterprise Architecture


VCF 9 Automation and the API-First Stack: A Practical Runbook (VCF 9 Series, Part 32)

A practical runbook for automating VMware Cloud Foundation 9: set up OAuth 2.0 token authentication, pick between PowerCLI, the Unified SDK and Terraform, and wire it into CI/CD without leaking tokens.

VCF 9 Series · Part 32 of 36

TL;DR · Key Takeaways

  • VCF 9 is genuinely API-first: every SDK, PowerCLI cmdlet and Terraform resource is generated from one OpenAPI contract, so a call behaves the same in Python, Java and PowerShell.
  • Stop hardcoding passwords. Register an API client, mint a long-lived API token (valid up to 30 days), and exchange it for a short-lived bearer token (about 30 minutes) through VCF SSO.
  • Pick the tool by job: PowerCLI for day-2 admin work, the Unified SDK for app teams, Terraform for declarative Day-0 patterns, raw REST only for quick checks.
  • The 30-minute bearer token is the most common pipeline failure. Long migrations and big loops outlive it, so re-mint mid-run or pass the API token directly and let the tool refresh.
  • VCF 9.1 adds Prometheus-compatible real-time metrics APIs, a vCenter Group Federated API, and a SQL-like vCenter Query API worth wiring into your automation.
Who this is for: VCF architects, platform engineers and automation leads building repeatable provisioning and day-2 workflows.

Prerequisites: a running VCF 9.0 or 9.1 instance, VCF Operations access, and admin rights to register an API client. Working knowledge of PowerCLI or Terraform helps.

Every large VCF rebuild eventually hits the same wall. Someone has forty workload domains to stand up, a dozen NSX VPCs to wire, and a clickops runbook that takes a week of mouse miles to execute. Earlier VCF releases made this painful because the automation surface was a patchwork: SDDC Manager had its own REST API, vSphere had another, NSX had a third, and the PowerShell modules lagged behind all of them. VCF 9 is the first release where that stops being true. The API is no longer a thin wrapper bolted onto the UI. It is the contract everything else is generated from, and that single change is what makes serious automation worth the effort here.

This runbook walks the path I use on real engagements: get authentication right first, choose the right tool for each job, then provision and operate with code you can put in a pipeline. If you have already read the VCF Automation self-service walkthrough, this is the layer underneath it: the raw APIs and tooling that the self-service catalog itself consumes.

What “API-first” actually changed

The marketing line is “API-first private cloud.” The practical reality is more useful than that sounds. Broadcom adopted OpenAPI specifications as the single source of truth and now generates the SDKs from that spec. The effect is functional parity: a method you call in Python behaves the same in Java or PowerShell, which kills the feature-gap problem that used to make people pick a language and then discover it was missing half the operations they needed. The bindings ship through standard channels too, PyPI for Python, Maven Central for Java, the PowerShell Gallery for PowerCLI, so you are not chasing zip files off a download portal.

VCF 9.0 started this with OpenAPI specs and a Unified SDK covering vSphere, vSAN, the VCF Installer and SDDC Manager. VCF 9.1 finished the job by folding NSX, VCF Operations, Log Management, Operations for Networks, and Fleet and SDDC Lifecycle Management into the same SDK. So as of 9.1 you can drive the whole stack from one set of bindings instead of juggling four client libraries with four auth models.

Three new 9.1 APIs are worth knowing about before you build anything. The real-time metrics APIs are Prometheus-compatible and expose granularity down to 2 seconds across ESX, vCenter, vSAN and NSX, with native PromQL so they plug straight into Grafana. They replace the old vStats API and pair well with the patterns in the VCF Operations monitoring deep dive. The vCenter Group Federated API gives you one endpoint to query inventory across every vCenter in a group, which is what you want for fleet-wide reporting. And the vCenter Query API offers SQL-like, server-side filtering over inventory so you stop pulling full datasets just to find a handful of powered-off VMs.
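To make "Prometheus-compatible" concrete, here is a minimal sketch of what a range query looks like when a backend follows the standard Prometheus HTTP API (GET /api/v1/query_range with query, start, end and step parameters). The host name, the assumption that VCF serves the API at that exact path, and the metric name are all placeholders of mine, not taken from Broadcom documentation; check the 9.1 API reference for the real endpoint and metric catalog.

```python
from urllib.parse import urlencode


def build_range_query_url(base_url: str, promql: str,
                          start: int, end: int, step_seconds: int = 2) -> str:
    """Build a Prometheus HTTP API range-query URL (GET /api/v1/query_range)."""
    params = urlencode({
        "query": promql,             # the PromQL expression, URL-encoded
        "start": start,              # Unix timestamps in seconds
        "end": end,
        "step": f"{step_seconds}s",  # 2s matches the finest granularity mentioned above
    })
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"


# Hypothetical endpoint and metric name, for illustration only
url = build_range_query_url(
    "https://metrics.example.invalid",
    'example_cpu_usage_percent{host="esx-01"}',
    start=1700000000,
    end=1700000060,
)
print(url)
```

The function only builds the URL, so you can log and review the query before any pipeline sends it with a bearer token attached.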

Disclaimer: The steps below create and change real infrastructure. Validate against your target BOM and interoperability matrix, take backups and snapshots, run the relevant prechecks, and test every workflow in a lab or non-production instance before you point it at anything that matters.

Step 1: Set up OAuth 2.0 token authentication

Get this right and everything else follows. Get it wrong and you end up with a service-account password in a YAML file that survives three job changes. VCF 9 SSO supports an OAuth 2.0 flow built on the VMware Identity Broker (VIDB), and it has four moving parts.

Figure: The VCF 9 OAuth 2.0 flow: one admin setup, a long-lived API token, then a short-lived bearer token reused across the stack. One bearer token is valid against vSphere (Connect-VIServer), NSX (Connect-NsxServer), VCF Operations (Connect-VcfOpsServer) and VCF Automation, so you authenticate once.
  1. Register an API client. A VCF administrator creates the client once; its credentials are held in the Identity Broker. This is the one-time setup.
  2. Generate the API token. Pull a long-lived API token from the VCF Operations UI. This token is valid for up to 30 days and replaces any hardcoded user password.
  3. Exchange for a bearer token. At runtime, the script hands the API token to VIDB, which validates it and returns a short-lived bearer access token (typically about 30 minutes).
  4. Authenticate. Use that bearer token against vSphere, NSX, VCF Operations and VCF Automation. The same token works across all of them.
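For teams working in Python rather than PowerCLI, the four steps above can be sketched as a small cache that holds one bearer token and repeats step 3 shortly before it expires. This is a sketch under stated assumptions: the `exchange` callable is a hypothetical stand-in for the real call to the identity broker, the 30-minute lifetime is the approximate figure from the text (a real client should read the expiry from the token response), and the 5-minute margin is an arbitrary choice.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

BEARER_LIFETIME_S = 30 * 60  # "about 30 minutes"; an assumption, not read from a response
REFRESH_MARGIN_S = 5 * 60    # re-mint this long before expiry (arbitrary safety margin)


@dataclass
class BearerToken:
    value: str
    expires_at: float  # seconds, on the same clock the cache uses


class BearerTokenCache:
    """Holds one bearer token and re-exchanges the API token when expiry is near."""

    def __init__(self, api_token: str,
                 exchange: Callable[[str], str],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._api_token = api_token
        self._exchange = exchange  # hypothetical: performs step 3 against the broker
        self._clock = clock
        self._current: Optional[BearerToken] = None
        self.exchanges = 0         # how many times step 3 has run

    def get(self) -> str:
        """Return a bearer token that still has at least REFRESH_MARGIN_S left."""
        now = self._clock()
        if self._current is None or now >= self._current.expires_at - REFRESH_MARGIN_S:
            value = self._exchange(self._api_token)
            self._current = BearerToken(value, now + BEARER_LIFETIME_S)
            self.exchanges += 1
        return self._current.value
```

Calling `cache.get()` before every request keeps the exchange logic in one place, and injecting the clock makes the expiry behaviour testable without waiting half an hour.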

In PowerCLI 9.1 the new VMware.Vcf.Sso module makes this two lines. New-VcfOAuthSecurityContext does the exchange, and the resulting context attaches to the connect cmdlets directly.

# Exchange the long-lived API token for a short-lived bearer context
$ctx = New-VcfOAuthSecurityContext `
  -IdentityBrokerHostname 'vc-mgmt-a.site-a.vcf.lab' `
  -ApiToken 'vidb_ODcxMTA1MjItMzljZC00Y****'

# Reuse the same context across components, no passwords in sight
Connect-VIServer  -Server vc-mgmt-a.site-a.vcf.lab  -VcfOAuthSecurityContext $ctx
Connect-NsxServer -Server nsx-mgmt-a.site-a.vcf.lab -VcfOAuthSecurityContext $ctx

For non-interactive runners there is a shorter path: skip the explicit exchange and pass the API token straight to the connect cmdlet with -VcfApiToken. PowerCLI auto-discovers the integrated SSO instance and handles the bearer token for you.

Connect-VIServer vc-mgmt-a.site-a.vcf.lab -VcfApiToken 'vidb_NzYwN2M5*****'

If you are scripting against SDDC Manager directly in another language, the classic token endpoint still exists and returns an access token (1 hour) plus a refresh token (24 hours). It is fine for SDDC Manager-scoped calls, but for anything cross-component the OAuth path above is the cleaner model.

# Legacy SDDC Manager token (scoped to SDDC Manager APIs)
# -k skips TLS certificate verification; acceptable in a lab only
curl -k -X POST https://sddc-manager.vcf.lab/v1/tokens \
  -H 'Content-Type: application/json' \
  -d '{"username":"administrator@vsphere.local","password":"REDACTED"}'
# -> returns { "accessToken": "...", "refreshToken": "..." }
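If "another language" means Python, the same request can be sketched with the standard library alone. The endpoint, JSON body and response keys are the ones shown in the curl example; the function names are mine. The sketch builds the request and parses a response but deliberately does not send anything, so it runs without a lab.

```python
import json
import urllib.request
from typing import Tuple


def build_token_request(sddc_manager: str, username: str,
                        password: str) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/tokens request from the curl example."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        url=f"https://{sddc_manager}/v1/tokens",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_token_response(raw: bytes) -> Tuple[str, str]:
    """Return (access_token, refresh_token) from the JSON response body."""
    payload = json.loads(raw)
    return payload["accessToken"], payload["refreshToken"]
```

To send it, pass the request to `urllib.request.urlopen`. Rather than mirroring curl's -k, prefer an `ssl` context that trusts your internal CA, and read the password from the runner's secret store instead of a literal.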

Step 2: Pick the right tool for the job

Because everything is generated from the same spec, the choice is no longer about which tool has the feature you need. It is about which workflow fits the work. Raw REST through curl earns its place for one-off checks, webhooks, and languages without an official SDK, but you do not want to build a provisioning system out of shell scripts. The Unified SDK for Python and Java is the right call for application teams and complex orchestration where you need real error handling, retries, and to embed VCF operations inside a larger system.

PowerCLI 9.1 is where most VMware admins should live for day-2 work. It got a serious expansion this release: the VMware.VimAutomation.Vpc module now orchestrates VPCs end to end with cmdlets like New-VpcTransitGateway and New-VpcConnectivityPolicy, vSAN gained Get-VsanEffectiveCapacity and cross-vCenter remote datastore management, and CPU topology is now controllable per VM. For bulk changes and storage and networking operations, nothing beats it on an admin team.

Terraform fits a different shape of problem: declarative, repeatable, version-controlled infrastructure. The vSphere Terraform provider is now officially supported and published to the HashiCorp registry, with v2.16.0 adding a proper supervisor_v2 resource, vSphere Zones, project VPCs and EVC support. For greenfield Day-0 patterns, the VCF Terraform Toolkit packages prescriptive topologies as reusable modules so you are not writing bring-up plumbing from scratch.

Pick the tool by the workflow, not the feature. All four are generated from one OpenAPI spec, so they have functional parity.
  • curl / raw REST: one-off checks, webhooks, no SDK
  • Unified SDK (Python/Java): app teams, complex orchestration
  • PowerCLI 9.1: admin day-2 and bulk changes
  • Terraform: declarative, version-controlled, Day-0
Lead with PowerCLI for day-2, the SDK for orchestration, Terraform for Day-0 in pull requests.

Step 3: Provision a Supervisor with Terraform

Here is the pattern I reach for when an application team owns provisioning and wants infrastructure reviewed in a pull request. Pin the provider, define the building blocks, and let plan and apply do the rest. The snippet below defines a VM class and the start of a Supervisor; the full resource takes management network, ingress, egress, pod and service CIDRs, which is exactly the kind of detail you want in code review rather than in a wizard nobody remembers running.

terraform {
  required_providers {
    vsphere = {
      source  = "vmware/vsphere"
      version = "~> 2.16.0"
    }
  }
}

resource "vsphere_virtual_machine_class" "vm_class" {
  name   = "custom-class"
  cpus   = 4
  memory = 4096
}

resource "vsphere_supervisor" "supervisor" {
  cluster         = "<compute_cluster_id>"
  storage_policy  = "<storage_policy_name>"
  content_library = "<content_library_id>"
  sizing_hint     = "MEDIUM"
  # management_network, ingress_cidr, egress_cidr,
  # pod_cidr, service_cidr and namespace blocks follow
}
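Since the CIDR blocks are the part most worth reviewing, one check a pipeline could run before `terraform plan` is that the ranges do not overlap. This is a sketch of my own, not part of the provider or the toolkit; it assumes the ranges are meant to be disjoint, which is the usual requirement but should be confirmed against the provider documentation. The addresses are made up.

```python
import ipaddress
from itertools import combinations
from typing import Dict, List, Tuple


def find_overlaps(cidrs: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return every pair of named CIDR ranges that overlap.

    ip_network() also raises ValueError on malformed input or on a range
    with host bits set, so typos fail before the comparison runs.
    """
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs.items()}
    return [
        (a, b)
        for (a, net_a), (b, net_b) in combinations(networks.items(), 2)
        if net_a.overlaps(net_b)
    ]


# Illustrative values only; substitute the ranges from your own variables file
plan = {
    "ingress_cidr": "10.10.0.0/24",
    "egress_cidr": "10.10.1.0/24",
    "pod_cidr": "10.244.0.0/20",
    "service_cidr": "10.96.0.0/22",
}
print(find_overlaps(plan))
```

An empty list means the ranges are disjoint; a non-empty one names the offending pairs, which is a more useful review comment than a failed apply.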

One honest caveat: do not try to Terraform your entire SDDC. The provider and toolkit are strong for vSphere-level resources and Day-0 patterns, but SDDC Manager bring-up and ongoing fleet lifecycle are better driven through the SDK or PowerCLI, which model those long-running, stateful operations properly. Mixing the two is fine. Pretending Terraform owns the whole platform is how state files end up fighting SDDC Manager. For the lifecycle side, see the VCF 9 fleet lifecycle reference architecture.


Step 4: Wire it into CI/CD without leaking tokens

The authentication model is only as safe as the place you keep the token. The 30-day API token is a real credential, so treat it like one. Store it in a secrets manager or vault, never in a git repository, and inject it into the runner at execution time. Let each pipeline run mint its own short-lived bearer token rather than caching long-lived material on disk. Because an administrator can revoke an API token at any time, you also get a clean kill switch if a runner is ever compromised.

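# Pipeline step: pull the token from the vault, never from git.
# Get-VaultSecret stands in for whatever retrieval cmdlet your secrets manager provides.
$apiToken = Get-VaultSecret -Path 'vcf/prod/api-token'
Connect-VIServer vc-mgmt-a.site-a.vcf.lab -VcfApiToken $apiToken

try {
    # ... run provisioning / day-2 tasks ...
}
finally {
    # Always drop the session, even when a task fails
    Disconnect-VIServer -Confirm:$false
}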
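The same rule applies to a Python runner. A minimal sketch, assuming the secrets manager injects the API token into an environment variable at execution time: the variable name VCF_API_TOKEN and both helper names are hypothetical. It fails fast when the token is missing and masks the value before anything reaches a log.

```python
import os
from typing import Mapping

TOKEN_ENV_VAR = "VCF_API_TOKEN"  # hypothetical name; use whatever your runner injects


class MissingTokenError(RuntimeError):
    """Raised when the runner did not inject the API token."""


def load_api_token(env: Mapping[str, str] = os.environ) -> str:
    """Read the API token from the environment, never from a file in the repo."""
    token = env.get(TOKEN_ENV_VAR, "").strip()
    if not token:
        raise MissingTokenError(
            f"{TOKEN_ENV_VAR} is not set; inject it from the secrets manager"
        )
    return token


def mask(token: str, visible: int = 4) -> str:
    """Keep only the first few characters so a token can be identified in logs."""
    if len(token) <= visible:
        return "*" * len(token)
    return token[:visible] + "*" * (len(token) - visible)
```

Log `mask(token)` rather than the token itself, and let the job stop on `MissingTokenError` instead of falling back to a stored credential.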

My take

The key consideration: the bearer token expires in about 30 minutes, and plenty of real automation runs longer than that. A migration loop over a few hundred VMs, a workload-domain build, a fleet-wide config sweep: all of these can outlive a single bearer token and fail halfway with a confusing 401. Either re-mint the token at sensible checkpoints in a long job, or use the direct -VcfApiToken method and let PowerCLI refresh under you. Do not write a tight retry loop that hammers the token endpoint when a call returns 401; the platform now validates inputs strictly, and a flood of bad requests is exactly what its guardrails are built to reject.
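One way to handle a mid-run 401 without a tight retry loop is to re-mint at most a fixed number of times per item, pause before each attempt, and then give up. The sketch below assumes hypothetical `mint` and `call` callables and an `Unauthorized` exception standing in for whatever your client raises on HTTP 401; the limits are arbitrary defaults, not platform guidance.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class Unauthorized(Exception):
    """Stand-in for whatever your client raises on HTTP 401."""


def run_with_remint(items: Iterable[T],
                    call: Callable[[T, str], R],
                    mint: Callable[[], str],
                    max_remints_per_item: int = 1,
                    backoff_s: float = 2.0,
                    sleep: Callable[[float], None] = time.sleep) -> List[R]:
    """Process items with one bearer token, re-minting only after a 401."""
    token = mint()
    results: List[R] = []
    for item in items:
        remints = 0
        while True:
            try:
                results.append(call(item, token))
                break
            except Unauthorized:
                if remints >= max_remints_per_item:
                    raise  # stop instead of hammering the token endpoint
                remints += 1
                sleep(backoff_s * remints)  # wait longer before each further attempt
                token = mint()
    return results
```

Because the retry budget is per item, an expired token costs one extra exchange, while a revoked or invalid API token surfaces as an error after a single re-mint instead of a flood of requests.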

Know your token lifetimes. The short bearer window is what trips long automation.
  • API token: up to 30 days. Long-lived; store in a vault, never in git.
  • Bearer token: about 30 minutes. Short-lived; reused across all components.
  • Legacy SDDC token: 1 hour (refresh 24h). Scoped to SDDC Manager APIs.
Long jobs outlive the 30-minute bearer; re-mint at checkpoints or pass -VcfApiToken and let PowerCLI refresh.

What I’d Do

On a VMware-admin team, lead with PowerCLI 9.1 for day-2 and bulk work, and adopt the OAuth context model from day one so you never plant a password in a script. When an application team owns provisioning and wants infrastructure in pull requests, hand them Terraform and the VCF Terraform Toolkit, but keep SDDC bring-up and fleet lifecycle on the SDK or PowerCLI. Treat the API token like the production credential it is, and design every long-running job around the 30-minute bearer window instead of discovering it the hard way. Which of these are you automating first, provisioning or day-2 operations?

About the Author

Dr. Pranay Jha is a Cloud and AI Consultant with 18+ years of experience in hybrid cloud, virtualization, and enterprise infrastructure transformation. He specializes in VMware technologies, multi-cloud strategy, and Generative AI solutions. He holds a PhD in Computer Applications with research focused on Cloud and AI, has published multiple research papers, and has been a VMware vExpert since 2016 and a VMUG Community Leader.
