Dr. Pranay Jha

VMware • Cloud • AI • Enterprise Architecture


VCF 9 API Authentication: Tokens, Refresh, Service Accounts and RBAC (Automating VCF Series, Part 3)

Authentication on VCF 9 is a subsystem, not a one-line step. This part covers token pairs, the 60-minute expiry, refresh handling, service accounts and the ADMIN/OPERATOR/VIEWER roles.

Automating VCF Series · Part 3 of 30

TL;DR · Key Takeaways

  • POST /v1/tokens returns a pair: an access token (60-minute TTL) and a refresh token (24-hour TTL). The access token is the bearer; the refresh token renews it.
  • Any job longer than an hour must refresh. Build authentication as a small subsystem, not a one-time call at the top of a script.
  • Refresh preemptively on a timer, and also catch the 401 and refresh once before retrying. Belt and braces.
  • Never automate under a personal login. Create a dedicated service account through the Users API and scope it with RBAC.
  • Three roles: ADMIN (everything, including credential and user management), OPERATOR (most write operations), VIEWER (read-only GET). Default to the least that works.
Who this is for: platform and DevOps engineers and admins writing automation against VCF 9.
Prerequisites: a reachable SDDC Manager, an admin credential to create a service account, and the toolchain context from Part 2.

The 401 that kills your pipeline at minute 61 is not a bug. It is the design. An access token lives 60 minutes, a refresh token lives 24 hours, and any automation that runs longer than an hour has to plan for both. Authentication on VCF 9 is not a step you do once at the top of a script. It is a small subsystem you build. Getting it right is the difference between a job that finishes and a job that pages you halfway through.

Step 1: Get a token pair

You authenticate by POSTing credentials to /v1/tokens. The response is a pair, not a single token. Use a service account here, not your own login, for reasons we get to in Step 4.

# Step 1: request a token pair
# (-k skips TLS verification; acceptable in a lab, not in production)
curl -sk -X POST https://sddc-manager.lab.local/v1/tokens \
  -H 'Content-Type: application/json' \
  -d '{"username":"svc-automation@vsphere.local","password":"********"}'

# Response:
# {
#   "accessToken":  "eyJhbGciOiJSUzI1NiJ9.eyJ...",    access token, 60-min TTL
#   "refreshToken": "9b3c0f2a-7e1d-4a5b-8c6e-..."      refresh token, 24-hour TTL
# }
# Failure mode: wrong creds return 401 immediately; repeated failures can lock the account.

What you actually got

The access token is a short-lived JWT you attach to every request. The refresh token is a longer-lived credential whose only job is to mint new access tokens without re-sending the password. Treat the refresh token like a password: it is a 24-hour skeleton key, and it does not belong in a log file or a committed config.
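Because the access token is a JWT, a script can read its expiry instead of guessing at it. The sketch below decodes the payload segment without verifying the signature, which is enough for scheduling a refresh but not for trusting the contents. It assumes the token carries the standard `exp` claim; `jwt_claims`, `seconds_left` and `redact` are illustrative names, not part of any VCF SDK.

```python
import base64
import json
import time
from typing import Optional


def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) < 2:
        raise ValueError("token has no payload segment")
    payload_b64 = parts[1]
    # base64url segments drop their padding; restore it before decoding
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def seconds_left(token: str, now: Optional[float] = None) -> float:
    """Seconds until the `exp` claim (assumed present); negative once expired."""
    now = time.time() if now is None else now
    return jwt_claims(token)["exp"] - now


def redact(secret: str, keep: int = 6) -> str:
    """Log-safe stand-in for a token: a short prefix plus the length."""
    return f"{secret[:keep]}...({len(secret)} chars)"
```

`redact` is there for the second half of the paragraph: if a token has to appear in a log line at all, only a short prefix should.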

Step 2: Use the access token

Every call carries the access token in the Authorization header as a bearer. That is the whole protocol. The two ways this breaks are both common: forgetting the literal Bearer prefix, and using a token that has already aged out, which returns a 401 with a token-expired message.

# Step 2: call an API with the bearer token
# ($ACCESS holds the accessToken value returned in Step 1)
curl -sk https://sddc-manager.lab.local/v1/domains \
  -H "Authorization: Bearer $ACCESS"

# Step 3 (preview): renew with the refresh token before the access token dies
curl -sk -X PATCH https://sddc-manager.lab.local/v1/tokens/access-token/refresh \
  -H 'Content-Type: application/json' \
  -d '{"refreshToken":"9b3c0f2a-7e1d-4a5b-8c6e-..."}'
# -> { "accessToken": "..." }    a new access token
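The missing-prefix failure described in Step 2 is easy to rule out in code. A minimal sketch, assuming a hypothetical helper named `bearer_headers` that every request goes through:

```python
def bearer_headers(access_token: str) -> dict:
    """Build request headers, adding the literal 'Bearer ' prefix exactly once."""
    token = access_token.strip()
    # Tolerate a caller that already prefixed the token
    if token.lower().startswith("bearer "):
        token = token[7:].strip()
    if not token:
        raise ValueError("empty access token")
    return {"Authorization": f"Bearer {token}", "Accept": "application/json"}
```

Centralising the header in one function means a malformed Authorization value fails loudly at build time rather than surfacing as a 401 from the API.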

Step 3: Refresh before it dies, not after

The refresh token buys you 24 hours of renewals without ever re-sending the password. The pattern I ship has two layers. A timer refreshes the access token well inside the 60-minute window, so steady-state work never hits an expiry. A 401 handler catches the case where the timer was wrong, the clock skewed, or the job paused, refreshes once, and retries. If the refresh token itself has expired, fall back to a full re-auth.

[Figure: Token lifecycle. Refresh inside the window; never ride a token to zero. The refresh token is valid for 24 hours and each access token for 60 minutes: issue at t = 0, refresh at about 45 minutes (ahead of the 60-minute expiry), receive a new 60-minute token, and repeat.]
Refresh at roughly 45 minutes and the access token never reaches zero during a run.
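The timer layer can be as small as a clock that tracks token age and says what to do next. This is a sketch under the lifetimes stated above (refresh at 45 minutes, refresh token good for 24 hours); `TokenClock` and its method names are hypothetical, and the clock is injectable so the logic can be tested without waiting an hour.

```python
import time

REFRESH_AT_S = 45 * 60          # refresh point suggested above
REFRESH_TTL_S = 24 * 60 * 60    # refresh token lifetime


class TokenClock:
    """Tracks token age and decides what the caller should do next."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self.mark_full_auth()

    def mark_full_auth(self):
        # A full login issues both tokens at the same moment
        now = self._clock()
        self._pair_issued = now
        self._access_issued = now

    def mark_refreshed(self):
        # A refresh replaces only the access token
        self._access_issued = self._clock()

    def next_action(self) -> str:
        now = self._clock()
        if now - self._pair_issued >= REFRESH_TTL_S:
            return "reauth"      # refresh token is spent: log in again
        if now - self._access_issued >= REFRESH_AT_S:
            return "refresh"     # still inside the 60-minute window
        return "ok"
```

A job loop would call `next_action()` before each request and invoke its own refresh or login routine accordingly; the 401 handler remains as the second layer for the cases the timer misses.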

The refresh-on-401 pattern

Here is the small session wrapper I actually use in Python. It authenticates once, retries a single time on a 401 after refreshing, and re-authenticates only if the refresh token is also dead. It covers the 401 layer; the timer layer sits on top of it. A few dozen lines that turn the most common pipeline failure into a non-event.

import requests
import urllib3

BASE = "https://sddc-manager.lab.local"
VERIFY_TLS = False   # lab only (self-signed cert); use a CA bundle path in production
TIMEOUT = 30         # seconds; without it a hung connection can stall the job
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

class VcfSession:
    """Minimal SDDC Manager client: log in once, refresh on 401, re-auth as a last resort."""

    def __init__(self, user, password):
        self.user, self.password = user, password
        self._auth()

    def _auth(self):
        # Full login: trade credentials for a fresh access/refresh token pair
        r = requests.post(f"{BASE}/v1/tokens", verify=VERIFY_TLS, timeout=TIMEOUT,
            json={"username": self.user, "password": self.password})
        r.raise_for_status()
        t = r.json()
        self.access, self.refresh = t["accessToken"], t["refreshToken"]

    def _renew(self):
        # Mint a new access token from the refresh token, without the password
        try:
            r = requests.patch(f"{BASE}/v1/tokens/access-token/refresh",
                verify=VERIFY_TLS, timeout=TIMEOUT,
                json={"refreshToken": self.refresh})
            r.raise_for_status()
            self.access = r.json()["accessToken"]
        except requests.HTTPError:
            self._auth()          # refresh token also expired: start over

    def _get(self, path):
        h = {"Authorization": f"Bearer {self.access}"}
        return requests.get(f"{BASE}{path}", headers=h,
            verify=VERIFY_TLS, timeout=TIMEOUT)

    def get(self, path):
        r = self._get(path)
        if r.status_code == 401:  # token died mid-run
            self._renew()
            r = self._get(path)   # retry exactly once with the new token
        r.raise_for_status()
        return r.json()

# usage
if __name__ == "__main__":
    s = VcfSession("svc-automation@vsphere.local", "********")
    print([d["name"] for d in s.get("/v1/domains")["elements"]])

Step 4: Stop using your own login

Automation that runs as a named human is a liability. When that person leaves, rotates their password, or has their account locked, every pipeline they touched dies with no warning. Create a dedicated service account through the Users API (POST /v1/users) and give it exactly the role it needs. SDDC Manager has three roles.

Role | What it can do | Use it for
ADMIN | All methods, including credential and password management and user management | Automation that rotates secrets or manages users
OPERATOR | POST, PUT, PATCH, DELETE, except the secured credential and user-management APIs | Most provisioning and day-2 automation
VIEWER | GET only, excluding password and user-management APIs | Inventory, drift detection, reporting

[Figure: Which role does this automation need? Default to the least privilege that completes the job. From safest to most dangerous if leaked: VIEWER (reads only: inventory, drift, reports), OPERATOR (writes, not secrets: provisioning, day-2), ADMIN (everything: secret rotation, users).]
Only the credential and user-management APIs force ADMIN. Most automation is happy on OPERATOR or VIEWER.

What I tell clients: pick the role from the verbs your automation uses, not from convenience. If a job only reads, it gets VIEWER, and a leaked VIEWER token cannot change a thing. The moment you grant ADMIN to a reporting script because it was easier, you have created a credential that can rotate every password in the platform, sitting in a CI variable.
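Picking the role from the verbs can be written down as a rule. The sketch below encodes the role table from this step; `minimal_role` is a hypothetical helper, and the `uses_secured_api` flag stands in for "this job calls the credential or user-management APIs", which the caller has to know.

```python
WRITE_METHODS = {"POST", "PUT", "PATCH", "DELETE"}


def minimal_role(methods, uses_secured_api: bool = False) -> str:
    """Return the least-privileged role that covers the given HTTP methods."""
    verbs = {m.upper() for m in methods}
    if uses_secured_api:
        return "ADMIN"       # credential and user management force ADMIN
    if verbs & WRITE_METHODS:
        return "OPERATOR"    # any write verb rules out VIEWER
    return "VIEWER"          # GET-only automation
```

Running it over the list of calls a pipeline actually makes gives a defensible answer to "why does this account have that role", which is more than convenience can offer.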

Step 5: Least privilege and clean teardown

Two habits separate automation that is safe from automation that merely works. First, pull secrets at runtime from a vault or the pipeline’s secret store, never from a file in the repo. A password in plaintext in a playbook is the breach waiting to be reported. Second, invalidate the refresh token when the job is done with DELETE /v1/tokens/refresh-token, so a 24-hour skeleton key does not outlive the five-minute job that created it.
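Both habits fit in a few lines. This is a sketch, not a definitive implementation: the environment variable names `VCF_USER` and `VCF_PASSWORD` are hypothetical, and `login` and `revoke` are callables the pipeline supplies, so the request shape of the revocation call is left to the caller.

```python
import os
from contextlib import contextmanager


def load_credentials(env=os.environ):
    """Read credentials injected at runtime by a vault agent or CI secret store."""
    try:
        return env["VCF_USER"], env["VCF_PASSWORD"]   # hypothetical names
    except KeyError as missing:
        raise RuntimeError(f"missing secret variable: {missing}") from None


@contextmanager
def token_scope(login, revoke):
    """Yield the session returned by login(); always call revoke(session) afterwards."""
    session = login()
    try:
        yield session
    finally:
        revoke(session)   # runs on success and on failure
```

With this shape, `revoke` is where the DELETE /v1/tokens/refresh-token call mentioned above would live, and the `finally` clause guarantees it runs even when the job body raises.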

[Figure: Auth-aware request flow. One retry on 401, full re-auth only as a last resort. Call the API; on 200, continue; on 401, refresh and retry; if the refresh token is dead, re-authenticate from credentials.]
The flow every long-running VCF job should implement, in one picture.

In practice: the first thing I check on an inherited pipeline is whether the token is fetched once and cached for the whole run. If it is, I know exactly where it will fail, and it is always on the longest job, which is always the most important one.
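The request flow is independent of any HTTP library, which makes it easy to test in isolation. A minimal sketch, assuming three caller-supplied callables with hypothetical names: `send` performs the request with the current access token, `renew` tries the refresh token, and `reauth` logs in from credentials.

```python
def call_with_refresh(send, renew, reauth):
    """
    send()   -> (status, body) using the current access token
    renew()  -> True if the refresh token produced a new access token
    reauth() -> performs a full login from credentials
    """
    status, body = send()
    if status != 401:
        return status, body   # success, or an error that is not about token age
    if not renew():
        reauth()              # refresh token is dead: last resort
    return send()             # exactly one retry, never a loop
```

Returning the second response unconditionally is deliberate: if the retry also comes back 401, the problem is not token age, and looping would only hammer the token endpoint.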

Worked example

A workload domain bring-up runs about 90 minutes. The access token lasts 60. With a single refresh near minute 45 you hold a token that is valid until about minute 105, so you cover the whole job with margin and never re-send the password. A multi-hour upgrade run that spans, say, 5 hours needs about six access-token refreshes at that 45-minute cadence (four at the bare minimum of one per hour), all served by the one refresh token, because its 24-hour ceiling is nowhere near. Keep the job inside that 24-hour window and you authenticate exactly once, then refresh on a timer until done.
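The arithmetic generalises to any job length. The exact count depends on the refresh cadence: a 300-minute run fires four refreshes at one per hour and six at one every 45 minutes, which brackets the rough figure above. A small sketch with hypothetical helper names:

```python
import math

REFRESH_TOKEN_TTL_MIN = 24 * 60


def refreshes_needed(job_minutes: float, interval_minutes: float = 45) -> int:
    """Timer-driven refreshes that fire strictly before the job ends."""
    if job_minutes <= 0:
        return 0
    return math.ceil(job_minutes / interval_minutes) - 1


def fits_one_login(job_minutes: float) -> bool:
    """True if the whole job stays inside the refresh token's 24-hour lifetime."""
    return job_minutes < REFRESH_TOKEN_TTL_MIN
```

For the 90-minute bring-up, `refreshes_needed(90)` is 1 (the refresh at minute 45), and `fits_one_login(90)` confirms a single login is enough.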

Disclaimer: create service accounts and assign roles in a lab first, use a non-production service account for testing, never commit a token or password to source control, and invalidate refresh tokens you no longer need. Granting ADMIN should be a deliberate decision, not a default.

Common auth failures and the fix

Most VCF authentication problems surface as one of a handful of HTTP status codes. The fastest way to debug is to map the symptom to the cause instead of guessing, because the fix is usually one of two things: the token is too old, or the role is too low. Here is the table I keep in my head.

Symptom | Likely cause | Fix
401 mid-run, token has expired | Access token aged past its 60-minute TTL | Refresh on a timer and catch the 401 to refresh once, then retry
401 on the very first call | Bad credentials or a missing Bearer prefix in the header | Verify the service account password and the Authorization header format
403 on a POST, PATCH or DELETE | Role too low for the operation (VIEWER or OPERATOR hitting a secured API) | Use a service account with the minimal higher role for that specific job
401 after a long pause | Refresh token expired past its 24-hour TTL | Re-authenticate from credentials and start a fresh token pair
Account locked after retries | Repeated bad-credential attempts hammering /v1/tokens | Add backoff, fail fast on a 401 at auth, and alert instead of retrying blindly

The pattern across all of these is simple once you see it. A 401 is almost always about token age. A 403 is almost always about role. Internalise that split and you stop reaching for a password reset every time the API returns an error code, which saves the half hour you would otherwise spend chasing the wrong problem. The other quiet win is logging: log the status code and the request path on failure, never the token itself, so a stack trace in CI does not become a credential leak.
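The 401-versus-403 split and the logging rule can both be captured in code. This is a sketch that mirrors the table above rather than any official error taxonomy; `triage` and `failure_log_line` are hypothetical names.

```python
def triage(status: int, first_call: bool = False) -> str:
    """Map an HTTP status to the most likely auth cause, per the table above."""
    if status == 401:
        if first_call:
            return "check credentials and the Bearer prefix"
        return "token too old: refresh once, then retry"
    if status == 403:
        return "role too low for this operation"
    return "not an auth problem"


def failure_log_line(method: str, path: str, status: int) -> str:
    """Log the verb, path and status on failure; never the token itself."""
    return f"{method.upper()} {path} -> HTTP {status}"
```

Because `failure_log_line` never receives the headers, there is no way for a token to end up in the CI log through it.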

The Bottom Line

Treat authentication as a subsystem, not a line of setup. My recommendation: run every automation under a dedicated service account, default it to OPERATOR and drop to VIEWER for read-only jobs, and reserve ADMIN for the specific automations that rotate secrets or manage users. Refresh the access token preemptively and also handle the 401, so a long job never dies on expiry. Pull secrets from a vault, and invalidate refresh tokens on teardown. Do that once, wrap it in a small session class like the one above, and you never think about the 401 again. Skip it, and you will meet it at minute 61 of your most important pipeline.

Up next we go deeper into the API itself: structure, the API Explorer, and the async task model that every write operation uses. For the bigger picture, see the VCF 9 API-first runbook. How does your pipeline handle token refresh today, timer, 401 handler, or both? Tell me in the comments.



About the Author

Dr. Pranay Jha is a Cloud and AI Consultant with 18+ years of experience in hybrid cloud, virtualization, and enterprise infrastructure transformation. He specializes in VMware technologies, multi-cloud strategy, and Generative AI solutions. He holds a PhD in Computer Applications with research focused on Cloud and AI, has published multiple research papers, and has been a VMware vExpert since 2016 and a VMUG Community Leader.
