TL;DR · Key Takeaways
- POST /v1/tokens returns a pair: an access token (60-minute TTL) and a refresh token (24-hour TTL). The access token is the bearer; the refresh token renews it.
- Any job longer than an hour must refresh. Build authentication as a small subsystem, not a one-time call at the top of a script.
- Refresh preemptively on a timer, and also catch the 401 and refresh once before retrying. Belt and braces.
- Never automate under a personal login. Create a dedicated service account through the Users API and scope it with RBAC.
- Three roles: ADMIN (everything, including credential and user management), OPERATOR (most write operations), VIEWER (read-only GET). Default to the least that works.
The 401 that kills your pipeline at minute 61 is not a bug. It is the design. An access token lives 60 minutes, a refresh token lives 24 hours, and any automation that runs longer than an hour has to plan for both. Authentication on VCF 9 is not a step you do once at the top of a script. It is a small subsystem you build. Getting it right is the difference between a job that finishes and a job that pages you halfway through.
Step 1: Get a token pair
You authenticate by POSTing credentials to /v1/tokens. The response is a pair, not a single token. Use a service account here, not your own login, for reasons we get to in Step 4.
# Step 1: request a token pair
curl -sk -X POST https://sddc-manager.lab.local/v1/tokens \
  -H 'Content-Type: application/json' \
  -d '{"username":"svc-automation@vsphere.local","password":"********"}'
# Response:
# {
#   "accessToken": "eyJhbGciOiJSUzI1NiJ9.eyJ...",   <- access token, 60-min TTL
#   "refreshToken": "9b3c0f2a-7e1d-4a5b-8c6e-..."   <- refresh token, 24-hour TTL
# }
# Failure mode: wrong creds return 401 immediately; repeated failures can lock the account.
What you actually got
The access token is a short-lived JWT you attach to every request. The refresh token is a longer-lived credential whose only job is to mint new access tokens without re-sending the password. Treat the refresh token like a password: it is a 24-hour skeleton key, and it does not belong in a log file or a committed config.
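One way to keep both tokens out of a log file is to mask them before anything is written. This is a minimal sketch that assumes the response keys shown in Step 1; the helper name `redact_tokens` and the mask string are illustrative and not part of any VCF tooling.

```python
SENSITIVE_KEYS = {"accessToken", "refreshToken", "password"}
MASK = "***redacted***"


def redact_tokens(payload):
    """Return a copy of a dict that is safe to log.

    Values under SENSITIVE_KEYS are replaced with a fixed mask;
    every other key is copied through unchanged.
    """
    return {
        key: (MASK if key in SENSITIVE_KEYS else value)
        for key, value in payload.items()
    }
```

Call it on the parsed token response, or on the request body, before handing either to a logger. The original dict is left untouched, so the real tokens stay usable in memory.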
Step 2: Use the access token
Every call carries the access token in the Authorization header as a bearer. That is the whole protocol. The two ways this breaks are both common: forgetting the literal Bearer prefix, and using a token that has already aged out, which returns a 401 with a token-expired message.
# Step 2: call an API with the bearer token
# (ACCESS holds the accessToken value returned in Step 1)
curl -sk https://sddc-manager.lab.local/v1/domains \
  -H "Authorization: Bearer $ACCESS"
# Step 3 (preview): renew with the refresh token before the access token dies
curl -sk -X PATCH https://sddc-manager.lab.local/v1/tokens/access-token/refresh \
  -H 'Content-Type: application/json' \
  -d '{"refreshToken":"9b3c0f2a-7e1d-4a5b-8c6e-..."}'
# -> { "accessToken": "eyJ..." }   <- a new access token
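Both Step 2 failure modes can be guarded on the client side. The sketch below builds the header with the literal Bearer prefix and reads the expiry out of the JWT locally. It assumes the access token carries the standard `exp` claim from the JWT specification, which I have not confirmed for SDDC Manager tokens, and it decodes the payload without verifying the signature. The function names are hypothetical.

```python
import base64
import json
import time


def bearer_header(token):
    """Build the Authorization header with the literal Bearer prefix."""
    return {"Authorization": f"Bearer {token}"}


def jwt_expiry(token):
    """Return the `exp` claim (epoch seconds) from a JWT, or None if absent.

    Decodes the payload segment only; it does NOT verify the signature.
    """
    payload_b64 = token.split(".")[1]
    # base64url segments travel without padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return claims.get("exp")


def is_expired(token, now=None, skew=30):
    """True if the token expires within `skew` seconds of `now`."""
    exp = jwt_expiry(token)
    if exp is None:
        return False  # assumption failed: no claim to check, let the server decide
    now = time.time() if now is None else now
    return now >= exp - skew
```

Treat `is_expired` as an optimisation only. The server's 401 remains the authority on whether a token is still good.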
Step 3: Refresh before it dies, not after
The refresh token buys you 24 hours of renewals without ever re-sending the password. The pattern I ship has two layers. A timer refreshes the access token well inside the 60-minute window, so steady-state work never hits an expiry. A 401 handler catches the case where the timer was wrong, the clock skewed, or the job paused, refreshes once, and retries. If the refresh token itself has expired, fall back to a full re-auth.
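The timer layer can be as small as a clock that remembers when the access token was last issued. This is a sketch under stated assumptions: the 15-minute safety margin is my own choice rather than a platform value, and `RefreshClock` is a hypothetical name. It pairs with whatever renew function your client exposes.

```python
import time

ACCESS_TTL = 60 * 60       # seconds, from the 60-minute access-token TTL
REFRESH_MARGIN = 15 * 60   # renew 15 minutes early (an assumed margin)


class RefreshClock:
    """Tracks when the access token was issued and when it is due for renewal."""

    def __init__(self, now=time.monotonic):
        self._now = now                  # injectable so the logic can be tested
        self.issued_at = self._now()

    def mark_renewed(self):
        """Call right after a successful authentication or refresh."""
        self.issued_at = self._now()

    def due(self):
        """True once the token is inside the safety margin before expiry."""
        return self._now() - self.issued_at >= ACCESS_TTL - REFRESH_MARGIN
```

Check `clock.due()` before each request; when it returns True, renew and call `clock.mark_renewed()`. A monotonic clock ignores wall-clock adjustments, but it may not advance while a host is suspended, which is one more reason to keep the 401 handler as the second layer.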
The refresh-on-401 pattern
Here is the small session wrapper I actually use in Python. It authenticates once, retries a single time on a 401 after refreshing, and re-authenticates only if the refresh token is also dead. Forty lines that turn the most common pipeline failure into a non-event.
import requests

BASE = "https://sddc-manager.lab.local"
requests.packages.urllib3.disable_warnings()  # lab only: silences the verify=False warning


class VcfSession:
    def __init__(self, user, password):
        self.user, self.password = user, password
        self._auth()

    def _auth(self):
        r = requests.post(f"{BASE}/v1/tokens", verify=False,
                          json={"username": self.user, "password": self.password})
        r.raise_for_status()
        t = r.json()
        self.access, self.refresh = t["accessToken"], t["refreshToken"]

    def _renew(self):
        try:
            r = requests.patch(f"{BASE}/v1/tokens/access-token/refresh",
                               verify=False, json={"refreshToken": self.refresh})
            r.raise_for_status()
            self.access = r.json()["accessToken"]
        except requests.HTTPError:
            self._auth()  # refresh token also expired: start over

    def get(self, path):
        h = {"Authorization": f"Bearer {self.access}"}
        r = requests.get(f"{BASE}{path}", headers=h, verify=False)
        if r.status_code == 401:  # token died mid-run
            self._renew()
            h = {"Authorization": f"Bearer {self.access}"}
            r = requests.get(f"{BASE}{path}", headers=h, verify=False)
        r.raise_for_status()
        return r.json()


# usage
s = VcfSession("svc-automation@vsphere.local", "********")
print([d["name"] for d in s.get("/v1/domains")["elements"]])
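The wrapper above covers GET only. If your automation also writes, the same retry-once logic can be factored out so every verb shares it. A minimal sketch: `call_with_renewal` is a hypothetical helper, and it assumes only that the response object exposes `status_code`, as a `requests.Response` does.

```python
def call_with_renewal(send, renew):
    """Run send(); on a 401, call renew() once and run send() again.

    `send` is a zero-argument callable that performs the request and
    returns an object with a `status_code` attribute. `renew` refreshes
    the access token. There is deliberately no loop: a second 401 is
    returned to the caller instead of being retried.
    """
    response = send()
    if response.status_code == 401:
        renew()
        response = send()
    return response
```

The one design point that matters: `send` has to build its Authorization header at call time, for example by reading `s.access` inside a lambda, so the second attempt picks up the renewed token instead of resending the dead one.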
Step 4: Stop using your own login
Automation that runs as a named human is a liability. When that person leaves, rotates their password, or has their account locked, every pipeline they touched dies with no warning. Create a dedicated service account through the Users API (POST /v1/users) and give it exactly the role it needs. SDDC Manager has three.
| Role | What it can do | Use it for |
|---|---|---|
| ADMIN | All methods, including credential and password management and user management | Automation that rotates secrets or manages users |
| OPERATOR | POST, PUT, PATCH, DELETE, except the secured credential and user-management APIs | Most provisioning and day-2 automation |
| VIEWER | GET only, excluding password and user-management APIs | Inventory, drift detection, reporting |
What I tell clients: pick the role from the verbs your automation uses, not from convenience. If a job only reads, it gets VIEWER, and a leaked VIEWER token cannot change a thing. The moment you grant ADMIN to a reporting script because it was easier, you have created a credential that can rotate every password in the platform, sitting in a CI variable.
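Picking the role from the verbs can be written down as a rule. The sketch below encodes the table above and nothing more; the function name and the `touches_secured_api` flag are my own shorthand for "calls credential, password, or user-management APIs".

```python
READ_ONLY = {"GET"}
WRITE = {"POST", "PUT", "PATCH", "DELETE"}


def minimal_role(methods, touches_secured_api=False):
    """Return the lowest role from the table that covers the given HTTP methods."""
    methods = {m.upper() for m in methods}
    unknown = methods - READ_ONLY - WRITE
    if unknown:
        raise ValueError(f"unexpected HTTP methods: {sorted(unknown)}")
    if touches_secured_api:
        return "ADMIN"      # credential and user management sit behind ADMIN
    if methods & WRITE:
        return "OPERATOR"   # any write verb rules out VIEWER
    return "VIEWER"         # GET only
```

A reporting job that issues only GET requests comes back as VIEWER. The day someone adds a POST to it, the answer changes to OPERATOR, and that is the moment to review the service account rather than reach for ADMIN.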
Step 5: Least privilege and clean teardown
Two habits separate automation that is safe from automation that merely works. First, pull secrets at runtime from a vault or the pipeline’s secret store, never from a file in the repo. A password in plaintext in a playbook is the breach waiting to be reported. Second, invalidate the refresh token when the job is done with DELETE /v1/tokens/refresh-token, so a 24-hour skeleton key does not outlive the five-minute job that created it.
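Both habits fit in a few lines. This is a sketch under stated assumptions: the environment variable names `VCF_USER` and `VCF_PASSWORD` are hypothetical stand-ins for whatever your vault agent or CI secret store injects, and `authenticate` and `invalidate` are callables you supply. `invalidate` would wrap the DELETE /v1/tokens/refresh-token call; its request format is not shown in this article, so it is left to the caller.

```python
import os
from contextlib import contextmanager


def load_credentials(env=os.environ):
    """Read the service-account credentials injected at runtime."""
    try:
        return env["VCF_USER"], env["VCF_PASSWORD"]
    except KeyError as missing:
        # Name the missing variable, never echo any secret value.
        raise RuntimeError(f"missing secret: {missing.args[0]}") from None


@contextmanager
def token_scope(authenticate, invalidate):
    """Yield a token pair and always invalidate the refresh token on exit."""
    pair = authenticate()
    try:
        yield pair
    finally:
        # Runs on success and on failure, so the token never outlives the job.
        invalidate(pair["refreshToken"])
```

Wrapping the job body in `with token_scope(...) as pair:` means the teardown runs even when the job raises, which is exactly the case where a forgotten refresh token tends to linger.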
Worked example
A workload domain bring-up runs about 90 minutes. The access token lasts 60. With a single refresh near minute 45 you cover the whole job with margin, and you never re-send the password. A multi-hour upgrade run that spans, say, 5 hours needs roughly five access-token refreshes, all served by the one refresh token, because its 24-hour ceiling is nowhere near. As long as the job finishes inside that 24-hour window, you authenticate exactly once, then refresh on a timer until done.
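The arithmetic is easy to check in code. A small sketch, assuming refreshes fire at a fixed interval measured from the start of the job; the function names and the 45-minute default are illustrative.

```python
import math

ACCESS_TTL_MIN = 60          # access-token TTL in minutes
REFRESH_TTL_MIN = 24 * 60    # refresh-token TTL in minutes


def refreshes_needed(job_minutes, refresh_every=45):
    """Count the timer refreshes that fire strictly before the job ends."""
    if not 0 < refresh_every < ACCESS_TTL_MIN:
        raise ValueError("refresh interval must sit inside the access-token TTL")
    # Refreshes happen at refresh_every, 2 * refresh_every, ... until the job ends.
    return max(0, math.ceil(job_minutes / refresh_every) - 1)


def needs_reauth(job_minutes):
    """True if the job outlives the refresh token and must re-send credentials."""
    return job_minutes > REFRESH_TTL_MIN
```

For the 90-minute bring-up, `refreshes_needed(90)` returns 1. For the 5-hour upgrade the count depends on cadence: `refreshes_needed(300)` returns 6 at the 45-minute default and `refreshes_needed(300, refresh_every=55)` returns 5. Either way `needs_reauth(300)` is False, so the password is sent exactly once.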
Common auth failures and the fix
Most VCF authentication problems surface as one of a handful of HTTP status codes. The fastest way to debug is to map the symptom to the cause instead of guessing, because the fix is usually one of two things: the token is too old, or the role is too low. Here is the table I keep in my head.
| Symptom | Likely cause | Fix |
|---|---|---|
| 401 mid-run, token has expired | Access token aged past its 60-minute TTL | Refresh on a timer and catch the 401 to refresh once, then retry |
| 401 on the very first call | Bad credentials or a missing Bearer prefix in the header | Verify the service account password and the Authorization header format |
| 403 on a POST, PATCH or DELETE | Role too low for the operation (VIEWER or OPERATOR hitting a secured API) | Use a service account with the minimal higher role for that specific job |
| 401 after a long pause | Refresh token expired past its 24-hour TTL | Re-authenticate from credentials and start a fresh token pair |
| Account locked after retries | Repeated bad-credential attempts hammering /v1/tokens | Add backoff, fail fast on a 401 at auth, and alert instead of retrying blindly |
The pattern across all of these is simple once you see it. A 401 is almost always about token age. A 403 is almost always about role. Internalise that split and you stop reaching for a password reset every time the API returns an error code, which saves the half hour you would otherwise spend chasing the wrong problem. The other quiet win is logging: log the status code and the request path on failure, never the token itself, so a stack trace in CI does not become a credential leak.
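The 401-versus-403 split, and the logging rule, can be captured in two small functions. This is a sketch of the table above rather than an exhaustive classifier; the names `diagnose` and `failure_log_line` are hypothetical, and the `first_call` flag is my shorthand for "this was the first request after authenticating".

```python
def diagnose(status, first_call=False):
    """Map an HTTP status to the likely fix, following the table above."""
    if status == 401 and first_call:
        return "check the credentials and the Bearer prefix; do not retry blindly"
    if status == 401:
        return "refresh the access token and retry once; re-authenticate if the refresh token has expired"
    if status == 403:
        return "role too low: use a service account with the minimal higher role"
    return "not an auth failure: inspect the response body"


def failure_log_line(method, path, status):
    """Build a log line from method, path, and status only; no token appears."""
    return f"{method} {path} -> {status}: {diagnose(status)}"
```

Because `failure_log_line` never receives the headers or the token, there is nothing sensitive for it to leak into a CI log, whatever the stack trace around it looks like.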
The Bottom Line
Treat authentication as a subsystem, not a line of setup. My recommendation: run every automation under a dedicated service account, default it to OPERATOR and drop to VIEWER for read-only jobs, and reserve ADMIN for the specific automations that rotate secrets or manage users. Refresh the access token preemptively and also handle the 401, so a long job never dies on expiry. Pull secrets from a vault, and invalidate refresh tokens on teardown. Do that once, wrap it in a small session class like the one above, and you never think about the 401 again. Skip it, and you will meet it at minute 61 of your most important pipeline.
Up next we go deeper into the API itself: structure, the API Explorer, and the async task model that every write operation uses. For the bigger picture, see the VCF 9 API-first runbook. How does your pipeline handle token refresh today, timer, 401 handler, or both? Tell me in the comments.



