Dr. Pranay Jha


Migrating to NSX 9: N-VDS to VDS, the Migration Coordinator and the Removal You Cannot Dodge (NSX Series, Part 27)

NSX 9 removed N-VDS on ESX, so every host still on the N-VDS host switch has to move to VDS before it can run NSX 9. Here is the per-host migration loop, the migrate_to_vds API, the new-VDS-only constraint, and how to do it without dropping production.

NSX Series · Part 27 of 30

TL;DR · Key Takeaways

  • NSX 9 removed support for N-VDS on ESX. The VDS is the only host switch now, so every transport node still running N-VDS must migrate to VDS before it can run NSX 9. This is not optional.
  • The migration is a per-host loop driven by the Migration Coordinator and the Policy-era transport-node API: pre-check, fix inconsistencies, maintenance mode, migrate, poll for SUCCESS, exit maintenance mode, repeat.
  • One hard constraint catches people: you migrate an N-VDS onto a brand-new VDS. You cannot fold it onto an existing VDS. Plan the target VDS up front.
  • Run the pre-check until it is clean before you migrate anything. Most failures are configuration inconsistencies the pre-check already flagged and someone migrated past anyway.
  • Sequence it with the upgrade. Do the N-VDS to VDS migration first, on your current version, then upgrade to NSX 9. Trying to do both in one window is how a planned change becomes an outage.
Who this is for: NSX and vSphere admins on NSX-T 3.x with N-VDS host switches, planning the move to NSX 9 or VCF 9.
Prerequisites: transport-node basics from Part 6, a maintenance window, DRS, and a current NSX backup.

If any of your ESX transport nodes still run an N-VDS host switch, NSX 9 will not have them. Support for N-VDS on ESX is gone, starting with the NSX version that underpins VCF 9, and the VDS is the only host switch left. There is no compatibility mode, no grace period, no flag to flip. So the migration from N-VDS to VDS stops being a tidy-up task you keep meaning to get to and becomes a hard prerequisite on the path to NSX 9. The good news: the tooling to do it cleanly has existed since NSX 3.2, and the process is mechanical once you respect its rules.

Why N-VDS is gone, and what replaces it

For years NSX-T shipped its own host switch, the N-VDS, separate from the vSphere Distributed Switch. It worked, but it meant two switch constructs to understand, two places to configure, and a fair amount of duplicated networking on every host. The industry direction was always convergence onto the VDS, and NSX 9 finishes that journey by removing N-VDS on ESX entirely. From NSX 9 forward there is one host switch, the VDS, carrying both your regular vSphere portgroups and your NSX overlay and VLAN transport. Fewer moving parts, one mental model. The catch is that anything still on the old model has to move before it can come along.

[Figure: One host switch from here on. Before (NSX-T 3.x), the ESX host runs an N-VDS for NSX transport alongside a VDS for vSphere portgroups. After (NSX 9), the ESX host runs a VDS only, carrying NSX transport and portgroups.]
Diagram 1: The change map. Two switch constructs collapse into one VDS that carries both vSphere and NSX traffic.

The constraint that catches people

You migrate an N-VDS onto a brand-new VDS. You cannot reuse an existing VDS as the migration target. Teams assume they will just merge the N-VDS into the VDS that already carries their management portgroups, hit the wall mid-plan, and have to redesign the target switch and its uplinks on the spot. Decide the new target VDS, its uplinks, and its portgroup layout before you touch the first host.

The per-host migration loop

The Migration Coordinator and the transport-node API drive this, and it runs one host at a time so the rest of the cluster keeps serving workloads. The loop is the same for every host, and the discipline is in not skipping the pre-check.

Step 1: Pre-check until it is clean

Prepare the hosts and run the readiness pre-check. It will flag configuration inconsistencies between what the N-VDS carries and what the target VDS can accept. Fix them, then run the pre-check again. Repeat until it comes back clean. This is the step that decides whether the rest of the migration is boring or painful. A pre-check warning you migrate past does not disappear; it turns into a failed host halfway through the window.
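
The fix-and-re-run discipline can be sketched as a small loop. This is a minimal sketch, not NSX tooling: run_precheck and fix are hypothetical callables you would back with your own pre-check call and remediation steps, and max_rounds is an assumed safety cap that is not part of any NSX workflow.

```python
from typing import Callable, List


def precheck_until_clean(
    run_precheck: Callable[[], List[str]],  # hypothetical: returns the open findings
    fix: Callable[[str], None],             # hypothetical: remediates one finding
    max_rounds: int = 5,                    # assumed cap so the loop cannot spin forever
) -> int:
    """Re-run the pre-check until it reports no findings; return the rounds used."""
    for round_no in range(1, max_rounds + 1):
        findings = run_precheck()
        if not findings:
            return round_no  # clean: safe to start migrating
        for finding in findings:
            fix(finding)
    # Still dirty after max_rounds: refuse to continue rather than migrate past it.
    raise RuntimeError("pre-check still not clean; do not migrate")
```

The point of raising instead of returning a flag is that a caller cannot accidentally carry on to the migration step with findings still open.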

Step 2: Maintenance mode and migrate

Put the host into maintenance mode from vCenter so DRS evacuates its VMs, then kick off the migration for that transport node. With Policy-era NSX the action is a single API call against the transport node, and you then poll that node's state endpoint until the status reports SUCCESS.

# Trigger the N-VDS to VDS migration for one transport node
POST https://NSX-MGR/api/v1/transport-nodes/<tn-id>?action=migrate_to_vds

# Poll the same transport node until status shows SUCCESS
GET  https://NSX-MGR/api/v1/transport-nodes/<tn-id>/state

# Then exit maintenance mode in vCenter and move to the next host
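
The two calls above can be wrapped in a trigger-and-poll helper. This is a sketch under stated assumptions: send is a hypothetical transport function you would wire to your own authenticated HTTP client, the JSON field name "state" and the failure value "FAILED" are assumptions about the response body, and the interval and timeout defaults are illustrative.

```python
import time
from typing import Callable

# send(method, url) -> parsed JSON body as a dict. Hypothetical: back it with
# your HTTP client and NSX Manager authentication.
Send = Callable[[str, str], dict]


def migrate_and_wait(send: Send, manager: str, tn_id: str,
                     interval_s: float = 30.0, timeout_s: float = 3600.0,
                     sleep: Callable[[float], None] = time.sleep) -> None:
    """Trigger migrate_to_vds for one transport node, then poll until SUCCESS."""
    base = f"https://{manager}/api/v1/transport-nodes/{tn_id}"
    send("POST", f"{base}?action=migrate_to_vds")
    waited = 0.0
    while waited <= timeout_s:
        body = send("GET", f"{base}/state")
        state = str(body.get("state", "")).upper()  # field name is an assumption
        if state == "SUCCESS":
            return
        if state == "FAILED":  # assumed failure value
            raise RuntimeError(f"{tn_id}: migration failed")
        sleep(interval_s)
        waited += interval_s
    raise TimeoutError(f"{tn_id}: no SUCCESS after {timeout_s} seconds")
```

Passing send and sleep in as parameters keeps the helper testable without a live NSX Manager, which is useful when you rehearse the runbook on a non-production cluster.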

Step 3: Verify, exit, repeat

Once the transport node reports SUCCESS, confirm the host is healthy on the new VDS, take it out of maintenance mode, let DRS rebalance, and move to the next host. Walk the whole cluster this way. Because it is serial and each host is drained before it migrates, the blast radius of any single failure is one empty host, which is exactly the safety property you want.

[Figure: The per-host loop. One host at a time; the cluster keeps serving while each host drains and migrates. Stages: pre-check (until clean), maintenance mode (DRS evacuates), migrate_to_vds (poll to SUCCESS), exit and verify (healthy on VDS), next host (repeat).]
Diagram 2: The loop repeats per host. Serial and drained, so a failure during any single host is contained.
Step              | Action                              | The gotcha
Pre-check         | Verify host readiness, fix, re-run  | Migrating past a warning fails mid-window
Target VDS        | Provision a new VDS                 | Cannot reuse an existing VDS
Maintenance mode  | Evacuate the host first             | Needs DRS automated and spare capacity
Migrate + poll    | migrate_to_vds, wait for SUCCESS    | Do not proceed before SUCCESS
Repeat            | One host at a time                  | Do not parallelize a whole cluster
My take: separate this migration from the NSX 9 upgrade itself. Get every host off N-VDS and onto VDS first, on your current supported version, verify the environment is healthy, and only then run the upgrade to NSX 9. Bundling the two into one change window doubles the number of things that can fail and halves your ability to tell which one did. Two boring changes beat one heroic one.
Disclaimer: N-VDS to VDS migration is a production change to host networking. Confirm a restorable NSX backup, validate the source version supports the Migration Coordinator, design and provision the target VDS in advance, run pre-checks until clean, ensure DRS is fully automated with capacity to evacuate, and rehearse on a non-production cluster first. Migrate one host at a time and verify health before proceeding.

What’s Next

N-VDS to VDS is the one migration on the road to NSX 9 you cannot route around, so treat it as a project in its own right rather than a footnote in the upgrade. Provision a new target VDS up front, because you cannot reuse an existing one. Run the pre-check until it is genuinely clean, then walk the cluster one host at a time: maintenance mode, migrate_to_vds, poll for SUCCESS, verify, repeat. Keep this migration separate from the NSX 9 upgrade so each change stands on its own. Do that and the removal of N-VDS becomes a non-event, a planned step you completed weeks before the upgrade rather than a surprise that blocks it. The upgrade mechanics that follow are exactly the orchestrated flow from Part 20 on upgrades and lifecycle. Are your hosts already on VDS, or is N-VDS quietly blocking your path to NSX 9?

Validating each host and planning rollback

The migrate_to_vds call returning SUCCESS tells you the transport node flipped onto the VDS. It does not tell you the workloads on that host are healthy on the new switch, and that gap is exactly where a clean-looking migration quietly drops traffic. I treat per-host verification as part of the loop, not an afterthought.

What success actually means per host

After a host reports SUCCESS, I check four things before exiting maintenance mode: the host TEP still reaches other TEPs with a full-size do-not-fragment ping (the overlay MTU discipline from Part 29 applies directly here), the tunnels and BFD sessions are up, the VMs land on the correct VDS portgroups with real connectivity, and no fresh alarms fired on the transport node. Only when all four are green does the host count as done. Trusting the API status by itself is how a team discovers hours later that several hosts have been silently blackholing east-west traffic.
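
The four checks can be recorded as an explicit gate set so that "done" is never inferred from the API status alone. This is a minimal sketch: HostGates, failed_gates and host_is_done are hypothetical names, and how each boolean gets filled in (ping, tunnel status, VM tests, alarm query) is left to your own tooling.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class HostGates:
    tep_reachable: bool    # full-size do-not-fragment ping between TEPs passed
    tunnels_up: bool       # tunnels and BFD sessions are up
    vm_connectivity: bool  # VMs on the correct VDS portgroups with real connectivity
    no_new_alarms: bool    # no fresh alarms on the transport node


def failed_gates(gates: HostGates) -> List[str]:
    """Names of the gates that are not green, in declaration order."""
    return [f.name for f in fields(gates) if not getattr(gates, f.name)]


def host_is_done(api_status: str, gates: HostGates) -> bool:
    """SUCCESS from the API is necessary but not sufficient."""
    return api_status == "SUCCESS" and not failed_gates(gates)
```

For example, host_is_done("SUCCESS", HostGates(True, False, True, True)) is False, and failed_gates on the same object names tunnels_up as the gate to investigate before the host leaves maintenance mode.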

Your rollback options narrow as you go

Rolling back a single freshly migrated host is cheap: the cluster is still prepared and the original uplink design is intact, so you re-evacuate and revert that one host. Rolling back a whole cluster after eight hosts have moved is a different animal, because each revert is its own maintenance operation and the cluster has been running mixed-switch for hours. That asymmetry is the real reason per-host verification matters. Catching a problem on host three is an inconvenience; catching it on host twenty is an incident. Keep the change window scoped to one cluster, and stop the run the moment a host fails verification rather than pressing on into a deeper hole.
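
The stop-on-first-failure rule is easy to encode in whatever drives the run. The sketch below assumes two hypothetical callables, migrate_host (drain, migrate, wait for SUCCESS) and verify_host (the per-host checks), and simply refuses to touch another host once verification fails.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def run_cluster(hosts: Iterable[str],
                migrate_host: Callable[[str], None],  # hypothetical: drain + migrate + wait
                verify_host: Callable[[str], bool],   # hypothetical: per-host verification
                ) -> Tuple[List[str], Optional[str]]:
    """Walk one cluster serially; return (verified hosts, first failed host or None)."""
    done: List[str] = []
    for host in hosts:
        migrate_host(host)
        if not verify_host(host):
            # Stop here: the failed host is still drained, so the blast radius
            # is one empty host and the rollback is a single-host revert.
            return done, host
        done.append(host)
    return done, None
```

Returning the failed host rather than raising keeps the list of already verified hosts available, which is what you need when deciding whether to revert one host or pause the window.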

[Figure: Four gates before you exit maintenance mode. All four green, or the host is not done; SUCCESS from the API is only gate zero. 1) TEP reach: full-size DF ping over the overlay. 2) Tunnels: BFD up, nonzero counters. 3) VM connectivity: right VDS portgroups. 4) No alarms: clean transport node state.]
Diagram 3: The per-host gate set. Walk all four before the host leaves maintenance mode and before you touch the next host.

Worked example

A 24-host cluster, budgeting roughly 12 to 18 minutes per host including the DRS drain, the migration, verification and the rebalance, is 288 to 432 minutes, or roughly a 5 to 7 hour serial run. That rarely fits a single maintenance window, so I split it across two or three nights, eight to twelve hosts per session, verifying each batch and leaving the cluster healthy between sessions. Plan the calendar around the per-host time, not the optimistic SUCCESS-to-SUCCESS number, because the verification and rebalance are where the minutes actually go.
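
The arithmetic behind that estimate is worth keeping in a small script so it can be redone for each cluster. This sketch uses the figures from the example above; the function names are hypothetical and the even split across nights is an assumption about how you batch.

```python
from typing import List, Tuple


def serial_run_hours(hosts: int, min_per_host: int, max_per_host: int) -> Tuple[float, float]:
    """Best- and worst-case length of a one-host-at-a-time run, in hours."""
    return hosts * min_per_host / 60, hosts * max_per_host / 60


def split_into_nights(hosts: int, nights: int) -> List[int]:
    """Spread hosts across nights as evenly as possible."""
    base, extra = divmod(hosts, nights)
    return [base + 1 if i < extra else base for i in range(nights)]


low, high = serial_run_hours(24, 12, 18)
print(f"full cluster: {low:.1f} to {high:.1f} hours")  # full cluster: 4.8 to 7.2 hours

for nights in (2, 3):
    per_night = split_into_nights(24, nights)
    worst = max(per_night) * 18 / 60  # size the window for the slow case
    print(f"{nights} nights: {per_night} hosts, worst night {worst:.1f} h")
# 2 nights: [12, 12] hosts, worst night 3.6 h
# 3 nights: [8, 8, 8] hosts, worst night 2.4 h
```

Sizing each night on the 18-minute figure rather than the 12-minute one is deliberate: the window has to hold the slow case, not the average.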

What changes operationally after the move

One switch, one place to look

Once a cluster is fully on VDS, day-two networking gets simpler in a way that is easy to undervalue. There is one host switch to reason about instead of two, the uplink and teaming policy lives in one place, and the vSphere and NSX views finally agree about what the host networking looks like. Troubleshooting stops involving the question of which construct owns a given vmnic. For teams that lived through the N-VDS era, that single-switch clarity removes a whole category of confusing support calls.

Re-run the pre-check before every cluster

If you are migrating several clusters, resist the urge to assume the second cluster is like the first. Each cluster has its own portgroup layout, its own uplink count, and its own little configuration drift accumulated over years, so each one earns its own pre-check until clean. The migration mechanics are identical, but the inconsistencies the pre-check surfaces are per-cluster, and the cluster you assume is fine is the one that bites.

Edges and the management domain are a separate conversation

One scoping point saves a lot of needless worry. This migration is specifically about the ESX host switch on workload-domain transport nodes. NSX Edge nodes, whether VM or bare-metal, were never N-VDS-on-ESX in the first place, so they are not migrated by this workflow; they carry their own networking and are lifecycled through the Edge upgrade path covered in Part 20. The VCF management domain likewise has its own prescribed networking that SDDC Manager owns and maintains.

So before you build the runbook, draw a clean line around what is actually in scope: the ESX transport nodes still on an N-VDS host switch. Do not let the N-VDS removal headline scare you into thinking the Edges or the management plane need re-platforming alongside them. Knowing precisely which nodes are in scope is half of a calm change window, and the per-host verification gates above are the other half. Get both right and the most disruptive-sounding prerequisite on the road to NSX 9 becomes a quiet, repeatable per-cluster routine.
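
Drawing that line can be done mechanically from an inventory export. The sketch below filters a list of transport-node records down to ESX hosts that still carry an N-VDS. The record shape is an assumption: the keys node_deployment_info.resource_type, host_switch_spec.host_switches and host_switch_type, and the values "HostNode" and "NVDS", should be checked against the transport-node payload your NSX version actually returns.

```python
from typing import Iterable, List


def in_scope(nodes: Iterable[dict]) -> List[str]:
    """Names of ESX host transport nodes that still have an N-VDS host switch."""
    names: List[str] = []
    for node in nodes:
        # Assumed field: Edge nodes are skipped, they are not part of this workflow.
        if node.get("node_deployment_info", {}).get("resource_type") != "HostNode":
            continue
        switches = node.get("host_switch_spec", {}).get("host_switches", [])
        # Assumed field and value: one N-VDS on the host is enough to put it in scope.
        if any(sw.get("host_switch_type") == "NVDS" for sw in switches):
            names.append(node.get("display_name", node.get("id", "unknown")))
    return names
```

Running this per cluster before building the runbook gives you the exact host list for the change record, and an empty result is a cheap confirmation that a cluster needs no migration at all.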

In practice: stage the migration cluster by cluster, not host by host across the whole estate at once. A per-cluster cadence keeps each change window bounded, keeps the rollback story simple, and means a problem in one cluster never blocks the others. The estate gets to VDS a cluster at a time, and every step stays boring.

The broader point is that a forced removal like N-VDS is not a reason to rush. The tooling is mature, the per-host loop is safe when you respect the gates, and the scope is narrower than the headline suggests. Teams that plan the target VDS, run the pre-check to clean, verify every host, and stage the work cluster by cluster turn a scary-sounding prerequisite into a routine maintenance exercise. Teams that bundle it with the upgrade and skip verification are the ones writing the incident report. The difference is entirely in the preparation, not the technology.

About the Author

Dr. Pranay Jha is a Cloud and AI Consultant with 18+ years of experience in hybrid cloud, virtualization, and enterprise infrastructure transformation. He specializes in VMware technologies, multi-cloud strategy, and Generative AI solutions. He holds a PhD in Computer Applications with research focused on Cloud and AI, has published multiple research papers, and has been a VMware vExpert since 2016 and a VMUG Community Leader.
