Dr. Pranay Jha

NSX 9 Host Transport Node Prep: VDS, EDP and Verifying It Worked (NSX Series, Part 6)

Preparing an ESXi host as an NSX 9 transport node: what it installs, how the Transport Node Profile applies VDS and EDP across the cluster, and how to verify it.

NSX Series · Part 6 of 30

TL;DR · Key Takeaways

  • Preparing a host as a transport node installs the NSX kernel modules, configures the VDS as the host switch (N-VDS is gone), gives the host its TEP vmkernel interfaces, and starts the local control plane agent.
  • You do not prepare hosts one by one. A Transport Node Profile (TNP) applies the host switch, transport zones, uplink profile, and TEP pool to every host in the cluster, identically.
  • EDP Standard is the default host-switch datapath mode in VCF 9. You change it in the TNP, not per host, and almost no one should change it.
  • Get the uplink mapping right: physical vmnics map to named uplinks in the uplink profile, which map to VDS uplinks that connect to the top-of-rack switches. A swapped NIC here is a quiet outage.
  • Never trust a green “Configured” state alone. vmkping the TEPs and check the host switch from the CLI before you call a host ready.
Who this is for: admins and architects preparing ESXi hosts for NSX 9 in VCF 9.
Prerequisites: the design from Part 4 (transport zones, MTU, TEP pool, uplink profile) and a healthy NSX Manager cluster from Part 5.

This is the step where the design stops being a diagram and starts being real. Up to now NSX has been appliances and planning documents. Preparing a host as a transport node is the moment the overlay actually exists on that host: the kernel learns to encapsulate GENEVE, the Distributed Firewall starts filtering at the vNIC, and the host joins the fabric. It is also the step people rush, because the UI makes it look like one click on a cluster. The click is easy. Knowing what it changed, and proving it worked, is the part that keeps you out of a 2 a.m. call later.

What preparing a host actually does

“Prepare” is a tidy word for a meaningful change to the hypervisor. When a host becomes a transport node, NSX installs its kernel modules (VIBs) into ESXi, attaches the host’s VDS as the NSX host switch and sets its datapath mode, creates one or more TEP vmkernel interfaces and gives them addresses from your IP pool, and starts the nsx-proxy agent that is the host’s local control plane. From that point the host can carry overlay traffic and enforce firewall rules in the kernel. None of this disturbs running VMs if the design is right, which is exactly why getting the design right in Part 4 mattered.

[Figure: What changes on the host. Before: plain ESXi with a vSphere-only VDS on vmnic0 and vmnic1; no overlay, no DFW, just vSphere networking. After: a transport node with the VDS as NSX host switch (EDP Standard), NSX VIBs in the kernel, nsx-proxy (LCP), TEP vmk10 / vmk11, and DFW at each vNIC; it now carries GENEVE overlay and enforces security in-kernel.]
Preparation turns a vSphere host into an NSX data-plane node: VIBs, host switch, TEPs, LCP, and the DFW.

The Transport Node Profile does it cluster-wide

You almost never prepare a single host in production. You build a Transport Node Profile once, bundling the host switch (the VDS and its datapath mode), the transport zones, the uplink profile, and the TEP IP pool, and you apply that profile to the cluster. NSX then prepares every host in the cluster to the same spec, and keeps any host you add later in line automatically. This is the difference between a fleet that is identical by construction and one that drifts host by host until a single odd node causes a problem nobody can explain. Use the profile. Preparing hosts individually is for labs and for the one-off bare-metal Edge.

[Figure: One profile, a whole cluster prepared. The Transport Node Profile (VDS host switch + EDP mode, transport zones, uplink profile, TEP IP pool) is applied to the cluster: hosts 1 to 6 and any added later. Every host is identical; new hosts inherit the profile automatically.]
The TNP is the unit of consistency. Build it once, apply it to the cluster, and drift stops being possible.
In practice: for stretched or multi-rack clusters where racks need different TEP VLANs, use a sub-transport node profile for the odd subset rather than abandoning the cluster-wide profile. Keep the single source of truth; template the exception.

EDP Standard, and leaving it alone

The host switch runs a datapath mode, and in VCF 9 that mode defaults to EDP Standard for every new deployment. You set it in the Transport Node Profile, not on individual hosts, which is one more reason the profile is the right tool. EDP Standard runs the improved forwarding stack and allocates CPU to the datapath dynamically, so you get the throughput without hand-reserving cores. For general compute and for Edge clusters, this is the recommended mode and the one I leave in place every time. The only reason to move to the prescriptive EDP Dedicated mode is a measured, named requirement, covered in the performance deep dive (Part 26). For the why behind the speed, see how NSX Enhanced Data Path delivers its throughput boost in VCF 9.

One real planning constraint hides here: EDP needs a supported NIC and driver. An unsupported card does not error loudly; it quietly forwards on a slower path and you lose the performance you designed and paid for. Confirm every host model’s NIC against the VCF 9 EDP compatibility list before you standardize on it, because finding out after you have racked forty hosts is an expensive lesson.
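
Because an unsupported card fails quietly, it can help to make the check mechanical. The following is a minimal sketch, not an official tool: it reads the output of `esxcli network nic list` and flags any vmnic whose driver is not on an allowlist you maintain by hand from the compatibility list. The function name `unsupported_nics`, the `SUPPORTED_DRIVERS` variable, and the driver names are all invented for illustration, and the canned output is trimmed to the columns the sketch reads.

```shell
#!/bin/sh
# Placeholder allowlist: fill it by hand from the compatibility list.
# These driver names are invented for the example, not real EDP-capable drivers.
SUPPORTED_DRIVERS="drv_supported_a drv_supported_b"

# Reads `esxcli network nic list` output on stdin and prints "<vmnic> <driver>"
# for every NIC whose driver is not in SUPPORTED_DRIVERS.
# Assumes two header lines, then Name in column 1 and Driver in column 3.
unsupported_nics() {
    awk -v ok="$SUPPORTED_DRIVERS" '
        BEGIN { n = split(ok, names, " "); for (i = 1; i <= n; i++) allowed[names[i]] = 1 }
        NR > 2 && NF >= 3 && !($3 in allowed) { print $1, $3 }
    '
}

# Demo against canned output; on a real host you would pipe the command in:
#   esxcli network nic list | unsupported_nics
unsupported_nics <<'EOF'
Name    PCI Device    Driver           Link Status
------  ------------  ---------------  -----------
vmnic0  0000:3b:00.0  drv_supported_a  Up
vmnic1  0000:3b:00.1  drv_supported_a  Up
vmnic2  0000:5e:00.0  drv_other        Up
EOF
```

This only compares driver names. The compatibility list also cares about the card model and the driver version, so treat a clean result as a first filter rather than proof.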

Mapping uplinks: the swap that bites

This is the part of host prep most likely to go subtly wrong. NSX does not talk to physical NICs directly. The uplink profile defines named uplinks and a teaming policy; when you prepare the host, you map each physical vmnic to one of those named uplinks, and the VDS carries them to the top-of-rack switches. Get the mapping consistent across every host, and confirm which physical cable each vmnic actually is. I have seen vmnic2 on half a rack patched to a different switch than the other half, and the symptom is maddening: most things work, failover does not, and traffic takes a path nobody drew.

[Figure: vmnic to uplink to fabric. vmnic0 and vmnic1 map to uplink-1 and uplink-2 (named in the uplink profile), which the VDS carries to ToR switch A and ToR switch B. Keep this mapping identical on every host; a swapped cable here is the outage that hides until failover.]
NSX maps named uplinks, not raw NICs. Verify which physical port each vmnic really is.
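
The "identical on every host" rule is easy to check once the mappings are written down. Here is a rough sketch, assuming you have exported them into a simple text inventory of `host vmnic uplink` triples. That format and the `mapping_drift` function are made up for this example; NSX does not produce this file. The first host listed is treated as the reference, and any later host that differs is reported.

```shell
#!/bin/sh
# Reads "host vmnic uplink" triples on stdin (a made-up inventory format).
# The first host in the input is the reference; its lines must come first.
# Prints every mapping on a later host that differs from the reference.
mapping_drift() {
    awk '
        NR == 1 { ref_host = $1 }
        $1 == ref_host { ref[$2] = $3; next }
        !($2 in ref) || ref[$2] != $3 {
            printf "%s: %s -> %s (reference %s -> %s)\n",
                $1, $2, $3, $2, (($2 in ref) ? ref[$2] : "none")
        }
    '
}

# Demo: esx03 has its two uplinks swapped relative to esx01.
mapping_drift <<'EOF'
esx01 vmnic0 uplink-1
esx01 vmnic1 uplink-2
esx02 vmnic0 uplink-1
esx02 vmnic1 uplink-2
esx03 vmnic0 uplink-2
esx03 vmnic1 uplink-1
EOF
```

A check like this catches a logical swap in the profile or on a host. It cannot see a mis-patched cable, where the mapping is identical on paper but vmnic2 lands on the wrong switch; that still needs switch-side evidence such as LLDP or CDP neighbor data, or a physical trace.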

Verify the prepared host

NSX will show the host as “Configured” with a green tick the moment the install finishes. That tick means the configuration applied, not that overlay traffic actually flows. Treat it as the start of verification, not the end. SSH to the host, drop into the NSX CLI, and confirm the host switch, the TEPs, and most importantly that this host’s TEP can reach another host’s TEP at full MTU.

# Confirm the NSX kernel modules are installed
esxcli software vib list | grep -i nsx

# Drop into the NSX CLI on the host
nsxcli

# Host switch present and in the expected datapath mode
get host-switch

# TEP interfaces exist and have addresses from the pool
get vteps

# Leave the NSX CLI; vmkping runs from the ESXi shell
exit

# The real test: TEP-to-TEP at full overlay MTU (DF bit set)
vmkping ++netstack=vxlan -d -s 1572 <remote_host_TEP_IP>

If the vmkping returns clean replies, the host is genuinely part of the overlay. If it fails while a normal-size ping works, you are back to the MTU weakest-link problem from Part 4, and the fix is in the physical path, not in NSX. This one command separates “NSX says it is fine” from “it is actually fine,” and it is the single most useful check in the whole bring-up.
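
The 1572 in that command is not arbitrary: it is the 1600-byte overlay MTU minus 28 bytes of overhead (20 for the IPv4 header, 8 for the ICMP header). A small sketch of that arithmetic follows, plus a helper that prints the vmkping line for each remote TEP so you can paste or loop over them. The function names are invented for this example, and the addresses are documentation placeholders, not values from a real TEP pool.

```shell
#!/bin/sh
# Largest ICMP payload that fits in one unfragmented IPv4 packet:
# MTU minus the IPv4 header (20 bytes) and the ICMP header (8 bytes).
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Print (do not run) the vmkping command for each remote TEP.
# Usage: print_tep_checks <mtu> <tep_ip> [<tep_ip> ...]
print_tep_checks() {
    size=$(icmp_payload_for_mtu "$1")
    shift
    for tep in "$@"; do
        echo "vmkping ++netstack=vxlan -d -s $size $tep"
    done
}

# Demo with placeholder addresses.
print_tep_checks 1600 192.0.2.11 192.0.2.12
```

If your design from Part 4 uses a larger MTU, pass that value instead; the point is to test at the size the overlay will really use rather than at a default ping size that always fits.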

Check | Where | Pass looks like
Node state | NSX UI, transport nodes | Configured, node status Up, realization success.
Kernel modules | esxcli software vib list | nsx VIBs present on the host.
TEPs | nsxcli get vteps | vmkernels exist, addresses from your pool.
Overlay reachability | vmkping on the vxlan netstack | Replies at 1572 with DF set, no loss.
Uplinks | NSX UI / esxcli | Both uplinks up, mapped to the right vmnics.

When host prep goes wrong

Host preparation fails in a small number of recognizable ways, and almost all of them trace back to the design or the physical network rather than NSX itself. Here is the short diagnostic list I keep.

Symptom | Likely cause | Fix
Configured, but TEP vmkping fails | MTU below 1600 on a hop, or TEP VLAN not routed | Fix physical MTU and TEP VLAN routing (Part 4).
TEP has no address | IP pool exhausted or wrong subnet | Expand the pool, confirm the subnet matches the VLAN.
Throughput far below expectation | NIC not on the EDP compatibility list, fell back | Use a supported NIC/driver; re-check the HCL.
Prep stalls or partially applies | A vmnic still owned by a standard switch or in use | Free the NIC, confirm VDS uplink assignment, retry.
Disclaimer: preparing transport nodes changes the hypervisor networking stack. Validate against the current VCF 9 BOM and the EDP NIC compatibility list, take a backup, do one host or a non-production cluster first, and keep a maintenance window. Re-verify the exact NSX 9.x version and any host requirements before you start.
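
The "TEP has no address" row in the table above is usually avoidable with arithmetic done before prep: the pool has to supply one address per TEP vmkernel on every host. A minimal sketch of that sizing check follows. The function names and the numbers in the demo are illustrative assumptions, not values from any real design.

```shell
#!/bin/sh
# Addresses the TEP pool must supply: hosts multiplied by TEPs per host.
# Usage: tep_addresses_needed <hosts> <teps_per_host>
tep_addresses_needed() {
    echo $(( $1 * $2 ))
}

# Compare the requirement against the number of usable addresses in the pool.
# Usage: check_tep_pool <hosts> <teps_per_host> <pool_size>
check_tep_pool() {
    needed=$(tep_addresses_needed "$1" "$2")
    if [ "$needed" -le "$3" ]; then
        echo "ok: need $needed of $3"
    else
        echo "short by $(( needed - $3 )): need $needed of $3"
    fi
}

# Demo: 40 hosts with two TEPs each against a 64-address pool.
check_tep_pool 40 2 64
```

Size the pool for the cluster you expect to have, not the one you have today, since hosts added later draw from the same pool when they inherit the profile.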

Will it disrupt running VMs?

This is the first question every operations team asks, and the honest answer is: it should not, if the design is right, but you still plan it as a careful, rolling change. Because NSX 9 uses the existing VDS as the host switch rather than swapping in a separate N-VDS, preparation is far less invasive than the old N-VDS migrations were. The kernel modules install and the host switch picks up its NSX configuration without tearing down the VMs’ existing port groups. In a healthy cluster with DRS, the safe pattern is still to let the platform prepare hosts one at a time, with capacity to move workloads if a host needs attention, rather than blasting an entire production cluster in one window and hoping.

The exception is the brownfield host that is already running N-VDS from an older NSX-T deployment. You cannot simply re-prepare it onto a VDS in place and expect a clean result; that is a migration, not a preparation, and it has its own workflow and its own risks. If you are staring at hosts that still carry N-VDS, stop here and treat it as the dedicated project it is. That whole path, the Upgrade Coordinator and the N-VDS to VDS migration, gets a full part later in the series (Part 27), because doing it casually mid-prep is how a routine change turns into an outage. Greenfield hosts have none of this baggage: a fresh ESXi host with a clean VDS prepares cleanly.

My take: schedule host prep like any change that touches the data path, even though it is usually quiet. Do one host, verify it fully with the vmkping test, and only then let the rest of the cluster roll. The cost of that discipline is an hour; the cost of skipping it is explaining why a “non-disruptive” change dropped a workload.
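
That "one host, verify, then continue" discipline can be written as a simple gate. This is only a sketch of the control flow: `prepare_host` and `verify_host` are hypothetical stubs standing in for however you trigger preparation and run the checks in your environment, and `FAIL_HOST` exists purely so the demo has a failure to stop on.

```shell
#!/bin/sh
# Hypothetical hooks, stubbed so the sketch runs anywhere.
# Replace the bodies with your own prep trigger and verification checks.
prepare_host() { echo "prepare $1"; }
verify_host()  { [ "$1" != "${FAIL_HOST:-}" ]; }  # stub: fails only for $FAIL_HOST

# Prepare hosts in order and stop at the first one that fails verification,
# so a bad result never rolls on to the rest of the cluster.
rolling_prep() {
    for host in "$@"; do
        prepare_host "$host"
        if ! verify_host "$host"; then
            echo "STOP: $host failed verification"
            return 1
        fi
        echo "ok $host"
    done
}

# Demo: esx02 fails, so esx03 is never touched.
FAIL_HOST=esx02
rolling_prep esx01 esx02 esx03 || echo "rollout halted"
```

The design choice worth keeping is the early return: a failed verification halts the loop rather than being logged and skipped.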

What I’d Do

Build one Transport Node Profile per cluster, leave EDP on Standard, and confirm the NICs are on the compatibility list before any of this. Apply the profile, let NSX prepare every host the same way, and then do the part most people skip: vmkping the TEPs and read the host switch from the CLI before you tell anyone the cluster is ready. A prepared host that pings its peers at 1572 with the DF bit set is genuinely on the overlay; a green tick is just a promise. Get into the habit of proving it, and the segments and gateways you build in the next parts will sit on a foundation you actually trust. Next up is Part 7: Edge transport nodes and Edge clusters, where north-south traffic gets its own dedicated nodes. Are your host NICs actually on the EDP list, or did someone assume?


Verifying the host actually joined the data plane

Preparing a host transport node is not finished when the UI says configured; it is finished when you have proven the host is carrying overlay traffic. The configured state tells you NSX pushed the intent, not that the data path works, and the gap between those two is where a half-joined host sits looking fine while quietly failing to pass east-west traffic.

After preparation I confirm three things on the host itself: the TEP came up with the right address and MTU, the tunnels to other transport nodes are established with healthy BFD, and the host switch is running the Enhanced Data Path mode you intended rather than silently falling back. A full-size do-not-fragment ping between TEPs over the overlay stack is the single most useful check, because it exercises exactly the path GENEVE will use and catches the MTU mismatch that the configured state will never reveal. Verify at the data plane, not the management plane, and the answer is unambiguous: the host either joined the overlay or it did not.

About the Author

Dr. Pranay Jha is a Cloud and AI Consultant with 18+ years of experience in hybrid cloud, virtualization, and enterprise infrastructure transformation. He specializes in VMware technologies, multi-cloud strategy, and Generative AI solutions. He holds a PhD in Computer Applications with research focused on Cloud and AI, has published multiple research papers, and has been a VMware vExpert since 2016 and a VMUG Community Leader.
