TL;DR · Key Takeaways
- In VCF 9 you get a single overlay transport zone per NSX instance, shared by every cluster across every workload domain. Stop designing for multiple overlay TZs; you cannot have them here.
- GENEVE needs MTU 1600 minimum end to end. Set TEP and uplink MTU to 1700, or jumbo 9000, and make sure nothing in the path is smaller. One narrow link breaks the overlay.
- Host TEPs and Edge TEPs belong on different VLANs when an Edge VM runs on a prepared host, and those VLANs must be routable to each other. Getting this wrong is the classic day-one outage.
- EDP Standard is the default datapath in VCF 9 and the right answer for almost everyone. EDP Dedicated reserves physical CPUs and is for prescriptive, known-traffic builds only.
- Get transport zones, MTU, TEP pools, and uplink profiles right on paper before you prepare a single host. Fixing them later means re-preparing transport nodes.
The overlay either comes up clean or it fights you for a week, and which one you get is decided on paper, before anyone touches a host. I have walked into enough “NSX is broken” engagements to know the pattern: the build was fine, the design was not. A transport zone scoped wrong, an MTU that is 1500 somewhere it should be 1700, a TEP VLAN that cannot reach the Edge. None of those show up until traffic does. This part is the design pass I run with every client before we prepare a single transport node, in the order I actually do it.
The objects you build, and the order
NSX design is a dependency chain. Each object assumes the one before it is right, so getting the order straight is half the battle. You start in the physical underlay (VLANs and MTU your network team owns), land on the VDS, wrap that in an uplink profile, scope it with transport zones, hand out TEP addresses from an IP pool, package the lot into a transport node profile, and only then prepare the cluster. Verify last. Skip a step and you find out three objects later, in a place that does not obviously point back to the cause.
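A minimal sketch of that chain, assuming the strict linear order described above. The object names are illustrative labels of mine, not NSX API identifiers, and the helper only tells you what is still missing upstream of a given step.

```python
from graphlib import TopologicalSorter

# Each key depends on the objects in its set. The names are illustrative
# labels for the steps in the text, not NSX API identifiers.
DEPENDS_ON = {
    "vds": {"physical_underlay"},
    "uplink_profile": {"vds"},
    "transport_zones": {"uplink_profile"},
    "tep_ip_pool": {"transport_zones"},
    "transport_node_profile": {"tep_ip_pool"},
    "prepare_cluster": {"transport_node_profile"},
    "verify": {"prepare_cluster"},
}


def build_order() -> list[str]:
    """Return the objects in an order that respects every dependency."""
    return list(TopologicalSorter(DEPENDS_ON).static_order())


def missing_prerequisites(step: str, done: set[str]) -> set[str]:
    """Return every upstream object that `step` needs and is not yet done."""
    missing: set[str] = set()
    stack = [step]
    while stack:
        for dep in DEPENDS_ON.get(stack.pop(), set()):
            if dep not in done and dep not in missing:
                missing.add(dep)
                stack.append(dep)
    return missing


if __name__ == "__main__":
    print(" -> ".join(build_order()))
    print(missing_prerequisites("prepare_cluster", {"physical_underlay", "vds"}))
```

Asking for the prerequisites of `prepare_cluster` with only the underlay and VDS done returns the four objects in between, which is the list you should see signed off before anyone prepares a host.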
Transport zones: the single-overlay reality in VCF 9
A transport zone defines what a transport node can reach. It identifies the traffic type, overlay or VLAN, and binds to a named VDS. Here is the VCF 9 constraint that resets old NSX-T habits: VCF supports a single overlay transport zone per NSX instance, and every vSphere cluster, within and across workload domains, shares it. If you came up building one overlay TZ per zone or per tenant, that pattern is gone. Tenancy and isolation now live in Projects and VPCs (Part 22), not in a wall of overlay transport zones.
VLAN transport zones are different: you can have several, and they carry the VLAN-backed segments that connect NSX to the physical world, including the uplinks the Edge uses to reach your Tier-0. So the mental model is one shared overlay fabric for east-west tenant traffic, plus the VLAN transport zones that stitch it to the outside.
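The rule is simple enough to lint in a planning workbook. Here is a rough sketch, assuming the plan is just a list of (name, traffic type) pairs; the zone names and the `validate_transport_zones` helper are hypothetical, and nothing here talks to NSX.

```python
from collections import Counter


def validate_transport_zones(zones: list[tuple[str, str]]) -> list[str]:
    """Check planned (name, traffic_type) pairs against the single-overlay rule.

    traffic_type is "OVERLAY" or "VLAN". Returns a list of problems;
    an empty list means the plan passes this check.
    """
    problems = []
    counts = Counter(kind for _, kind in zones)
    unknown = set(counts) - {"OVERLAY", "VLAN"}
    if unknown:
        problems.append(f"unknown traffic type(s): {sorted(unknown)}")
    if counts["OVERLAY"] != 1:
        problems.append(
            f"expected exactly 1 overlay transport zone, found {counts['OVERLAY']}"
        )
    return problems


if __name__ == "__main__":
    plan = [
        ("overlay-tz", "OVERLAY"),
        ("edge-uplink-a", "VLAN"),
        ("edge-uplink-b", "VLAN"),
    ]
    print(validate_transport_zones(plan) or "plan OK")
```

Any number of VLAN transport zones passes; a second overlay zone, or none at all, is flagged.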
MTU: the number that quietly breaks overlays
GENEVE wraps every overlay frame in an outer header, so the packet on the wire is bigger than the guest ever sees. The guest still thinks it is sending 1500 bytes; the TEP adds roughly 50 to 90 bytes of encapsulation on top. That is why NSX requires the transport path to carry at least 1600 bytes, and why I set TEP and uplink MTU to 1700 as a habit, with the physical fabric on jumbo (9000) where the network team allows it. The rule that actually matters: MTU is a weakest-link property. The path is only as wide as its narrowest hop, so one access port left at 1500 silently drops large overlay frames while small pings sail through.
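The arithmetic behind that paragraph, as a small sketch. The header sizes are protocol constants (inner Ethernet 14, GENEVE base header 8, UDP 8, IPv4 20); the GENEVE option length varies by deployment, so it is a parameter here and any value you pass is an assumption. The hop names in the example are hypothetical.

```python
# Header sizes in bytes. These are protocol constants, not NSX settings.
INNER_ETHERNET = 14  # the guest's own Ethernet header travels inside the tunnel
GENEVE_BASE = 8      # fixed GENEVE header, before any options
OUTER_UDP = 8
OUTER_IPV4 = 20


def required_underlay_mtu(guest_mtu: int = 1500, geneve_options: int = 0) -> int:
    """Size of the outer IP packet that carries one full-size guest frame."""
    return (
        guest_mtu + INNER_ETHERNET + GENEVE_BASE + geneve_options
        + OUTER_UDP + OUTER_IPV4
    )


def narrowest_hop(hop_mtus: dict[str, int]) -> tuple[str, int]:
    """Return (hop name, MTU) of the narrowest hop: the weakest link."""
    name = min(hop_mtus, key=hop_mtus.get)
    return name, hop_mtus[name]


if __name__ == "__main__":
    # Hypothetical path between two host TEPs.
    path = {"host-vds": 1700, "tor-a": 9000, "access-port-17": 1500, "tor-b": 9000}
    hop, mtu = narrowest_hop(path)
    need = required_underlay_mtu()
    print(f"need >= {need}, narrowest hop is {hop} at {mtu}")
```

With no GENEVE options the base case needs 1550 bytes, already more than a 1500 hop can carry, which is why a single access port left at the default is enough to break large overlay frames.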
Do not trust the design document; test the path. From a prepared host, ping another host’s TEP with the don’t-fragment bit set and a payload sized to exercise the full MTU. A 1572-byte payload plus 8 bytes of ICMP header and 20 bytes of IPv4 header makes a 1600-byte packet, so if 1572 passes with DF set, you have your 1600. If it fails while a normal ping works, you have found your narrow hop.
# From an ESXi host, test TEP-to-TEP MTU on the overlay netstack
# -d sets do-not-fragment; -s 1572 payload + 8 ICMP + 20 IPv4 = 1600 bytes
vmkping ++netstack=vxlan -d -s 1572 <remote_host_TEP_IP>
# Success: replies return. Failure: "sendto() failed (Message too long)"
# or 100% loss while a normal-size vmkping still works = a narrow hop.
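If you standardise on 1700 or jumbo, the payload size changes with it. A small sketch of the same arithmetic, assuming an IPv4 header with no options; the helper names are mine and the remote address is a documentation-range placeholder.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes


def vmkping_payload(target_mtu: int) -> int:
    """Largest ICMP payload that still fits in one packet of target_mtu bytes."""
    return target_mtu - IPV4_HEADER - ICMP_HEADER


def vmkping_command(remote_tep: str, target_mtu: int) -> str:
    """Build the DF-set vmkping test for the overlay netstack."""
    payload = vmkping_payload(target_mtu)
    return f"vmkping ++netstack=vxlan -d -s {payload} {remote_tep}"


if __name__ == "__main__":
    for mtu in (1600, 1700, 9000):
        # 192.0.2.10 is a placeholder; substitute a real remote host TEP.
        print(vmkping_command("192.0.2.10", mtu))
```

Test at the MTU you actually configured, not only at the 1600 minimum, so a hop that is wide enough for 1600 but narrower than your design shows up before traffic does.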
TEP IP pools and the VLAN routing trap
Every transport node needs a tunnel endpoint address. For hosts, you hand these out from an NSX IP pool (or DHCP, but a pool is cleaner and what I use). Size the pool for the cluster you will grow into, not the one you have today, and remember that NSX can run multiple TEPs per host for load spreading, so the address count is hosts times TEPs-per-host plus headroom.
Worked example: sizing the host TEP pool
Target cluster: 24 hosts, 2 TEPs per host for multi-TEP load spreading.
Addresses needed = 24 × 2 = 48. Add growth headroom and round up to a /26 (62 usable).
A /27 (30 usable) would not even cover today. Pools are painful to resize after hosts are prepared, so size for the cluster’s full footprint on day one.
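The same sizing as a sketch you can rerun for other clusters. The 25 percent headroom and the one address reserved for the gateway are my assumptions for illustration, not Broadcom figures; adjust both to your own growth plan.

```python
import math


def tep_pool_prefix(
    hosts: int,
    teps_per_host: int,
    headroom: float = 0.25,
    reserved: int = 1,
) -> tuple[int, int, int]:
    """Return (prefix length, usable addresses, addresses needed).

    headroom is a growth fraction and reserved covers addresses such as the
    gateway; both defaults are assumptions, not product requirements.
    """
    needed = math.ceil(hosts * teps_per_host * (1 + headroom)) + reserved
    # Walk from the smallest subnet upward until the pool fits.
    for prefix in range(30, 0, -1):
        usable = 2 ** (32 - prefix) - 2  # minus network and broadcast
        if usable >= needed:
            return prefix, usable, needed
    raise ValueError("no IPv4 prefix is large enough")


if __name__ == "__main__":
    prefix, usable, needed = tep_pool_prefix(hosts=24, teps_per_host=2)
    print(f"need {needed} addresses, smallest fit is /{prefix} ({usable} usable)")
```

For 24 hosts with 2 TEPs each this lands on the same /26 as the worked example, with very little slack left, which is a useful prompt to ask whether the next size up is cheaper than a resize later.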
Now the trap that takes out more first-day deployments than anything else on this list. When you run NSX Edge nodes as VMs on a host that is itself a prepared transport node, the host TEP and the Edge TEP must sit on different VLANs and different subnets, and those two TEP VLANs have to be routable to each other. If you put both on the same VLAN, inter-TEP traffic between the host and its local Edge breaks in ways that look like everything and nothing. Plan two TEP VLANs, confirm the fabric routes between them, and you avoid the single most common NSX bring-up outage.
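The on-paper half of that check can be sketched in a few lines, assuming you have the planned VLAN IDs and subnets to hand. The VLAN numbers and subnets in the example are hypothetical. Routability between the two VLANs cannot be proven from a spreadsheet, so this only catches the design mistake, not a missing route.

```python
import ipaddress


def check_tep_separation(
    host_vlan: int, host_subnet: str, edge_vlan: int, edge_subnet: str
) -> list[str]:
    """Return design problems for host and Edge TEP networks that share a host.

    An empty list means the VLANs and subnets are distinct. It does not
    prove the fabric routes between them; test that on the wire.
    """
    problems = []
    host_net = ipaddress.ip_network(host_subnet)
    edge_net = ipaddress.ip_network(edge_subnet)
    if host_vlan == edge_vlan:
        problems.append(f"host and Edge TEPs share VLAN {host_vlan}")
    if host_net.overlaps(edge_net):
        problems.append(f"TEP subnets overlap: {host_net} and {edge_net}")
    return problems


if __name__ == "__main__":
    # Hypothetical plan: VLAN 1614 for host TEPs, VLAN 1615 for Edge TEPs.
    print(check_tep_separation(1614, "172.16.14.0/26", 1615, "172.16.15.0/26"))
    print(check_tep_separation(1614, "172.16.14.0/26", 1614, "172.16.14.0/26"))
```

Once the plan passes, prove the routing with a DF-set vmkping from a host TEP to an Edge TEP address before any workload depends on it.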
EDP mode: Standard for almost everyone
Enhanced Data Path is the high-performance forwarding stack, and in VCF 9 it is the default for new workload domains and new NSX installs. There are two modes, and the choice is easier than people make it. EDP Standard runs the improved packet-forwarding path out of the box with no manual CPU reservation, and Broadcom recommends it for general compute and for Edge clusters. EDP Dedicated (the prescriptive performance mode) pins physical CPU cores to the datapath and is only worth it when you know your traffic profile cold and have a specific, measured reason. Pick Standard unless someone can name that reason.
| Mode | What it does | CPU | Use it for |
|---|---|---|---|
| EDP Standard | Improved forwarding stack, high packet-per-second out of the box. Default in VCF 9. | No manual reservation | General compute and Edge clusters. The default answer. |
| EDP Dedicated (Performance) | Prescriptive mode tuned to a known traffic pattern. | Reserves physical cores | NFV / telco / measured high-throughput cases only. |
One planning caveat: EDP wants a supported NIC and driver. Before you commit a host model, check the NIC against the VCF 9 EDP compatibility list, because an unsupported card quietly falls back and you lose the performance you designed for. There is more on this in the datapath deep dive (Part 26), and the broader picture is in the article on how NSX Enhanced Data Path delivers its throughput boost in VCF 9.
Uplink and transport node profiles
Uplink profile
The uplink profile is the policy that says how a transport node connects upward: the teaming policy across its uplinks, the transport (TEP) VLAN, and the MTU. Be deliberate about the teaming policy here. Named teaming policies let you pin specific traffic, for example Edge uplinks, to specific physical NICs, which matters when you care about which cable carries north-south. Set the MTU on the uplink profile to 1700 to match the TEP and fabric.
Transport node profile
The transport node profile is the template you apply to a whole cluster so every host comes out identical: the VDS, the transport zones, the uplink profile, and the TEP pool, bundled. For stretched or multi-rack clusters where hosts in different racks need different TEP VLANs or subnets, a sub-transport node profile templates that subset without breaking the single cluster-wide profile. Use it; hand-configuring per-host networking is how drift and one-off outages start.
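To make the template-plus-override idea concrete, here is a deliberately simplified model. The class and field names are hypothetical and do not mirror the NSX API schema; the point is only that a sub-profile changes the few fields that differ per rack and inherits the rest from the one cluster-wide profile.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class HostSwitchConfig:
    """The bundle every host in the cluster receives."""
    vds: str
    transport_zones: tuple[str, ...]
    uplink_profile: str
    tep_pool: str


@dataclass
class TransportNodeProfile:
    """One cluster-wide template plus optional per-rack overrides."""
    name: str
    default: HostSwitchConfig
    # rack name -> only the fields that differ for hosts in that rack
    sub_profiles: dict[str, dict[str, str]] = field(default_factory=dict)

    def config_for(self, rack: str) -> HostSwitchConfig:
        """Return the default config with that rack's overrides applied."""
        return replace(self.default, **self.sub_profiles.get(rack, {}))


if __name__ == "__main__":
    profile = TransportNodeProfile(
        name="wld01-cluster01",
        default=HostSwitchConfig(
            vds="wld01-vds01",
            transport_zones=("overlay-tz", "vlan-tz"),
            uplink_profile="uplink-rack-a",
            tep_pool="tep-pool-rack-a",
        ),
        sub_profiles={
            "rack-b": {"uplink_profile": "uplink-rack-b", "tep_pool": "tep-pool-rack-b"},
        },
    )
    print(profile.config_for("rack-a"))
    print(profile.config_for("rack-b"))
```

Hosts in a rack with no override come out identical to the default, which is exactly the property that keeps drift out of the cluster.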
The design decisions on one page
This is the checklist I leave with clients. Decide every row before host preparation, because every one of them is painful to change after transport nodes are live.
| Decision | Recommendation | What bites you if wrong |
|---|---|---|
| Overlay transport zone | One per NSX instance, shared by all clusters (VCF rule). | Designing for many overlay TZs; not supported in VCF. |
| GENEVE MTU | 1700 on TEP and uplinks; jumbo 9000 on the fabric. | One 1500 hop drops large frames; pings still pass. |
| Host TEP pool | Size for the full cluster, hosts × TEPs + headroom. | Pool exhaustion mid-expansion; painful to resize. |
| Host vs Edge TEP VLANs | Separate VLANs and subnets, routable, when Edge runs on host. | Same VLAN breaks inter-TEP; classic day-one outage. |
| EDP mode | Standard, unless you have a measured reason for Dedicated. | Unsupported NIC falls back; lost performance. |
| Profiles | Transport node profile per cluster; sub-profile for multi-rack. | Per-host hand config leads to drift and one-offs. |
What I’d Do
Treat this as a one-page design that gets signed off before procurement closes, not a thing you figure out during bring-up. Lock the single overlay transport zone, set 1700 everywhere and jumbo on the fabric, size the TEP pool for the cluster you are growing into, split host and Edge TEP VLANs and prove the fabric routes between them, default to EDP Standard, and template everything with transport node profiles. Do that and the build is boring, which in networking is the highest compliment. Skip it and you will spend your first week chasing an overlay that pings fine and moves no data. Next up is Part 5: NSX Manager deployment and cluster bring-up. Which of these six decisions is least nailed down in your current design?
The design decisions you cannot easily change later
Some NSX choices are cheap to revisit and some are effectively permanent once workloads are running, and this design part is where you separate them. The transport zone layout, the TEP pool sizing and the underlay MTU are in the permanent column. A transport zone defines the span of your segments; a TEP pool that is sized too small quietly caps how many transport nodes you can ever add; and an underlay MTU left at the default 1500 instead of at least 1600 breaks the overlay the moment real traffic flows.
These are the decisions worth slowing down for, because changing them after deployment means touching every host in a maintenance window rather than editing a field. I would rather spend an extra afternoon in the planning workbook confirming the TEP address space has headroom for years of growth and that at least 1600 is set end to end, including every routed hop between TEP subnets, than discover the constraint the day a new cluster will not join or large flows start hanging. Plan the things that are hard to undo as if they are hard to undo, because they are.
References
- Create an IP Pool for Tunnel Endpoint IP Addresses (Broadcom TechDocs, VCF 9)
- Enhanced Data Path (Broadcom TechDocs, VCF 9)
- NSX in VCF 9: Guidance to Set Maximum Transmission Unit (Broadcom TechDocs)



