Dr. Pranay Jha


NSX 9 Tier-1 Gateways and East-West Routing: DR, SR and the Hairpin Trap (NSX Series, Part 10)

The Tier-1 is half distributed router, half service router, and knowing the difference decides whether east-west traffic stays local or hairpins to the Edge.

NSX Series · Part 10 of 30

TL;DR · Key Takeaways

  • A Tier-1 gateway is the default gateway for your segments, and it is two things at once: a Distributed Router (DR) in every host kernel and, when needed, a Service Router (SR) on the Edge.
  • The DR handles east-west routing distributed across hosts, in-kernel, with no hairpin and no Edge involvement. This is where NSX routing is fast and free.
  • An SR is created only when you attach the Tier-1 to an Edge cluster or configure stateful services (NAT, gateway firewall, DHCP, load balancing).
  • The trap: attaching an Edge cluster to a Tier-1 you do not need stateful services on forces an SR and can hairpin east-west traffic to the Edge that would otherwise stay local.
  • A Tier-1 advertises its routes (connected segments, NAT, LB VIPs) up to the Tier-0. Pick what you advertise on purpose.
Who this is for: admins and architects designing east-west routing and per-tenant gateways in NSX 9.
Prerequisites: segments (Part 8) and a working Tier-0 (Part 9).

If the Tier-0 is the door to the outside world, the Tier-1 is the workhorse that does the everyday routing inside it. Almost every segment plugs into a Tier-1, and almost every east-west flow between segments crosses one. And yet the Tier-1 is the object I most often see designed by accident, because people treat it as a single router when it is really two cooperating halves with very different behaviour. Understanding that split, the distributed router and the service router, is the single most useful routing concept in NSX, and getting it wrong is how you end up hairpinning traffic across the data center that should never have left the host.

One gateway, two halves

A Tier-1 gateway is realized as two components. The Distributed Router (DR) is a kernel module that exists on every transport node, so the gateway is literally present on every host. When two VMs on different segments of the same Tier-1 talk to each other, the routing happens in the kernel of the host they are on, with no trip to a central appliance and no added latency. The Service Router (SR) is different: it is a centralized component that runs on an Edge node, and it only comes into existence when the gateway needs to do something the distributed kernel cannot, namely host a stateful service or connect to the wider network through an Edge. One Tier-1, two halves, each with its own job.

[Figure: The Tier-1 is a DR everywhere, an SR only sometimes. Left panel, Distributed Router (every host): hosts A, B and C each run the DR in kernel; the same gateway is on every host, east-west routes locally, no Edge needed, no hairpin, scales with hosts. Right panel, Service Router (Edge only): the SR on an Edge node, created only for stateful services or an Edge-cluster attachment (NAT, FW, DHCP, LB).]
The DR is on every host for east-west. The SR appears on the Edge only when the gateway needs centralized services.
Component | Runs on | Handles | When it exists
Distributed Router (DR) | Every host kernel | East-west routing between segments. | Always, the moment a Tier-1 exists.
Service Router (SR) | NSX Edge node | Stateful services, connection via Edge. | Only with services or an Edge cluster attached.
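
As a rough mental model of the DR/SR split, the sketch below reports where each half of a Tier-1 would exist for a given configuration. The function name `realized_components`, its parameters, and the host and Edge-cluster names are all made up for illustration; none of this is an NSX object or API call.

```python
from typing import Dict, Iterable, List, Optional


def realized_components(
    hosts: List[str],
    edge_cluster: Optional[str] = None,   # None means no Edge cluster attached
    services: Iterable[str] = (),         # e.g. ("NAT", "LB"); illustrative labels
) -> Dict[str, List[str]]:
    """Return where the DR and SR would exist under the rules described above."""
    services = list(services)
    if services and edge_cluster is None:
        # Stateful services run on the SR, and the SR lives on an Edge.
        raise ValueError("stateful services need an Edge cluster")
    return {
        # The DR is present on every host as soon as the Tier-1 exists.
        "DR": list(hosts),
        # The SR exists only when an Edge cluster is attached,
        # even if no service is configured on it.
        "SR": [edge_cluster] if edge_cluster else [],
    }


print(realized_components(["host-a", "host-b", "host-c"]))
print(realized_components(["host-a"], edge_cluster="edge-cluster-1", services=["NAT"]))
```

Note the middle case the model allows: an Edge cluster with an empty `services` list still yields an SR, which is exactly the accidental configuration this article is about.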

East-west routing is distributed and free

This is the part that still impresses me after years of it. When a web VM on one segment talks to an app VM on another segment of the same Tier-1, and both VMs happen to be on the same host, the packet is routed entirely within that host’s kernel and never touches the physical network at all. When the two VMs are on different hosts, the packet rides the overlay directly between them, routed by the DR on each end, again with no detour through any central router. There is no appliance in the path, no hairpin, and the capacity grows automatically as you add hosts. This distributed routing is the whole reason east-west scales the way it does, and it is the default behaviour of any Tier-1 that does not have an SR.

[Figure: East-west, routed locally, never via the Edge. On one ESXi host with the DR in kernel, web-vm (segment A) reaches app-vm (segment B) routed in kernel. The Edge cluster is not in this path (no SR involved), so there is no hairpin.]
A DR-only Tier-1 keeps east-west traffic local. The Edge does not see it at all.
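
The two DR-only cases described above can be written down as a tiny path function. This is a toy model, not something NSX exposes: `dr_only_path` and the host names are hypothetical, and the result only records which hosts a routed flow touches and whether it rides the overlay.

```python
from typing import Any, Dict


def dr_only_path(src_host: str, dst_host: str) -> Dict[str, Any]:
    """Describe the path of a routed flow between two segments of a DR-only Tier-1."""
    if src_host == dst_host:
        # Both VMs on one host: routed in that host's kernel,
        # never reaching the physical network.
        return {"hops": [src_host], "uses_overlay": False, "uses_edge": False}
    # VMs on different hosts: the packet rides the overlay host to host,
    # with no central router in between.
    return {"hops": [src_host, dst_host], "uses_overlay": True, "uses_edge": False}


print(dr_only_path("esxi-01", "esxi-01"))
print(dr_only_path("esxi-01", "esxi-02"))
```

In both branches `uses_edge` is false, which is the whole point: with no SR, there is no case in which the Edge appears in the hop list.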

The Edge-cluster decision, and the hairpin

Here is the decision that quietly shapes your traffic patterns. When you create a Tier-1, you can optionally attach it to an Edge cluster. You must do this if the Tier-1 needs stateful services, because those run on the SR, which lives on the Edge. But, and this is the part people miss, attaching an Edge cluster always creates an SR, even if you configure no services at all. And once an SR exists, certain traffic that the DR would have handled locally, typically flows that leave the Tier-1 toward the Tier-0 or another Tier-1, now has to route through the SR on the Edge, which means it leaves the host, crosses to an Edge node, and comes back. That is a hairpin, and on a Tier-1 that did not need services it is pure waste: latency and Edge load you bought for nothing.

[Figure: Attach an Edge cluster you do not need, get a hairpin. Traffic between vm 1 and vm 2 on an ESXi host goes out to the SR on the Edge (created by the Edge-cluster attach) and back, although it could have stayed local.]
An unnecessary Edge-cluster attachment turns free local routing into an Edge round trip.
In practice: the rule I give clients is simple. Attach an Edge cluster to a Tier-1 only when that Tier-1 genuinely needs a stateful service. If all it does is route east-west between segments, leave it DR-only and keep the traffic in the kernel where it belongs. I have reclaimed real Edge capacity just by removing Edge-cluster attachments that nobody needed.

Route advertisement up to the Tier-0

A Tier-1 connected to a Tier-0 does not automatically tell the Tier-0 about its networks. You choose what it advertises, per route type, and only those routes are passed up the transit link to the Tier-0 and from there, via redistribution and BGP (Part 9), out to the fabric. The common categories are the connected segment subnets, any NAT addresses, and load-balancer virtual IPs. This gives you a clean two-stage control over reachability: the Tier-1 decides what it offers upward, and the Tier-0 decides what actually leaves for the physical world. It also means a forgotten advertisement is a common cause of “the segment exists, routing looks fine, but nothing outside can reach it.”
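
The two-stage control can be sketched as two set-membership checks, which also shows where a route stops when something was forgotten. The function `reachability` is hypothetical, and the route-type labels are modelled on the naming I would expect in the NSX Policy API; treat them as illustrative strings and check the API reference for your version before relying on them.

```python
from typing import Set


def reachability(route_type: str, tier1_advertises: Set[str], tier0_redistributes: Set[str]) -> str:
    """Report how far a Tier-1 route type gets on its way to the physical fabric."""
    if route_type not in tier1_advertises:
        # Stage 1: the Tier-1 never offered it upward.
        return "stops at Tier-1: not advertised"
    if route_type not in tier0_redistributes:
        # Stage 2: the Tier-0 knows it but does not send it out.
        return "stops at Tier-0: advertised but not redistributed"
    return "offered to the fabric"


# Illustrative configuration, not taken from a real environment.
tier1_advertises = {"TIER1_CONNECTED", "TIER1_LB_VIP"}
tier0_redistributes = {"TIER1_CONNECTED", "TIER1_NAT"}

for route_type in ("TIER1_CONNECTED", "TIER1_LB_VIP", "TIER1_NAT"):
    print(route_type, "->", reachability(route_type, tier1_advertises, tier0_redistributes))
```

When something outside cannot reach a segment, walking the two stages in this order tells you which gateway to look at first.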

Attach an Edge cluster to a Tier-1? | Decision
Needs stateful NAT, gateway firewall, DHCP, or LB | Yes. The SR hosts those services.
Pure east-west routing between segments | No. Keep it DR-only; avoid the hairpin.
Needs to advertise routes north to the Tier-0 | Route advertisement works without an SR; you still do not need the Edge cluster just for that.
Per-tenant isolation | Use a Tier-1 per tenant; attach an Edge cluster only to the ones with services.
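
The decision table reduces to one question: does the requirement list contain a stateful service? A minimal sketch of that rule follows. The requirement labels and the function name `attach_edge_cluster` are my own shorthand, not NSX terms.

```python
from typing import Set, Tuple

# Services that, per the table above, run on the SR (labels are illustrative).
STATEFUL_SERVICES = {"NAT", "GATEWAY_FIREWALL", "DHCP", "LOAD_BALANCER"}


def attach_edge_cluster(requirements: Set[str]) -> Tuple[bool, str]:
    """Decide whether a Tier-1 with these requirements needs an Edge cluster."""
    needed = requirements & STATEFUL_SERVICES
    if needed:
        return True, "SR hosts: " + ", ".join(sorted(needed))
    # East-west routing and route advertisement both work without an SR.
    return False, "DR-only; no stateful service requested"


print(attach_edge_cluster({"EAST_WEST", "ROUTE_ADVERTISEMENT"}))
print(attach_edge_cluster({"EAST_WEST", "NAT", "DHCP"}))
```

Anything not in `STATEFUL_SERVICES`, including route advertisement and tenant isolation, falls through to the DR-only answer, which mirrors the last two rows of the table.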

Per-tenant Tier-1 patterns

The Tier-1 is the natural boundary for a tenant or an application environment. A common, clean pattern is one Tier-1 per tenant, each owning that tenant’s segments, all connected up to a shared Tier-0 for north-south. Tenants are isolated from each other because their segments hang off different Tier-1s, and the Tier-0 provides the common exit. Where a tenant needs its own NAT or load balancing, that specific Tier-1 gets an Edge cluster and an SR; tenants that only route east-west stay DR-only and cost the Edge nothing. This scales cleanly and keeps the expensive, centralized resources allocated only where they earn their place. For harder multi-tenancy with full self-service, Projects and VPCs take this further, and that is Part 22.

Seeing the DR and SR for yourself

The DR/SR split is not just a concept on a slide; you can see both halves from the CLI, and doing so is the fastest way to confirm whether a Tier-1 has quietly grown a service router you did not intend. On an Edge node you can list the logical routers, and a Tier-1 SR will appear there only if one was created. On a transport node you can confirm the DR is present and carrying the segment routes. When east-west latency looks wrong, this is how I prove whether traffic is staying distributed or being pulled to the Edge.

# On an Edge node: does this Tier-1 have an SR here?
get logical-routers               # lists routers; a Tier-1 SR shows up only if created
get logical-router <uuid> route   # the routes this SR holds

# On a host transport node: confirm the DR and its routes
get logical-routers               # the DR instances on this host
get logical-router <uuid> forwarding   # the forwarding table the datapath uses

# If you find an SR on a Tier-1 that only routes east-west,
# that is the hairpin. Detach the Edge cluster if no service needs it.

The other thing worth confirming after any Tier-1 change is realization state in the NSX UI. NSX is declarative: you express intent and the control plane realizes it onto the data plane, and occasionally a change sits in a partially realized state because something downstream is unhappy. A Tier-1 that shows anything other than a clean success is a Tier-1 whose routing you should not trust yet. Check realization, then check the data path, then move on. Two minutes here saves an afternoon of chasing a routing problem that was really a realization problem.

My take: on a brownfield estate, the first audit I run is a sweep for Tier-1s with Edge clusters attached and no stateful services configured. It is one of the most common findings, and detaching the unused Edge clusters quietly gives back latency and Edge headroom that everyone assumed they had to buy more hardware to get.
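
That audit is easy to script once you have an inventory export. The sketch below assumes a list of plain dictionaries with `name`, `edge_cluster` and `services` keys; that shape is hypothetical, so map it to whatever your own export or API client actually returns.

```python
from typing import Dict, List


def unneeded_edge_attachments(tier1s: List[Dict]) -> List[str]:
    """Names of Tier-1s that have an Edge cluster attached but no stateful service."""
    return [
        t["name"]
        for t in tier1s
        if t.get("edge_cluster") and not t.get("services")
    ]


# Invented inventory for illustration only.
inventory = [
    {"name": "t1-tenant-a", "edge_cluster": "edge-cluster-1", "services": ["NAT"]},
    {"name": "t1-tenant-b", "edge_cluster": "edge-cluster-1", "services": []},
    {"name": "t1-tenant-c", "edge_cluster": None, "services": []},
]

print(unneeded_edge_attachments(inventory))
```

Treat the result as a list of candidates rather than a change plan: confirm on the Edge CLI that the SR exists and that no service depends on it before detaching anything.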

What I’d Do

Treat the DR/SR distinction as the first thing you reason about on any Tier-1. Default every Tier-1 to DR-only and keep east-west traffic distributed in the kernel, which is where NSX routing is fast and cheap. Attach an Edge cluster, and accept the SR, only when a Tier-1 genuinely needs a stateful service, and never just out of habit, because that habit hairpins traffic and burns Edge capacity you will wish you had later. Advertise routes to the Tier-0 deliberately, one type at a time, and remember that a missing advertisement is a frequent cause of a segment that is unreachable for no obvious reason. Use a Tier-1 per tenant as your isolation boundary. Next up is Part 11: NAT, DHCP, and the DNS forwarder, the first of the services that actually require that SR. Which of your Tier-1s have an Edge cluster attached that they do not actually need?


DR, SR and the hairpin you did not mean to create

A Tier-1 gateway has two faces. The distributed router component runs on every host and handles east-west routing right there in the hypervisor, with no trip to the Edge, which is exactly the distributed efficiency NSX is built around. The service router component comes into existence only when the Tier-1 is attached to an Edge cluster, which enabling a stateful service requires, and it lives on the Edge. The instant you turn on something stateful, you instantiate a service router and you change the traffic path, because flows that need that service now have to hairpin through the Edge instead of staying distributed across the hosts.

That hairpin is not wrong, but it should be a deliberate choice rather than a surprise. It shows up as added latency and as load on the Edge cluster, and a design that casually enables stateful services on many Tier-1s can push far more traffic through the Edge than anyone intended. Keep east-west distributed wherever you can, push to the service router only when a specific service genuinely requires it, and when you do, account for the Edge capacity and the extra latency in the design. The teams that get bitten are the ones who treat enabling a service as a free checkbox rather than a routing decision.

Keeping east-west genuinely distributed

The performance promise of the Tier-1 is that east-west routing stays distributed on the hosts, and protecting that promise is mostly about restraint. Every stateful service you enable on a Tier-1 pulls the relevant traffic onto a service router on the Edge, so a design that liberally turns on services across many Tier-1s can quietly route far more east-west traffic through the Edge than anyone planned. The Edge becomes a chokepoint for traffic that never needed to leave the hosts, and the symptom is latency and Edge load that nobody can immediately explain because on paper the routing is distributed.

The discipline is to keep Tier-1s distributed-router-only wherever you can and to push to a service router only when a specific stateful requirement genuinely demands it. When you do enable a service, treat it as a capacity decision: account for the Edge resources it consumes and the hairpin latency it introduces, and monitor the service-router load so you notice when a tier is carrying more than you intended. East-west that stays on the hosts is fast and scales with your compute; east-west that you accidentally funnel through the Edge inherits the Edge ceiling. Knowing which of those you are building, per Tier-1, is the whole game.
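
To make "treat it as a capacity decision" concrete, here is some back-of-envelope arithmetic. Every number is invented: the per-Tier-1 traffic, the fraction of it that crosses the SR, and the Edge capacity are inputs you would have to measure or estimate yourself, and the function names are hypothetical.

```python
from typing import Dict, List


def edge_load_gbps(tier1s: List[Dict]) -> float:
    """Traffic (Gbps) that crosses an SR, summed over all Tier-1s that have one."""
    return sum(t["gbps"] * t["sr_fraction"] for t in tier1s if t["has_sr"])


def edge_utilization(tier1s: List[Dict], edge_capacity_gbps: float) -> float:
    """Share of the Edge capacity consumed by SR-bound traffic (0.0 to 1.0 and beyond)."""
    return edge_load_gbps(tier1s) / edge_capacity_gbps


# Invented figures for illustration only.
tier1s = [
    {"name": "t1-web", "gbps": 10.0, "sr_fraction": 0.5, "has_sr": True},
    {"name": "t1-app", "gbps": 8.0, "sr_fraction": 0.0, "has_sr": False},
    {"name": "t1-db", "gbps": 4.0, "sr_fraction": 0.25, "has_sr": True},
]

print("Edge load (Gbps):", edge_load_gbps(tier1s))
print("Edge utilization:", edge_utilization(tier1s, edge_capacity_gbps=20.0))
```

The DR-only Tier-1 contributes nothing to the sum no matter how much traffic it carries, which is the arithmetic version of "east-west that stays on the hosts scales with your compute."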


About the Author

Dr. Pranay Jha is a Cloud and AI Consultant with 18+ years of experience in hybrid cloud, virtualization, and enterprise infrastructure transformation. He specializes in VMware technologies, multi-cloud strategy, and Generative AI solutions. He holds a PhD in Computer Applications with research focused on Cloud and AI, has published multiple research papers, and has been a VMware vExpert since 2016 and a VMUG Community Leader.
