TL;DR · Key Takeaways
- NSX runs site-to-site IPSec VPN and L2VPN on the Edge service router, on a Tier-0 or Tier-1 gateway. Like all SR services, they need an Edge cluster.
- Policy-based IPSec defines the interesting traffic as local and remote subnets. Route-based IPSec builds a virtual tunnel interface (VTI) and routes over it, with static routes or BGP.
- Route-based is the better default: it scales, supports dynamic routing, and survives subnet changes without editing the tunnel. Reach for policy-based only when the far end demands it.
- L2VPN stretches Layer 2 across sites, extending a segment over the WAN. It rides on a route-based IPSec tunnel and is the tool for cross-site migration and DR.
- In an NSX Project (multi-tenancy), a Tier-1 supports one IPSec and one L2VPN service, and project Tier-1s use static routes, not VTI BGP.
VPN is the part of NSX that reaches outside the data center, to a branch, a partner, another cloud, or a second site. It is also where two worlds meet: your NSX configuration on one end and someone else’s firewall or router on the other, and that far end is frequently not yours to change. That constraint shapes every VPN design, because you are negotiating a shared secret and a set of parameters with a device you do not control. NSX gives you flexible, capable VPN on the Edge, but the wins here come from choosing the right model and matching parameters cleanly with the far end, not from anything exotic. So this part is about the two IPSec models, when to use each, and what L2VPN is genuinely for.
IPSec on the Edge
An NSX IPSec VPN service runs on a Tier-0 or Tier-1 gateway and builds an encrypted, authenticated tunnel across an untrusted network to a remote endpoint. Because it is a stateful service, it lives on the service router, which means the gateway needs an Edge cluster, the same rule that governed NAT and the gateway firewall. The session is defined by a stack of profiles that both ends must agree on: an IKE profile for the key-exchange parameters, an IPSec profile for the tunnel encryption, and a DPD profile for dead-peer detection. The single most common reason a tunnel will not come up is a mismatch in one of these between your end and the far end, so the practical work of building an IPSec VPN is largely the discipline of agreeing every parameter with the other side before you start.
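One way to make that discipline concrete is to keep a parameter sheet for each end and diff the two before you build anything. The following is a minimal sketch in plain Python, not an NSX API call: the `TunnelParams` class, its field names, and the sample values are all hypothetical, and a real sheet would carry more fields than this.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TunnelParams:
    """Hypothetical parameter sheet; field names are illustrative, not NSX fields."""
    ike_version: str
    ike_encryption: str
    ike_dh_group: int
    ipsec_encryption: str
    ipsec_pfs: bool


def diff_params(local: TunnelParams, remote: TunnelParams) -> dict:
    """Return {field: (local_value, remote_value)} for every field that differs."""
    l, r = asdict(local), asdict(remote)
    return {name: (l[name], r[name]) for name in l if l[name] != r[name]}


# Sample values are made up for illustration.
ours = TunnelParams("IKEv2", "AES-256", 14, "AES-256", True)
theirs = TunnelParams("IKEv2", "AES-256", 19, "AES-256", False)

mismatches = diff_params(ours, theirs)
for field, (mine, yours) in mismatches.items():
    print(f"{field}: local={mine} remote={yours}")
```

An empty result means the two sheets agree on every field the sketch knows about. It says nothing about fields you left off the sheet.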
Policy-based vs route-based
The fork that shapes the whole design is how the tunnel decides which traffic to carry. Policy-based IPSec defines the interesting traffic explicitly, as a set of local and remote subnet pairs, and the tunnel encrypts traffic matching those selectors. It is simple and it interoperates with almost anything, but it is rigid: every time a subnet on either side changes, you edit the policy, and large numbers of subnet pairs get unwieldy. Route-based IPSec instead creates a virtual tunnel interface (VTI), a logical interface you route traffic into, and then you decide what crosses the tunnel using ordinary routing, static routes or BGP. That indirection is the whole advantage: the tunnel does not care about subnets, the routing table does, so you add or remove networks by changing routes, not by editing the VPN, and you can run dynamic routing across the tunnel for resilience.
| Dimension | Policy-based | Route-based |
|---|---|---|
| Defines traffic by | Local/remote subnet selectors. | Routes into a VTI. |
| Dynamic routing | No. | Yes, BGP over the tunnel. |
| Adding networks | Edit the VPN policy. | Add a route; tunnel unchanged. |
| Best for | Simple, fixed, when the far end requires it. | Almost everything. The default. |
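The scaling difference in the table can be sketched with a little arithmetic. The example below assumes every local subnet must reach every remote subnet, so the policy-based selector set is the full cross product, while the route-based side needs one route per remote network into the VTI. The function names, the subnets, and the VTI next-hop address are hypothetical, chosen only for illustration.

```python
from ipaddress import ip_network
from itertools import product


def policy_selectors(local_nets, remote_nets):
    """One (local, remote) selector pair per combination (full-mesh assumption)."""
    return [(ip_network(l), ip_network(r)) for l, r in product(local_nets, remote_nets)]


def vti_routes(remote_nets, vti_next_hop):
    """One static route per remote network, all pointing into the tunnel interface."""
    return [(ip_network(r), vti_next_hop) for r in remote_nets]


# Illustrative addressing only.
local = ["10.1.0.0/24", "10.1.1.0/24", "10.1.2.0/24"]
remote = ["10.2.0.0/24", "10.2.1.0/24", "10.2.2.0/24", "10.2.3.0/24"]

print(len(policy_selectors(local, remote)), "selector pairs")
print(len(vti_routes(remote, "169.254.10.2")), "routes")

# Adding one remote network: several new selector pairs versus one new route.
remote_grown = remote + ["10.2.4.0/24"]
print(len(policy_selectors(local, remote_grown)), "selector pairs after growth")
print(len(vti_routes(remote_grown, "169.254.10.2")), "routes after growth")
```

With BGP over the tunnel even the extra route is learned rather than typed, which is the point of the indirection.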
L2VPN: stretching Layer 2
IPSec connects networks that keep their own addressing. L2VPN does something stronger and more specialized: it extends a single Layer 2 segment across two sites, so a VM at the remote site sits on the same broadcast domain and subnet as VMs in your data center, as if the wire were stretched over the WAN. It is built on a route-based IPSec tunnel, with an L2VPN server at one end and a client at the other. The reason this matters is migration and disaster recovery. When you move workloads between sites and cannot re-IP them, or you need a DR site where VMs come up with their production addresses intact, stretching Layer 2 buys you that continuity. It is a powerful tool and a deliberately temporary one: stretched Layer 2 across a WAN is something you use to get through a migration or a failover, not a steady-state design you want to live on forever.
Which VPN for which job
The choice between the three options, policy-based IPSec, route-based IPSec, and L2VPN, falls out cleanly once you name the actual requirement. Most steady-state connectivity, a branch, a partner, another cloud, wants route-based IPSec. Policy-based is what you settle for when the far-end device forces it. L2VPN is reserved for the specific case where addressing must be preserved across sites, which in practice means migration and disaster recovery. The table makes the decision a one-liner.
| You need to | Use | Why |
|---|---|---|
| Connect a branch or partner site | Route-based IPSec | Scales, dynamic routing, survives change. |
| Interop with a fixed legacy device | Policy-based IPSec | The far end only supports selectors. |
| Migrate VMs without re-IP | L2VPN | Stretches the subnet; retire it after. |
| Stand up a DR site at the same IPs | L2VPN | VMs come up on production addresses. |
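The same decision can be written as a three-branch function, which is a useful sanity check that the table really is a one-liner. This is a sketch of the reasoning in this section only; the function name and its two boolean inputs are hypothetical simplifications of the requirements above.

```python
from enum import Enum


class Vpn(Enum):
    ROUTE_BASED = "Route-based IPSec"
    POLICY_BASED = "Policy-based IPSec"
    L2VPN = "L2VPN"


def choose_vpn(*, preserve_ip_addressing: bool, far_end_selectors_only: bool) -> Vpn:
    """Map the two questions that matter onto the three options."""
    if preserve_ip_addressing:
        # Migration without re-IP, or a DR site at production addresses.
        return Vpn.L2VPN
    if far_end_selectors_only:
        # The far-end device forces policy-based.
        return Vpn.POLICY_BASED
    # Everything else: the default.
    return Vpn.ROUTE_BASED


print(choose_vpn(preserve_ip_addressing=False, far_end_selectors_only=False).value)
```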
When a tunnel will not come up
IPSec failures are frustrating because the two ends rarely tell you the same story, and the side that logs the useful error is often the one you do not control. Work the checklist methodically rather than guessing. The phase-one failures are key-exchange mismatches in the IKE profile; the phase-two failures are tunnel-encryption mismatches in the IPSec profile; and the tunnels that establish but pass no traffic are almost always a routing or selector problem, or an MTU problem that only shows up under load. Walk these in order and you resolve the large majority of VPN tickets without a packet capture.
| Symptom | Likely cause | Check |
|---|---|---|
| Phase one never completes | IKE mismatch, wrong peer address, or wrong pre-shared key | IKE version, DH group, encryption, peer address, PSK. |
| Phase one up, phase two fails | IPSec profile mismatch | Encryption, PFS, lifetime on both ends. |
| Tunnel up, no traffic | Routing or selectors wrong | VTI routes, or policy subnet pairs. |
| Small packets pass, large transfers hang | MTU does not account for IPSec overhead | Lower the effective MTU or clamp TCP MSS. |
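Walking the table in order can be expressed as a short triage function: given what you have observed, it returns the first check that applies. This is a sketch of the checklist, not a diagnostic tool. The `TunnelState` fields are hypothetical observations you would gather yourself from tunnel status and test traffic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TunnelState:
    """Hypothetical observations collected by hand."""
    phase1_up: bool
    phase2_up: bool
    small_packets_pass: bool
    large_packets_pass: bool


def next_check(state: TunnelState) -> str:
    """Return the first applicable check, in the same order as the table."""
    if not state.phase1_up:
        return "IKE profile: version, DH group, encryption, pre-shared key"
    if not state.phase2_up:
        return "IPSec profile: encryption, PFS, lifetime on both ends"
    if not state.small_packets_pass:
        return "Routing or selectors: VTI routes, or policy subnet pairs"
    if not state.large_packets_pass:
        return "MTU: lower the effective MTU or clamp MSS"
    return "No fault found by this checklist"


print(next_check(TunnelState(True, True, True, False)))
```

The ordering matters: there is no point checking routes on a tunnel whose phase two never came up.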
What I’d Do
Make route-based IPSec your standard and treat policy-based as the compatibility fallback for a far end that cannot do better. Agree every IKE, IPSec, and DPD parameter with the other side before you touch the config, because a clean parameter sheet prevents the great majority of tunnel-down tickets, and budget for the MTU overhead that encryption adds. Use L2VPN deliberately and temporarily, to carry a migration or stand up a DR site without re-IP, and plan its retirement as part of the project rather than letting stretched Layer 2 become permanent. And remember these are SR services, so the gateway hosting them needs an Edge cluster, and in a multi-tenant Project each Tier-1 is limited to one IPSec and one L2VPN with static routing only.

Next up is Part 18: monitoring and operations with Traceflow, alarms, and Operations for Networks, where we shift from building to running. Are your tunnels route-based, or are you still editing subnet selectors by hand?
Route-based VPN usually beats policy-based
When you stand up an IPSec tunnel you choose between policy-based and route-based, and for most designs route-based is the better answer. Policy-based IPSec matches traffic against selectors, lists of interesting source and destination networks, which is fine for a handful of static subnets and becomes brittle as soon as the topology grows or changes. Route-based IPSec uses a virtual tunnel interface and ordinary routing, which means the tunnel participates in your routing design and scales with it instead of fighting it. You add a network behind the tunnel and routing carries it, rather than editing selector lists on both ends and hoping they stay in sync.
The practical payoff is that route-based tunnels integrate with dynamic routing, so a route-based IPSec connection can carry BGP and adapt as networks come and go, which is exactly what you want for anything beyond a trivial site-to-site link. Policy-based still has its place for simple, static connections to a third party that insists on it, but as a default for your own multi-site connectivity, route-based is the design that ages well. L2VPN, which stretches Layer 2 across sites, is a separate tool entirely and one to use sparingly, because extending a broadcast domain across a WAN carries all the fault-domain risks that stretched networking always does.
VPN lives on the Edge, and so does its MTU math
Every IPSec and L2VPN tunnel terminates on the Edge service router, which has two consequences worth designing for. The first is capacity: VPN is a stateful service running on the Edge, so the encryption and tunnel processing consume Edge resources, and a design with many tunnels or high VPN throughput is really an Edge sizing exercise. Plan the Edge for the VPN load the way you would plan it for any other service, and remember that the same Edge may be carrying north-south forwarding and other services at the same time.
The second consequence is the one that generates support tickets: MTU. IPSec adds its own encapsulation overhead, and when that tunnel rides over an NSX overlay that already added GENEVE overhead, the packet sizes stack up and fragmentation or silent drops follow if the path MTU does not leave room for both. The familiar small-works-large-fails signature from the overlay troubleshooting part of this series applies here too, just with an extra layer of headers. Account for the IPSec overhead in your MTU planning end to end, test with full-size packets across the tunnel, and you avoid the classic VPN-that-pings-but-will-not-move-data problem that the encapsulation overhead quietly creates.
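To put numbers on that overhead, here is a rough calculator for ESP in tunnel mode over IPv4. It is a sketch under stated assumptions, not a statement of what NSX negotiates: the two cipher suites and their IV, block, and ICV sizes are standard ESP figures chosen for illustration, GENEVE and any other underlay headers are left out, and all names are hypothetical.

```python
from dataclasses import dataclass

OUTER_IPV4 = 20   # new outer IPv4 header, no options
ESP_HEADER = 8    # SPI + sequence number
ESP_TRAILER = 2   # pad length + next header
NAT_T_UDP = 8     # UDP header when NAT traversal is in use
IPV4_TCP = 40     # inner IPv4 (20) + TCP (20), no options


@dataclass(frozen=True)
class EspSuite:
    iv_bytes: int     # per-packet IV carried in the ESP payload
    block_bytes: int  # payload + trailer is padded to a multiple of this
    icv_bytes: int    # integrity check value appended to the packet


# Assumed example suites; confirm against what the two ends actually agree.
AES_CBC_SHA1 = EspSuite(iv_bytes=16, block_bytes=16, icv_bytes=12)
AES_GCM_16 = EspSuite(iv_bytes=8, block_bytes=4, icv_bytes=16)


def esp_packet_size(inner: int, suite: EspSuite, nat_t: bool = False) -> int:
    """Size on the wire of one inner IP packet after ESP tunnel encapsulation."""
    pad = (-(inner + ESP_TRAILER)) % suite.block_bytes
    size = (OUTER_IPV4 + ESP_HEADER + suite.iv_bytes
            + inner + pad + ESP_TRAILER + suite.icv_bytes)
    return size + (NAT_T_UDP if nat_t else 0)


def max_inner_packet(path_mtu: int, suite: EspSuite, nat_t: bool = False) -> int:
    """Largest inner packet that still fits the path MTU once encapsulated."""
    for inner in range(path_mtu, 0, -1):
        if esp_packet_size(inner, suite, nat_t) <= path_mtu:
            return inner
    raise ValueError("path MTU too small for ESP encapsulation")


def tcp_mss_clamp(inner_mtu: int) -> int:
    """TCP MSS that keeps a full segment within the inner MTU."""
    return inner_mtu - IPV4_TCP


inner = max_inner_packet(1500, AES_CBC_SHA1)
print("max inner packet:", inner, "MSS clamp:", tcp_mss_clamp(inner))
```

Under these assumptions a 1500-byte path leaves 1438 bytes for the inner packet with the CBC suite, which is why a full-size 1500-byte packet is the right thing to test with and a default ping is not.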
Build redundancy into site-to-site links
A single tunnel between two sites is a single point of failure dressed up as connectivity, and for anything that matters you design the link to survive a failure. That means more than one tunnel, terminating in a way that a single Edge failure or a single path outage does not drop the connection, with routing that fails the traffic over cleanly. This is another place where route-based VPN earns its keep, because failover between route-based tunnels is a routing decision that the network makes for you, where policy-based selectors would leave you reconfiguring under pressure.
Plan the redundancy against the actual availability requirement of what crosses the link. A development connection to a partner may genuinely be fine on a single tunnel; a production replication or a critical integration is not. Match the tunnel redundancy and the Edge placement to that requirement, test the failover deliberately rather than discovering it during an incident, and remember that the link is only as resilient as its least redundant component, which is often the Edge it terminates on. Redundant tunnels over a single Edge are not redundant where it counts.
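That last point is easy to check on paper before an incident checks it for you. The sketch below takes a hand-written inventory of tunnels and flags any component they all share. The `Tunnel` fields and the sample names are hypothetical; this models the design review, not anything NSX reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tunnel:
    """Hypothetical inventory record for one site-to-site tunnel."""
    name: str
    edge_node: str
    uplink_path: str


def single_points_of_failure(tunnels: list) -> list:
    """List the components whose loss would drop every tunnel at once."""
    findings = []
    if len(tunnels) < 2:
        findings.append("fewer than two tunnels")
    if len({t.edge_node for t in tunnels}) < 2:
        findings.append("all tunnels terminate on one Edge")
    if len({t.uplink_path for t in tunnels}) < 2:
        findings.append("all tunnels share one uplink path")
    return findings


# Two tunnels, two paths, but one Edge: redundant everywhere except where it counts.
link = [
    Tunnel("to-dr-a", edge_node="edge-01", uplink_path="isp-a"),
    Tunnel("to-dr-b", edge_node="edge-01", uplink_path="isp-b"),
]
print(single_points_of_failure(link))
```

An empty list is necessary but not sufficient: it shows the design has no shared component in this inventory, and the failover test is still what proves it.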
References
- Add an NSX IPSec VPN Service (Broadcom TechDocs, VCF 9)
- VPN in an NSX Project (Broadcom TechDocs, VCF 9)
- NSX 9 Tier-0 Gateways and North-South Routing (NSX Series, Part 9)



