Dr. Pranay Jha

VMware • Cloud • AI • Enterprise Architecture


NSX 9 Advanced Load Balancer (Avi): Architecture and How It Plugs Into NSX (NSX Series, Part 16)

Avi is the strategic load balancer for NSX 9: a Controller control plane and elastic Service Engines. This part covers how it is built, how it integrates with NSX, and how a request flows.

NSX Series · Part 16 of 30

TL;DR · Key Takeaways

  • The NSX Advanced Load Balancer (Avi) is the strategic load balancer for NSX 9. It splits into a Controller control plane and an elastic Service Engine data plane.
  • The Controller cluster (three nodes, in the management domain) holds policy, config, and analytics. It does not sit in the data path.
  • Service Engines are the data-plane VMs that actually load-balance traffic, organized into Service Engine Groups that define sizing, placement, and HA.
  • The Controller registers with NSX Manager through a cloud connector, which is how Avi learns the NSX networks and, with VPCs, plumbs itself in automatically.
  • Avi does far more than L4 load balancing: L7, GSLB, WAF, and deep analytics. It is where NSX networking meets application delivery.
Who this is for: architects adding application delivery to an NSX 9 platform.
Prerequisites: a working NSX 9 deployment (Parts 1 to 10). For the Avi-versus-native decision, see my Avi vs NSX Native Load Balancing in VCF 9.

Load balancing in NSX took a clear direction: Avi, the NSX Advanced Load Balancer, is where Broadcom is investing, and the old NSX native load balancer is the legacy path you move off rather than toward. If you have only ever met a load balancer as a pair of appliances in a rack, Avi will feel different, because it is built the way the rest of NSX is: a separated control and data plane that scales elastically. That architecture is the whole story, so this part is about how Avi is built, how it plugs into NSX, and how a request actually travels through it. Get the shape right and everything else (the pools, the policies, the WAF) is detail on top of a sound structure.

Controller and Service Engines

Avi separates the brain from the muscle. The Controller is the control plane: a cluster of three nodes, deployed in the management domain, that stores every configuration, policy, and certificate, runs the API and UI, and aggregates the analytics. Crucially, the Controller is not in the data path, so traffic does not flow through it and a Controller maintenance event does not drop connections, exactly the decoupling you saw with the NSX Manager in Part 2. The Service Engines are the data plane: lightweight VMs that actually receive client traffic on virtual IPs, make the load-balancing decisions, and forward to the backend pool members. The Controller tells the Service Engines what to do; the Service Engines do it. That division is what lets Avi scale out by adding engines instead of forklifting a bigger appliance.

[Diagram: Avi as a control plane and a data plane. The Avi Controller cluster (3 nodes, management domain) is the control plane: policy, config, certs, API/UI, analytics; not in the data path. It registers with NSX Manager and programs the Service Engine Group, the data plane, where the Service Engines and VIPs live. Traffic flows through the engines, which scale out.]
The Controller decides; the Service Engines forward. Traffic never touches the Controller.
| Component | Role | In the data path? |
|---|---|---|
| Controller cluster | Control plane: policy, config, analytics, API. | No. |
| Service Engine (SE) | Data plane: hosts VIPs, load-balances traffic. | Yes. |
| Service Engine Group | Template: SE sizing, placement, HA, scale. | n/a (config). |
| NSX cloud connector | Registers Avi with NSX Manager; learns networks. | n/a (integration). |

How Avi plugs into NSX

Avi does not guess at your network. The Controller registers with NSX Manager through a cloud connector, and from that point it can see NSX segments, gateways, and groups, and place Service Engines onto the right networks automatically. In a VCF 9 deployment the Controllers are deployed by SDDC Manager, three by default for high availability, with a single-Controller option for labs. The integration goes deeper with VPCs (Part 22): when Avi is connected through NSX cloud for VPC integration, the Service Engines attach to the VPC-backed private networks directly, and you do not have to hand-build Tier-1 gateways or segments just to wire the load balancer in. The platform plumbs it for you, which is the kind of integration that turns a separate product into a feature of the platform.

[Diagram: A request through the load balancer. The client connects to a Service Engine (VIP, L4-L7 decision), which forwards to pool member 1 or pool member 2. The Controller programs policy and is not in the path.]
The client hits a VIP on a Service Engine, which picks a healthy backend. The Controller stays out of the flow.
In practice: the Controllers live in the management domain even when the load balancer serves a VI workload domain, and that catches people during recovery. Know where your Controllers actually run before you have to recover them, because an orphaned Controller in VCF 9 is a genuinely awkward thing to clean up after the fact.
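To make the request flow concrete, here is a minimal toy model of it, not the Avi API. The class names, the VIP address, the pool member names, and the round-robin selection are all assumptions chosen for illustration; a real Service Engine supports several load-balancing algorithms and runs its own health monitors. The point it shows is the split: the Controller programs the engine once, and the engine then serves requests and skips unhealthy members on its own.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class PoolMember:
    name: str
    healthy: bool = True


@dataclass
class ServiceEngine:
    """Toy data-plane node: holds the policy it was given and serves alone."""
    pools: dict = field(default_factory=dict)      # VIP -> list of PoolMember
    _counters: dict = field(default_factory=dict)  # VIP -> round-robin counter

    def handle(self, vip: str) -> str:
        # Only healthy members are eligible for a new request.
        members = [m for m in self.pools[vip] if m.healthy]
        if not members:
            raise RuntimeError(f"no healthy members behind {vip}")
        n = next(self._counters.setdefault(vip, count()))
        return members[n % len(members)].name


class Controller:
    """Toy control plane: pushes policy to engines, never handles a request."""

    def program(self, engine: ServiceEngine, vip: str, members: list) -> None:
        engine.pools[vip] = members


se = ServiceEngine()
# Hypothetical VIP and pool members, for illustration only.
Controller().program(se, "10.0.0.10", [PoolMember("web-1"), PoolMember("web-2")])

# The Controller object is gone at this point; the engine keeps serving.
first_three = [se.handle("10.0.0.10") for _ in range(3)]

se.pools["10.0.0.10"][0].healthy = False  # a health monitor marks web-1 down
after_failure = [se.handle("10.0.0.10") for _ in range(2)]

print(first_three)    # ['web-1', 'web-2', 'web-1']
print(after_failure)  # ['web-2', 'web-2']
```

Nothing in `handle` ever calls back to the Controller, which is the property that lets a Controller maintenance window pass without dropping connections.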

More than load balancing

Reducing Avi to “a load balancer” undersells it. The same platform delivers Layer 7 traffic management with content switching and SSL/TLS termination, a web application firewall for application-layer protection, global server load balancing to steer traffic across sites for resilience and locality, and a genuinely strong analytics engine that gives you per-request visibility, end-to-end timing, and health insight that a traditional appliance never exposed. That analytics layer alone changes how teams troubleshoot application problems, because the load balancer becomes an observability point rather than a black box. You do not have to use all of it on day one, but knowing it is there shapes how you design, because capabilities you would otherwise buy as separate products are already in the platform.

| Capability | What it gives you |
|---|---|
| L4-L7 load balancing | VIPs, pools, health monitors, SSL termination, content switching. |
| WAF | Application-layer protection in front of your apps. |
| GSLB | Steer traffic across sites and fail over between them. |
| Analytics | Per-request visibility and end-to-end timing. |
| Elastic autoscale | Add Service Engines as load grows; no forklift. |

Service Engine HA and scale

Because the Service Engines are the data plane, their availability model is your load balancer’s availability model, and Avi gives you choices the old appliance pair never did. Rather than a simple active and standby box, a Service Engine Group can run several engines and spread virtual services across them, so the loss of one engine moves only the virtual services it was hosting, not the whole load balancer. As traffic grows, Avi can scale a single virtual service across multiple engines and add engines elastically, which means you grow capacity by adding small data-plane VMs instead of replacing a chassis. The Controller orchestrates all of this from outside the data path, placing and re-placing virtual services on engines as health and load change. This is the elasticity that makes Avi feel like a cloud service rather than a box.

[Diagram: Virtual services spread across engines. SE 1 hosts vs-web and vs-api; SE 2 hosts vs-app and vs-web; SE 3 has failed and its services have moved to the surviving engines. When one engine fails, only its virtual services relocate. A busy service can be scaled across several engines.]
Availability and capacity are properties of the Service Engine Group, not a fixed appliance pair.
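The relocation behaviour described in this section can be sketched in a few lines. This is an illustrative model only: the "least-loaded survivor" rule is my assumption and not Avi's actual placement logic, and `vs-db` is a hypothetical service added so the failed engine has something to move.

```python
def relocate(placement: dict, failed: str) -> dict:
    """Return a new placement in which only the failed engine's virtual
    services have moved; everything else stays where it was."""
    survivors = {se: list(vs) for se, vs in placement.items() if se != failed}
    if not survivors:
        raise RuntimeError("no surviving Service Engines in the group")
    for vs in placement.get(failed, []):
        # Skip engines that already host this virtual service.
        candidates = [se for se in survivors if vs not in survivors[se]]
        if not candidates:
            continue  # already running on every surviving engine
        # Assumed rule: place on the survivor hosting the fewest services.
        target = min(candidates, key=lambda se: len(survivors[se]))
        survivors[target].append(vs)
    return survivors


placement = {
    "se-1": ["vs-web", "vs-api"],
    "se-2": ["vs-app", "vs-web"],
    "se-3": ["vs-db"],  # hypothetical service on the engine that fails
}
after = relocate(placement, "se-3")
print(after)
```

Note that `vs-web`, already scaled across two engines, is untouched by the failure of the third; only the services the failed engine hosted are re-placed.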

Designing the Service Engine Group

The Service Engine Group is where most of the real design effort goes, because it sets the sizing, placement, HA model, and scale limits for the engines that serve a set of virtual services. The first decision is shared versus dedicated. A shared SE Group is efficient, many applications served by a common pool of engines, and right for the bulk of internal workloads. A dedicated SE Group gives a sensitive or noisy application its own engines, which buys isolation and predictable performance at the cost of more VMs, and it is the pattern I use for a regulated tenant or an application whose traffic spikes would otherwise hurt its neighbours. The second decision is placement: which cluster the engines run on, and the resource reservations they get, because, exactly like the Edge VMs in Part 7, a Service Engine starved of CPU on a contended cluster degrades in ways that look like an application fault.

My take: share Service Engine Groups by default and go dedicated only for a named reason, the same instinct I apply to NSX clusters and Tier-1s throughout this series. Reserve the engines’ resources, and treat the SE Group as the unit of load-balancer capacity planning, because that is exactly what it is.
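As a sketch of that default, the rule "shared unless there is a named reason" can be written down so that every dedicated group carries its justification. The `App` fields, the group names, and the application names are hypothetical; the two reasons are the ones from this section, a regulated tenant or traffic spikes that would hurt neighbours.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class App:
    name: str
    regulated: bool = False  # e.g. a regulated tenant
    noisy: bool = False      # traffic spikes that would hurt neighbours


def se_group_for(app: App, shared_group: str = "seg-shared") -> str:
    """Shared by default; dedicated only when a named reason applies."""
    if app.regulated or app.noisy:
        return f"seg-dedicated-{app.name}"
    return shared_group


def plan_groups(apps: list) -> dict:
    """Map each Service Engine Group name to the apps it would serve."""
    groups: dict = {}
    for app in apps:
        groups.setdefault(se_group_for(app), []).append(app.name)
    return groups


plan = plan_groups([
    App("intranet"),
    App("wiki"),
    App("payments", regulated=True),
    App("video", noisy=True),
])
print(plan)
```

Each key in the result is a unit of capacity planning: the shared group is sized for the sum of its tenants, and each dedicated group for its single application.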

What I’d Do

Treat Avi as the default load balancer for any new NSX 9 work, and plan a migration off the native LB rather than building anything new on it. Deploy the three-node Controller cluster for production and keep the single-Controller option strictly for labs. Get the Service Engine Group sizing and placement right up front, because that is where load-balancing capacity and HA actually live, and let the NSX cloud connector and VPC integration do the network plumbing instead of hand-building Tier-1s for the load balancer. Lean on the analytics from day one; it is one of the genuine advantages over a traditional appliance. And always know that your Controllers live in the management domain, so recovery does not surprise you.

For the head-to-head with the native option, see Avi vs NSX Native Load Balancing in VCF 9. Next up is Part 17: VPN with IPSec and L2VPN, the last of the Edge services.

Are you building new on Avi, or still adding to the native load balancer you will have to migrate?


Designing the Avi integration so it scales

Avi is powerful, and the power comes from a clean split between a control plane and a data plane. Get that split clear in your head and the sizing, placement and the native-versus-Avi decision all fall out of it.

Controller and Service Engines: where the load actually goes

The Avi Controller is the brain: configuration, the analytics engine, the API, the single management surface. You size and place it for management resilience, not for traffic, because no application data flows through it. The Service Engines are the data-plane VMs that actually load-balance traffic, and they scale horizontally as you add capacity. Almost all of your throughput and capacity planning is really Service Engine planning. The classic early mistake is treating the Controller as the thing that carries load and undersizing the Service Engines; it is the SEs that need the headroom, the SEs that fail over, and the SEs that you add when a virtual service outgrows its current footprint.

How it plugs into NSX

Avi integrates with NSX through a cloud connector, consuming NSX networks and placing Service Engines directly onto them so the load-balanced data path lives alongside the workloads it serves. That integration is what lets Avi present virtual services on NSX segments and inherit the same VPC, segmentation and routing model as everything else in the platform. It is not a bolt-on sitting off to the side; done properly, the load balancer is a first-class citizen of the same NSX networking you designed in the earlier Parts, which keeps the traffic path and the security policy coherent.

When NSX native load balancing is enough

Not every workload earns Avi. NSX has native load balancing on the Edge that covers straightforward Layer 4 and Layer 7 virtual services, and for those, adding Avi is complexity and cost you do not need. Reach for Avi when the requirement is genuinely advanced: rich per-application analytics, a web application firewall, global server load balancing across sites, or autoscaling Service Engines under bursty load. Stay native when the requirement is a basic virtual service in front of a pool. The honest decision is requirement-led, not fashion-led, and pretending every app needs the full Avi feature set is how platforms accumulate cost they cannot justify at the next budget review.
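A requirement-led decision can be made explicit with a very small check. The requirement labels below are hypothetical tags of my own, not product settings; the set of "advanced" requirements is taken directly from the paragraph above.

```python
# Requirements that, per the paragraph above, justify Avi.
ADVANCED = {"analytics", "waf", "gslb", "autoscale"}


def choose_load_balancer(requirements: set) -> tuple:
    """Return the choice and the specific requirements that drove it."""
    reasons = sorted(requirements & ADVANCED)
    return ("avi", reasons) if reasons else ("native", [])


basic = choose_load_balancer({"l4-vip", "health-monitor"})
advanced = choose_load_balancer({"l7-vip", "waf", "gslb"})
print(basic)     # ('native', [])
print(advanced)  # ('avi', ['gslb', 'waf'])
```

Returning the reasons alongside the choice is deliberate: it is the list you would put in front of a budget review.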

[Diagram: Control plane up, data plane on the wire. Size the Controller for management, the Service Engines for traffic. The Avi Controller (no data path) manages Service Engines that sit on NSX segments and scale horizontally in front of a backend pool of workload VMs.]
Diagram 3: The Controller manages; the Service Engines carry the traffic on NSX segments to the pool. Capacity planning is SE planning.
In practice: when a virtual service starts dropping connections under load, the instinct is to blame the Controller. It is almost never the Controller. Look at Service Engine capacity and placement first, because that is where the data path lives and where the bottleneck actually forms. Add or resize SEs, and the problem usually goes away without the Controller ever being involved.

Operating Avi day to day

Once Avi is in place, the operational model is analytics-first, and that is genuinely different from running a basic load balancer. The Controller collects rich per-application telemetry, so when a virtual service misbehaves you are not guessing from sparse logs; you are looking at end-to-end timing, connection health and error breakdowns that point at where the problem actually sits, whether that is the client side, a Service Engine, or the backend pool. Learning to trust and read that analytics view is the highest-leverage Avi skill, because it turns load-balancer troubleshooting from folklore into evidence.

Failure handling follows the same control-plane and data-plane split. When a Service Engine fails, its virtual services are redistributed across the remaining engines, so plan capacity on the assumption that the surviving engines must absorb the load, not on the assumption that every engine is always available. Scaling is horizontal: when a virtual service outgrows its footprint you add or resize Service Engines, not the Controller. Keep the Controller healthy and well backed up because it holds the configuration and the analytics, but remember that day-to-day capacity and resilience are decided at the Service Engine layer. Operate Avi from the analytics, size at the engines, and protect the Controller, and it behaves predictably under real load.
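The capacity rule here is simple arithmetic, and it helps to write it down once. The sketch below assumes you have a measured peak load and a per-engine capacity in the same unit; the function name, the 20% headroom default, and the example figures are my own illustrative assumptions, not Avi sizing guidance.

```python
import math


def engines_required(peak_load: float, per_engine_capacity: float,
                     failures_tolerated: int = 1, headroom: float = 0.2) -> int:
    """Engines needed so the survivors still carry peak load with headroom.

    peak_load and per_engine_capacity must use the same unit (e.g. Gbps).
    """
    if per_engine_capacity <= 0 or not 0 <= headroom < 1:
        raise ValueError("capacity must be positive and headroom in [0, 1)")
    usable = per_engine_capacity * (1 - headroom)
    # Engines needed to carry the load, plus spares for the failures tolerated.
    return math.ceil(peak_load / usable) + failures_tolerated


# Hypothetical figures: 10 Gbps peak, 4 Gbps per engine, one failure tolerated.
needed = engines_required(peak_load=10, per_engine_capacity=4)
print(needed)  # 5
```

With these example numbers, four engines carry the peak at 80% utilisation and the fifth is what lets one engine fail without the survivors being overrun.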



About the Author

Dr. Pranay Jha is a Cloud and AI Consultant with 18+ years of experience in hybrid cloud, virtualization, and enterprise infrastructure transformation. He specializes in VMware technologies, multi-cloud strategy, and Generative AI solutions. He holds a PhD in Computer Applications with research focused on Cloud and AI, has published multiple research papers, and has been a VMware vExpert since 2016 and a VMUG Community Leader.
