Dr. Pranay Jha

NSX 9 Distributed Firewall Fundamentals: Categories, Applied-To and Zero Trust (NSX Series, Part 12)

The Distributed Firewall puts stateful enforcement at every vNIC. Rule categories and order, the anatomy of a rule, why Applied-To matters most, and the zero-trust pivot.

NSX Series · Part 12 of 30

TL;DR · Key Takeaways

  • The Distributed Firewall is a stateful firewall in the hypervisor kernel, enforced at every VM vNIC. It is distributed across every host and the policy follows the VM on vMotion.
  • Rules are organized into ordered categories: Ethernet, Emergency, Infrastructure, Environment, Application, evaluated top to bottom, category by category, then the default rule.
  • Applied-To is the field that matters most. It scopes where a rule is enforced. Get it wrong and you change behaviour far beyond the VMs you meant to.
  • Host prep activates the DFW with a default rule of allow. Moving to zero trust means flipping that to deny, but only after the explicit allow rules are in place.
  • The DFW is a vDefend feature (Part 3), not part of the base VCF entitlement. It is the main reason most teams license vDefend at all.
Who this is for: security and network architects starting micro-segmentation on NSX 9.
Prerequisites: segments and groups in place (Part 8), and a vDefend license (Part 3), because the DFW is a vDefend feature.

This is the part of the series people came for. Micro-segmentation is the headline reason most organizations buy NSX, and the Distributed Firewall is how it happens. It is also the most unforgiving object in the platform. The DFW puts a stateful firewall at every virtual NIC in your data center, which is enormous power, and the same power means a single careless rule can sever traffic for a thousand VMs in one publish. I have watched a confident engineer take out a production segment with one rule that had the wrong Applied-To, in front of the customer. So we are going to go slow here, because understanding the DFW model before you write a rule is the difference between security and an outage.

What the DFW is, and where it runs

The Distributed Firewall is not an appliance and not a chokepoint. It is enforcement code in the ESXi kernel, applied at the virtual interface of each VM, on every prepared host. When a packet leaves a VM, the DFW evaluates it right there at the vNIC before it ever reaches the segment, and the same happens on the way in. Because it lives on every host, it scales with your compute and there is no central box to overload. Because the policy is attached to the workload rather than to a physical location, it moves with the VM when it vMotions. East-west traffic between two VMs on the same host is filtered without ever leaving that host. That is what makes micro-segmentation practical: the firewall is everywhere at once.

[Figure: A firewall at every vNIC, on every host. One ESXi host runs the DFW in the kernel, filtering vm A and vm B at the interface, before the segment. Another host carries vm C with the same policy, which follows vm C if it vMotions to a different host.]
The DFW is enforced at each vNIC in the kernel. The policy is bound to the workload, not the location.

Rule categories and evaluation order

DFW rules are not one flat list. They are organized into ordered categories, and the order is fixed: Ethernet, Emergency, Infrastructure, Environment, Application. The firewall evaluates the categories left to right and the rules top to bottom within each category, and the first matching rule wins. Understanding this order is what stops your rules fighting each other. Emergency is where you put a quarantine or a break-glass block that must beat everything below it. Infrastructure is for the shared platform services every workload needs, such as DNS, NTP, AD, and backup. Environment separates broad zones like production from development. Application is where the specific app-tier rules live, the web-to-app-to-database micro-segmentation that is the whole point. Put a rule in the wrong category and it either never matches or matches before something it should not.

[Figure: Categories evaluate in order; first match wins. Ethernet (L2), Emergency (quarantine), Infrastructure (DNS, NTP, AD), Environment (prod vs dev), Application (app tiers), then Default (allow/deny). Put each rule in the category that matches its job. The default rule is the last word if nothing else matches.]
Five categories, fixed order, first match wins. Category choice is part of rule design, not an afterthought.

| Category | Purpose | Typical rules |
| --- | --- | --- |
| Ethernet | Layer 2 filtering. | MAC-based rules (rare). |
| Emergency | Break-glass, highest precedence. | Quarantine a compromised VM. |
| Infrastructure | Shared platform services. | Allow DNS, NTP, AD, backup. |
| Environment | Broad zone separation. | Block prod from dev. |
| Application | Per-app micro-segmentation. | Web to app to database tiers. |
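
To make the evaluation order concrete, here is a minimal Python sketch of "category by category, first match wins, then the default rule". It is a toy model, not NSX code: the rule names, group names, and the `evaluate` function are all hypothetical, matching is reduced to source group, destination group, and port, and it does not model any Layer 2 versus Layer 3 distinction for the Ethernet category.

```python
from dataclasses import dataclass
from typing import Optional

# Fixed category order from the text; the default rule comes after all of them.
CATEGORY_ORDER = ["Ethernet", "Emergency", "Infrastructure", "Environment", "Application"]


@dataclass
class Rule:
    name: str
    category: str
    source: str            # hypothetical group name; "ANY" matches everything
    destination: str       # hypothetical group name; "ANY" matches everything
    port: Optional[int]    # None means any port
    action: str            # "ALLOW", "DROP" or "REJECT"


def matches(rule, src, dst, port):
    """True when the packet's source, destination and port all match the rule."""
    return (rule.source in ("ANY", src)
            and rule.destination in ("ANY", dst)
            and rule.port in (None, port))


def evaluate(rules, src, dst, port, default_action="ALLOW"):
    """Return (rule_name, action) for the first matching rule, else the default."""
    for category in CATEGORY_ORDER:
        for rule in rules:  # list order stands in for top-to-bottom order
            if rule.category == category and matches(rule, src, dst, port):
                return rule.name, rule.action
    return "default", default_action


rules = [
    Rule("web-to-app", "Application", "web", "app", 8443, "ALLOW"),
    Rule("quarantine-web", "Emergency", "web", "ANY", None, "DROP"),
    Rule("allow-dns", "Infrastructure", "ANY", "dns", 53, "ALLOW"),
]

print(evaluate(rules, "web", "app", 8443))  # -> ('quarantine-web', 'DROP')
print(evaluate(rules, "app", "dns", 53))    # -> ('allow-dns', 'ALLOW')
print(evaluate(rules, "app", "db", 3306))   # -> ('default', 'ALLOW')
```

The first result is the point of the exercise: the Application rule would have allowed web to app on 8443, but the Emergency quarantine sits in an earlier category and wins.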

Anatomy of a rule, and why Applied-To matters most

A DFW rule has the fields you would expect, source, destination, service, action, and logging, plus one that NSX newcomers underestimate every time: Applied-To. Source, destination, and service decide whether a packet matches the rule. Applied-To decides where the rule is enforced, that is, on which VMs’ vNICs NSX actually installs it. These are different questions, and conflating them is the single most expensive DFW mistake. If you leave Applied-To at the default of DFW, the rule is pushed to every VM in the deployment. Scope it to a security group, and it is installed only on those VMs. Applied-To is both a performance control, fewer rules per vNIC, and a blast-radius control, a mistake stays contained.

| Field | What it controls |
| --- | --- |
| Source / Destination | Which traffic matches (groups, IPs, segments). |
| Service | Ports and protocols, or an app-ID service. |
| Applied-To | Where the rule is enforced. Scope it to a group, not the whole DFW. |
| Action | Allow, Drop, or Reject. |
| Logging | On or off (rate-limited per host); essential for tuning. |

[Figure: Applied-To is your blast-radius dial. Applied-To: DFW (everywhere): a mistake here hits every VM. Applied-To: a group (scoped): enforced only on the target group (red).]
Same rule, two Applied-To choices. Scoping it is how a mistake stays small.
In practice: the worst DFW outage I have seen was a single deny rule with Applied-To left at DFW instead of the intended app group. It published to every vNIC in the environment and dropped traffic for workloads that had nothing to do with the change. Always scope Applied-To. Treat the default of DFW as a loaded gun.
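
The difference between matching and enforcement scope is easier to see in code. The sketch below is a simplified model, not NSX behaviour in detail: `install_targets`, the group name, and the VM names are hypothetical, and it answers only one question, namely which VMs' vNICs would receive a rule for a given Applied-To.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    applied_to: str  # "DFW" (everywhere) or a hypothetical group name


def install_targets(rule, groups, all_vms):
    """Return the set of VMs whose vNICs would receive this rule."""
    if rule.applied_to == "DFW":
        return set(all_vms)                      # unscoped: every VM gets it
    return set(groups.get(rule.applied_to, ()))  # scoped: only group members


groups = {"app-tier": {"app-01", "app-02"}}
all_vms = {"web-01", "app-01", "app-02", "db-01", "dns-01"}

scoped = Rule("deny-legacy-port", applied_to="app-tier")
unscoped = Rule("deny-legacy-port", applied_to="DFW")

print(sorted(install_targets(scoped, groups, all_vms)))
# -> ['app-01', 'app-02']
print(sorted(install_targets(unscoped, groups, all_vms)))
# -> ['app-01', 'app-02', 'db-01', 'dns-01', 'web-01']
```

Same rule name, same match fields, but the unscoped version lands on the DNS and database VMs that had nothing to do with the change.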

The default rule and the zero-trust pivot

When NSX prepares your hosts, the DFW comes up with a default rule of allow, so that turning on the firewall does not instantly break every VM-to-VM flow. That is a sensible starting point, but it is the opposite of zero trust. The goal of micro-segmentation is a default rule of deny, where nothing is permitted unless an explicit rule allows it. The pivot from allow to deny is the most consequential change you will ever make in the DFW, and the order is everything: you build and validate all the explicit allow rules first, confirm with logging that real traffic is matching them, and only then flip the default to deny. Flip it first and you black-hole the data center. This migration is a project, not a checkbox, and it is the subject of Part 21.

Disclaimer: changing the DFW default rule to deny will drop any traffic you have not explicitly allowed. Build and verify the allow rules with logging first, stage on a non-production scope, and have a rollback (set the default back to allow) ready. Confirm a vDefend license is in place before relying on the DFW in production.
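
One way to think about "allow rules first, then flip" is as a coverage check: every flow you have observed must match an explicit allow rule before the default changes. The following is a minimal sketch under that assumption. The `Flow` and `AllowRule` types and the group names are hypothetical, and in practice the observed flows would come from your own logging or flow data rather than a hand-written list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Flow:
    src: str   # hypothetical source group
    dst: str   # hypothetical destination group
    port: int


@dataclass(frozen=True)
class AllowRule:
    src: str
    dst: str
    port: Optional[int] = None  # None means any port


def covers(rule, flow):
    """True when the allow rule would match the observed flow."""
    return (rule.src in ("ANY", flow.src)
            and rule.dst in ("ANY", flow.dst)
            and rule.port in (None, flow.port))


def uncovered_flows(observed, allow_rules):
    """Flows that only the default rule is currently permitting."""
    return [f for f in observed if not any(covers(r, f) for r in allow_rules)]


def safe_to_flip_default(observed, allow_rules):
    """Only flip to deny when no observed flow depends on the default rule."""
    return len(uncovered_flows(observed, allow_rules)) == 0


observed = [Flow("web", "app", 8443), Flow("app", "db", 3306), Flow("app", "backup", 10000)]
allow_rules = [AllowRule("web", "app", 8443), AllowRule("app", "db", 3306)]

for flow in uncovered_flows(observed, allow_rules):
    print("would be dropped after the flip:", flow)
print("safe to flip:", safe_to_flip_default(observed, allow_rules))
```

Here the backup flow is the one nobody wrote a rule for, which is exactly the kind of traffic a premature flip black-holes.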

Stateful by default

The DFW is stateful out of the box, which means you write a rule for the connection’s initiating direction and the return traffic is allowed automatically, the same model every modern firewall uses. You can make a policy section stateless if you have a specific reason, but you rarely should, because stateless rules force you to hand-write return rules and they lose the connection tracking that makes the firewall both safe and simple. Leave it stateful unless something concrete demands otherwise. Combined with the categories and Applied-To, statefulness is what lets you express security as intent, allow web to app on 8443, rather than as a pile of bidirectional port rules you have to maintain by hand.
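
To illustrate what "return traffic is allowed automatically" means, here is a deliberately small model of connection tracking. It is a sketch, not how the DFW datapath is implemented: the `StatefulFilter` class is hypothetical, it assumes a default of deny, and it ignores protocol, TCP state, and timeouts.

```python
class StatefulFilter:
    """Toy stateful filter: one rule per initiating direction, replies tracked."""

    def __init__(self, allow_rules):
        # Each allow rule is (source group, destination group, destination port).
        self.allow_rules = set(allow_rules)
        # Tracked connections as (src, src_port, dst, dst_port).
        self.connections = set()

    def process(self, src, src_port, dst, dst_port):
        # Return traffic: the mirror image of a connection we already allowed.
        if (dst, dst_port, src, src_port) in self.connections:
            return "ALLOW"
        # New connection: needs an explicit rule in the initiating direction.
        if (src, dst, dst_port) in self.allow_rules:
            self.connections.add((src, src_port, dst, dst_port))
            return "ALLOW"
        return "DROP"  # assumed default of deny


fw = StatefulFilter({("web", "app", 8443)})
print(fw.process("web", 50000, "app", 8443))  # initiating direction -> ALLOW
print(fw.process("app", 8443, "web", 50000))  # reply on the tracked connection -> ALLOW
print(fw.process("app", 8443, "web", 50001))  # no tracked connection, no rule -> DROP
```

There is exactly one rule in that filter. A stateless version would need a second, hand-written rule for the reply direction, and that rule would also admit the third packet.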

Verifying the DFW on the host

Because the DFW is enforced in the kernel, the truth about what is actually installed on a VM lives on the host, not just in the UI. When a rule does not behave as you expect, drop to the host and look at what the datapath really has. The vsipioctl tooling shows the rules and connection state for a given virtual interface, and it is how you confirm whether your Applied-To landed where you intended.

# On the ESXi host, find the VM's filter (vNIC), then read its rules
summarize-dvfilter            # list filters; find your VM's vNIC filter name
vsipioctl getrules -f <filter-name>     # the rules actually installed on that vNIC
vsipioctl getflows -f <filter-name>     # live connection state for that vNIC

# If a rule you expect is missing here, the Applied-To did not include
# this VM. If an unexpected rule is present, your scoping is too broad.

This single check settles most DFW arguments. If the rule is on the vNIC and traffic still drops, the rule logic is wrong. If the rule is not on the vNIC at all, the Applied-To or the group membership is wrong. Knowing which of those two it is saves you from changing the right rule for the wrong reason.
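
The decision logic in that paragraph can be written down as a small helper. This is a sketch only: it assumes you have read the rule IDs off the host by hand, the IDs shown are hypothetical, and it does not parse any tool output.

```python
def compare_vnic_rules(intended_ids, installed_ids):
    """Compare the rules you meant to land on a vNIC with what the host reports."""
    intended, installed = set(intended_ids), set(installed_ids)
    return {
        "missing": sorted(intended - installed),     # expected but not on the vNIC
        "unexpected": sorted(installed - intended),  # on the vNIC but not expected
    }


def diagnose(report, traffic_dropped):
    """Map the comparison to the two failure classes described in the text."""
    if report["missing"]:
        return "scope: Applied-To or group membership excludes this VM"
    if report["unexpected"]:
        return "scope: Applied-To is broader than intended"
    if traffic_dropped:
        return "logic: rules are installed, so review match fields and order"
    return "ok"


# Hypothetical rule IDs: 1011 and 1012 were intended, the host shows 1011 and 2001.
report = compare_vnic_rules([1011, 1012], [1011, 2001])
print(report)
print(diagnose(report, traffic_dropped=True))
```

The value is in the order of the checks: scoping is ruled in or out before anyone touches the rule logic.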

The DFW mistakes that cause outages

Almost every serious DFW incident I have been called into traces back to a small set of mistakes. None of them are exotic; they are the predictable ways a powerful tool gets misused under time pressure. Keep this list where you can see it before you publish.

| Mistake | Result | Avoid it by |
| --- | --- | --- |
| Applied-To left at DFW | Rule pushed to every vNIC; wide blast radius. | Scope Applied-To to a group, always. |
| Default flipped to deny too early | Black-holes traffic with no allow rules. | Build and verify allow rules first. |
| Rule in the wrong category | Matches too early or never matches. | Match category to the rule’s job. |
| No logging while tuning | You cannot see what real traffic needs. | Log allow rules during the build phase. |
| IP-based rules that never update | Rules drift as workloads change. | Use dynamic groups and tags (Part 13). |

My take: the DFW rewards discipline and punishes improvisation more than any other NSX object. The teams that run it well treat every rule change like a code change, with review, scoping, and a rollback. The ones that treat it like editing a spreadsheet are the ones I get called to rescue.
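
If you do treat rule changes like code changes, several of the mistakes in the table can be caught by a pre-publish check. The sketch below is one possible shape for that, not an NSX feature: the `Rule` structure and the `lint` function are hypothetical, and it covers only the three mistakes that can be detected from a single rule's fields.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass
class Rule:
    name: str
    sources: List[str]        # group names or IP/CIDR literals
    destinations: List[str]   # group names or IP/CIDR literals
    applied_to: str           # "DFW" or a group name
    logged: bool


def is_ip_literal(value):
    """True for strings like '10.0.0.5' or '10.0.0.0/24', False for group names."""
    try:
        ipaddress.ip_network(value, strict=False)
        return True
    except ValueError:
        return False


def lint(rule):
    """Return a list of findings for one rule; an empty list means no findings."""
    findings = []
    if rule.applied_to == "DFW":
        findings.append("Applied-To left at DFW: scope it to a group")
    if not rule.logged:
        findings.append("logging is off: enable it while tuning")
    if any(is_ip_literal(v) for v in rule.sources + rule.destinations):
        findings.append("IP-based match: prefer dynamic groups and tags")
    return findings


risky = Rule("deny-legacy", ["10.0.0.0/24"], ["app-tier"], applied_to="DFW", logged=False)
clean = Rule("web-to-app", ["web-tier"], ["app-tier"], applied_to="app-tier", logged=True)

for rule in (risky, clean):
    print(rule.name, lint(rule))
```

Category placement and the timing of the default flip still need a human reviewer; a field-level check cannot see intent.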

What I’d Do

Learn the model before you write a rule. Put each rule in the right category, scope Applied-To to a group every single time, and treat the default of DFW as the dangerous setting it is. Keep the firewall stateful, use logging to validate that real traffic matches your allow rules, and treat the move from default-allow to default-deny as a staged project with a rollback, never a one-click flip. Above all, internalize that the DFW’s power and its danger are the same property: it is everywhere at once, so a good rule protects everything and a bad rule breaks everything. Respect that and micro-segmentation becomes the strongest control in your data center. Next up is Part 13: security groups, tags, and dynamic membership, the engine that makes these rules describe workloads instead of IP addresses. Is your Applied-To scoped on every rule you have written so far?


Default deny is a destination, not a starting point

The distributed firewall ends in a default rule, and the instinct of every security-minded engineer is to flip that default to deny as fast as possible. Resist it. Flipping to default-deny without a real picture of how your applications actually communicate is the single fastest way to take down production, because the rules you have not written yet are exactly the flows you did not know existed. The category model and the ordered evaluation give you the structure to do this safely, but structure does not substitute for knowing the traffic.

The path that works is the one the micro-segmentation methodology lays out: discover the flows, build allow rules from what you actually observe, validate them in a monitoring posture, and only then tighten the default. Default-deny is the destination you arrive at after you have earned the visibility, not the switch you throw on day one. Along the way, the discipline that prevents most self-inflicted outages is precise Applied-To scoping and deliberate rule ordering within categories, so a rule lands exactly where you intend and is evaluated in the order you expect. Get there gradually and default-deny is a milestone you celebrate; get there in one reckless step and it is an incident you explain.
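
The "build allow rules from what you actually observe" step can be sketched as a simple aggregation: collapse observed VM-to-VM flows into group-to-group candidates and count how often each appears. This is an illustration under stated assumptions, not a discovery tool. The `candidate_allow_rules` function, the VM names, and the VM-to-group mapping are hypothetical, and real flow data would come from your own visibility tooling.

```python
from collections import Counter


def candidate_allow_rules(flows, vm_group):
    """Collapse (src_vm, dst_vm, port) flows into counted group-level candidates."""
    counts = Counter()
    for src_vm, dst_vm, port in flows:
        key = (vm_group.get(src_vm, "UNGROUPED"),
               vm_group.get(dst_vm, "UNGROUPED"),
               port)
        counts[key] += 1
    # Most frequently seen candidates first, ties broken by the candidate itself.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


flows = [
    ("web-01", "app-01", 8443),
    ("web-02", "app-01", 8443),
    ("app-01", "db-01", 3306),
    ("app-01", "10.9.9.9", 443),   # destination that belongs to no known group
]
vm_group = {"web-01": "web", "web-02": "web", "app-01": "app", "db-01": "db"}

for (src, dst, port), seen in candidate_allow_rules(flows, vm_group):
    print(f"{src} -> {dst} on {port}: seen {seen} time(s)")
```

Anything that lands in UNGROUPED is a flow you did not know existed, and it deserves an answer before the default tightens.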

Section design keeps a large rule base readable

As the distributed firewall grows, the thing that keeps it operable is not cleverness in individual rules but structure across them. Organizing rules into clear policy sections, per application, per zone, per purpose, is what lets a human still reason about a rule base that has grown to thousands of entries. A flat list of rules with no sectioning is unauditable the moment it gets large, while a well-sectioned table reads like a document with chapters, where you can find the rules that govern a given application without scrolling through everything else.

Good section structure also makes Applied-To scoping and change review tractable, because a section carries intent that an individual rule cannot. When you add a new application, it gets its own section in the Application category, with its rules scoped to its workloads, and the rest of the table is untouched and unaffected. Curate the sections as deliberately as the rules, and the firewall stays something a new team member can read and a security auditor can follow, rather than a sprawling artifact that only its original author could ever explain.



About the Author

Dr. Pranay Jha is a Cloud and AI Consultant with 18+ years of experience in hybrid cloud, virtualization, and enterprise infrastructure transformation. He specializes in VMware technologies, multi-cloud strategy, and Generative AI solutions. He holds a PhD in Computer Applications with research focused on Cloud and AI, has published multiple research papers, and has been a VMware vExpert since 2016 and a VMUG Community Leader.
