TL;DR · Key Takeaways
- vSAN ESA is the default storage layer in VCF 9, and it has strict hardware rules: certified NVMe TLC devices only, with no RAID controllers, tri-mode controllers, or HBAs in the path.
- Default to RAID-5/6 erasure coding, not RAID-1 mirroring. On ESA, erasure coding matches or beats mirror performance while using far less capacity.
- Size on usable capacity, not raw: budget for FTT overhead plus operations and host-rebuild slack, and do not pre-spend deduplication or compression ratios.
- Keep hosts uniform, give vSAN a 25GbE or faster network, and scale out by adding hosts rather than stuffing disks into a few.
- Let VCF 9.1 Automated Storage Policy Management pick the optimal fault tolerance and erasure coding for your cluster size.
Storage is where VMware Cloud Foundation deployments quietly go wrong. The cluster comes up healthy, workloads run, and then six months later capacity is tighter than the spreadsheet promised, a rebuild stalls during a host failure, or latency spikes under load. Almost every one of those problems traces back to a design decision made before the first host was racked. In VCF 9, vSAN Express Storage Architecture (ESA) is the default and recommended storage layer, and it changes several rules that experienced OSA admins still apply out of habit. This post walks through the six storage design pitfalls we see most often, and the fix for each.
1. Treating ESA like OSA on the hardware side
ESA is not OSA with faster disks. There are no disk groups and no separate cache tier: every claimed device contributes to both capacity and performance in a single storage pool per host. That design only works on the hardware it was built for. ESA requires certified NVMe TLC flash devices, and it is not supported behind RAID controllers, tri-mode controllers, or HBAs. Plugging ESA-class NVMe into a server with a RAID controller in the path is one of the most common ways a build fails certification.
- Build from a vSAN ESA ReadyNode, or validate every component against the VMware Compatibility Guide before you buy.
- Confirm the devices are on the ESA-specific HCL, not just the general vSAN list.
- Verify the controller path is direct-attached NVMe with no RAID or tri-mode layer in between.
Verify what each host actually presents before you enable the cluster:
# List the devices in this host's vSAN storage pool (empty until vSAN has claimed them)
esxcli vsan storagepool list
# Confirm the architecture the host is running
esxcli vsan cluster get
# Check the storage controller in the path
esxcli storage core adapter list
2. Defaulting to RAID-1 mirroring
On OSA, RAID-1 mirroring was the go-to for performance and RAID-5/6 was the capacity-saving compromise you reached for reluctantly. ESA flips that trade-off. Because of its log-structured design, ESA delivers erasure-coding performance that is equal to or better than mirroring, so there is no longer a performance reason to default to RAID-1. The capacity difference is large: at FTT=1, a mirror consumes 2x the size of the data in raw capacity, while ESA RAID-5 consumes 1.25x in its 4+1 layout (clusters of six or more hosts) or 1.5x in its 2+1 layout (smaller clusters). Choosing mirroring out of habit can silently cost you a quarter to more than a third of the usable capacity that erasure coding would have delivered.
- Use RAID-5/6 erasure coding as the default for general workloads on ESA.
- Reserve RAID-1 for the rare cases where a specific workload genuinely needs it.
- On VCF 9.1, enable Automated Storage Policy Management so vSAN applies the highest fault tolerance and optimal erasure coding for the cluster size automatically.
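The capacity multipliers in this section follow from stripe geometry: a stripe with d data fragments and p parity fragments writes (d + p) / d units of raw capacity per unit of data, and a mirror writes one full copy per replica. The Python sketch below works through that arithmetic. The function names and the example geometries are illustrative assumptions, not anything vSAN exposes, and which geometry a policy actually gets depends on the architecture and cluster size, so confirm it before sizing against these numbers.

```python
from fractions import Fraction


def erasure_overhead(data_fragments: int, parity_fragments: int) -> Fraction:
    """Raw capacity written per unit of data for a d+p erasure-coded stripe."""
    if data_fragments < 1 or parity_fragments < 0:
        raise ValueError("need at least one data fragment and non-negative parity")
    return Fraction(data_fragments + parity_fragments, data_fragments)


def mirror_overhead(failures_to_tolerate: int) -> Fraction:
    """Raw capacity written per unit of data for RAID-1 (FTT + 1 full copies)."""
    if failures_to_tolerate < 0:
        raise ValueError("failures_to_tolerate must be non-negative")
    return Fraction(failures_to_tolerate + 1)


def usable_fraction(overhead: Fraction) -> Fraction:
    """Share of raw capacity left for data once protection overhead is paid."""
    return 1 / overhead


if __name__ == "__main__":
    # Example geometries only (assumption): check what your cluster really uses.
    layouts = {
        "RAID-1, FTT=1": mirror_overhead(1),
        "RAID-5 (2+1)": erasure_overhead(2, 1),
        "RAID-5 (3+1)": erasure_overhead(3, 1),
        "RAID-5 (4+1)": erasure_overhead(4, 1),
        "RAID-6 (4+2)": erasure_overhead(4, 2),
    }
    for name, overhead in layouts.items():
        share = float(usable_fraction(overhead))
        print(f"{name}: {float(overhead):.2f}x raw, {share:.0%} usable")
```

Using exact fractions keeps the ratios free of floating-point noise, which makes it easy to compare a mirror against an erasure-coded layout on the same raw pool.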
3. Sizing on raw capacity and pre-spending data reduction
Raw capacity is not usable capacity. After you account for the FTT overhead from your storage policy, you still need to reserve slack for vSAN itself: an operations reserve for internal tasks and a host-rebuild reserve so the cluster can re-protect data when a host fails. A cluster sized to the edge of raw capacity cannot complete a rebuild, which turns a single host failure into a capacity emergency. The second half of this trap is pre-spending data reduction: global deduplication can reach up to 8x and the new VCF 9.1 compression improves ratios further, but those numbers are workload-dependent and must never be baked into your primary capacity plan.
- Drive sizing with the vSAN ReadyNode Sizer, ideally fed with real usage data rather than guesses.
- Plan to usable capacity after FTT, operations reserve, and host-rebuild reserve, and leave headroom on top.
- Treat deduplication and compression as a bonus that lowers cost, not as capacity you can commit to in advance.
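The sizing order described in this section can be sketched as a small Python calculation: start from raw capacity, hold back one host's worth for rebuilds and a slice for internal operations, then divide what is left by the storage policy's protection overhead. Treat this as a rough cross-check, not a replacement for the vSAN ReadyNode Sizer. The function name, the 10% operations reserve, and the one-host rebuild reserve are placeholder assumptions, not vSAN's actual reserve formulas.

```python
def usable_capacity_tib(
    raw_tib_per_host: float,
    host_count: int,
    protection_overhead: float,
    operations_reserve: float = 0.10,  # assumption: placeholder fraction of raw
) -> float:
    """Estimate capacity available for data after reserves and FTT overhead."""
    if host_count < 2:
        raise ValueError("a rebuild reserve only makes sense with two or more hosts")
    if protection_overhead < 1:
        raise ValueError("protection overhead is a multiplier of at least 1.0")
    if not 0 <= operations_reserve < 1:
        raise ValueError("operations_reserve must be a fraction in [0, 1)")

    raw_total = raw_tib_per_host * host_count
    # Assumption: hold back the raw capacity of one full host for re-protection.
    host_rebuild_reserve = raw_tib_per_host
    operations = raw_total * operations_reserve
    available_raw = max(raw_total - host_rebuild_reserve - operations, 0.0)
    # Protection overhead (for example 1.25 for a 4+1 stripe) is applied last.
    return available_raw / protection_overhead


if __name__ == "__main__":
    # Hypothetical cluster: 8 hosts with 50 TiB raw each, 1.25x overhead.
    estimate = usable_capacity_tib(50, 8, 1.25)
    print(f"Plan against roughly {estimate:.0f} TiB, then leave headroom on top")
```

The function deliberately takes no deduplication or compression ratio, so any data reduction you achieve in production shows up as extra headroom rather than as capacity the plan already spent.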
For how usable capacity ties back to what you are entitled to consume, see the vSAN TiB entitlement discussion in VCF 9 Licensing Explained, and review your inputs against the readiness checklist in VCF 9 Planning and Prerequisites.
4. Mixing host configurations in one cluster
vSAN works best when every host in a cluster has a similar or identical configuration, especially storage. Mixed device counts or capacities create an unbalanced datastore where some hosts fill up faster than others, components cluster on the larger nodes, and rebuild and rebalance behavior becomes unpredictable. The convenience of adding whatever hardware is on hand costs you in lopsided utilization and harder troubleshooting later.
- Standardize on a single host specification per cluster: same device class, count, and capacity.
- When you must introduce a newer node spec, build a new uniform cluster rather than diluting an existing one.
- Keep CPU and memory aligned too, so storage policy outcomes and DRS behavior stay predictable.
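A uniformity rule like this is easy to check mechanically before a host joins a cluster. The Python sketch below assumes you can export a simple per-host inventory; the HostSpec fields, host names, and values are hypothetical and do not correspond to any vSAN or vCenter API. It flags every host whose specification differs from the most common one in the cluster.

```python
from collections import Counter
from typing import NamedTuple


class HostSpec(NamedTuple):
    """Hypothetical inventory record for one host (illustrative fields only)."""
    name: str
    device_count: int
    device_tib: float
    cpu_cores: int
    memory_gib: int


def find_outliers(hosts: list[HostSpec]) -> list[str]:
    """Return names of hosts whose spec differs from the most common spec."""
    if not hosts:
        return []
    # Compare everything except the name: devices, capacity, CPU, and memory.
    shapes = Counter(host[1:] for host in hosts)
    baseline, _ = shapes.most_common(1)[0]
    return [host.name for host in hosts if host[1:] != baseline]


if __name__ == "__main__":
    cluster = [
        HostSpec("esx-01", 6, 7.68, 64, 1024),
        HostSpec("esx-02", 6, 7.68, 64, 1024),
        HostSpec("esx-03", 6, 7.68, 64, 1024),
        HostSpec("esx-04", 8, 15.36, 64, 1024),  # denser node on hand
    ]
    print(find_outliers(cluster))
```

A non-empty result is the signal to either bring the outlier in line with the standard specification or place it in a separate, uniform cluster.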
5. Under-provisioning the vSAN network
vSAN is a distributed storage system, so the network is part of the storage subsystem, not an afterthought. ESA moves more data across the fabric and expects a fast, low-latency network: plan for 25GbE or faster, with redundant uplinks and jumbo frames configured consistently end to end. A 10GbE network carried over from an older cluster, or an MTU mismatch somewhere in the path, shows up as latency and rebuild slowness that looks like a storage problem but is really a network one.
- Provision 25GbE or faster for ESA, with redundancy at the NIC and switch layers.
- Set MTU 9000 consistently across vmknics, switches, and uplinks, then validate it actually passes.
- Separate or prioritize vSAN traffic with Network I/O Control so it is not starved by vMotion or workload traffic.
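Validating that jumbo frames actually pass means sending a packet that fills the MTU with fragmentation disabled. The largest ICMP payload that fits is the MTU minus the 20-byte IPv4 header (without options) and the 8-byte ICMP header, which is 8972 bytes for MTU 9000. The Python sketch below derives that number and builds the matching vmkping command; the vmknic name and peer address in the example are hypothetical placeholders for your own vSAN interface and a neighboring host's vSAN IP.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header


def max_icmp_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    payload = mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES
    if payload <= 0:
        raise ValueError("MTU is too small to carry an ICMP echo payload")
    return payload


def vmkping_command(vmknic: str, peer_ip: str, mtu: int = 9000) -> str:
    """Build a vmkping call: -I picks the vmknic, -d sets don't-fragment, -s sets size."""
    return f"vmkping -I {vmknic} -d -s {max_icmp_payload(mtu)} {peer_ip}"


if __name__ == "__main__":
    # Hypothetical values: replace with your vSAN vmknic and a peer's vSAN IP.
    print(vmkping_command("vmk2", "192.0.2.11"))
```

If the command generated for MTU 9000 fails while the same test at MTU 1500 succeeds, some hop in the path is not passing jumbo frames.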
Network design and storage design are tightly coupled, so pair this with the fabric guidance in VCF 9 Network Design: 7 Mistakes That Break Your Deployment.
6. Scaling up disks instead of scaling out hosts
When capacity runs low, the tempting move is to add disks to the existing hosts. vSAN prefers the opposite: scaling out by adding hosts is the recommended approach over adding or replacing devices in existing nodes. More hosts means more failure domains, more aggregate performance, and a larger pool to absorb a rebuild. Concentrating capacity on a few dense hosts increases the blast radius of a single host failure and can leave too little headroom to re-protect data. If you genuinely need to separate storage growth from compute, VCF 9 supports disaggregation through vSAN storage clusters rather than over-stuffing HCI nodes.
- Grow capacity by adding hosts first; add devices to existing hosts only when it keeps the cluster uniform.
- Use a vSAN storage cluster when storage and compute need to scale independently.
- Keep enough hosts that losing one still leaves room to rebuild within policy.
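The blast-radius argument is simple arithmetic, sketched below in Python with hypothetical cluster sizes. The model is deliberately simplified: it only asks whether the surviving hosts have enough raw capacity to hold everything currently consumed, and it ignores the operations reserve and the minimum host count your storage policy requires. The function names are illustrative assumptions.

```python
def blast_radius(host_count: int) -> float:
    """Fraction of cluster raw capacity lost when one uniform host fails."""
    if host_count < 1:
        raise ValueError("host_count must be at least 1")
    return 1 / host_count


def survives_host_loss(
    host_count: int, raw_tib_per_host: float, consumed_raw_tib: float
) -> bool:
    """True if the remaining hosts can hold all consumed raw capacity."""
    if host_count < 1:
        raise ValueError("host_count must be at least 1")
    remaining_raw = (host_count - 1) * raw_tib_per_host
    return consumed_raw_tib <= remaining_raw


if __name__ == "__main__":
    consumed = 320.0  # TiB of raw capacity in use, protection overhead included
    # Same 400 TiB raw total, built two ways (hypothetical designs).
    for hosts, per_host in [(4, 100.0), (8, 50.0)]:
        ok = survives_host_loss(hosts, per_host, consumed)
        print(f"{hosts} hosts x {per_host:.0f} TiB: "
              f"one failure removes {blast_radius(hosts):.1%}, rebuild fits: {ok}")
```

With the same total raw capacity and the same consumption, the four-host design cannot re-protect after a failure while the eight-host design can.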
Final Thoughts
None of these pitfalls are exotic. They are the result of carrying OSA-era habits into an ESA-first platform, sizing optimistically, and treating the network as separate from storage. Get the hardware certified, default to erasure coding, size on usable capacity with real slack, keep hosts uniform, give vSAN the network it needs, and scale out rather than up. Do that, and the storage layer becomes the part of your VCF 9 deployment you stop worrying about, which is exactly where it should be.