VSAN Part 25 – How many hosts needed to tolerate failures?

This is a question that has come up a number of times. Many of you will now be familiar with the VM Storage Policy capability Number of Failures To Tolerate for VSAN. It defines how many failures the VSAN cluster can sustain while still providing a full copy of the data, so that a virtual machine remains available. In this short post, I will explain how many physical ESXi hosts you need to meet the Number of Failures To Tolerate requirement in the VM Storage Policy.

The formula is actually quite simple. If you want to tolerate n failures, then you need 2n + 1 ESXi hosts in the VSAN cluster. This table should make it clearer:

Number of Failures To Tolerate    Minimum number of hosts in the cluster
          1                                      3
          2                                      5
          3                                      7
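The 2n + 1 rule is simple enough to express as a one-line function. The following is a minimal Python sketch that reproduces the table; the function name `hosts_required` is hypothetical and not part of any VMware API.

```python
def hosts_required(failures_to_tolerate: int) -> int:
    """Minimum ESXi hosts for a given Number of Failures To Tolerate: 2n + 1."""
    if failures_to_tolerate < 0:
        raise ValueError("failures_to_tolerate must be non-negative")
    return 2 * failures_to_tolerate + 1


if __name__ == "__main__":
    # Print the same mapping as the table: FTT -> minimum hosts.
    print("FTT  Hosts")
    for ftt in (1, 2, 3):
        print(f"{ftt:>3}  {hosts_required(ftt):>5}")
```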

At this time, the maximum value that you can set for Number of Failures to Tolerate is 3, even though we can have up to 32 hosts in a VSAN cluster.
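These limits can be folded into the same calculation. The sketch below hard-codes the values stated in this post (a maximum Number of Failures to Tolerate of 3 and up to 32 hosts per cluster) as hypothetical constants. It assumes that 0 is the lowest accepted value. Even the largest setting needs only 7 hosts, well below the cluster maximum.

```python
MAX_FTT = 3             # maximum Number of Failures To Tolerate, as stated in the post
MAX_CLUSTER_HOSTS = 32  # maximum hosts in a VSAN cluster, as stated in the post
MIN_FTT = 0             # assumed lower bound (no redundancy)


def hosts_required(failures_to_tolerate: int) -> int:
    """Return 2n + 1, rejecting values outside the supported MIN_FTT..MAX_FTT range."""
    if not MIN_FTT <= failures_to_tolerate <= MAX_FTT:
        raise ValueError(
            f"Number of Failures To Tolerate must be between {MIN_FTT} and {MAX_FTT}, "
            f"got {failures_to_tolerate}"
        )
    return 2 * failures_to_tolerate + 1


if __name__ == "__main__":
    largest = hosts_required(MAX_FTT)
    print(f"FTT={MAX_FTT} needs {largest} of up to {MAX_CLUSTER_HOSTS} hosts")
```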

So what happens if you try to deploy a virtual machine whose VM Storage Policy specifies a Number of Failures to Tolerate value, but the VSAN cluster does not contain enough ESXi hosts to satisfy it? The following message is displayed when you try to deploy the VM (this example tried to deploy a VM with Number of Failures to Tolerate set to 2 on a VSAN cluster with only 4 hosts).

[Screenshot: VSAN - Policy Required 5 hosts]

So, as you can see, this is handled quite well, with a very informative error message.
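The check that produces this error can be approximated before deployment. Below is a minimal Python sketch that compares the hosts a policy needs (2n + 1) against the hosts in the cluster. The function `check_policy` and its message wording are hypothetical illustrations and are not the text or logic that vSphere actually uses.

```python
from typing import Optional


def hosts_required(failures_to_tolerate: int) -> int:
    """Minimum ESXi hosts for a given Number of Failures To Tolerate: 2n + 1."""
    return 2 * failures_to_tolerate + 1


def check_policy(failures_to_tolerate: int, hosts_in_cluster: int) -> Optional[str]:
    """Return None if the cluster can satisfy the policy, else an explanatory message.

    The message wording is illustrative only; it is not what vSphere displays.
    """
    needed = hosts_required(failures_to_tolerate)
    if hosts_in_cluster >= needed:
        return None
    return (
        f"Policy with Number of Failures To Tolerate = {failures_to_tolerate} "
        f"requires {needed} hosts, but the cluster has only {hosts_in_cluster}"
    )


if __name__ == "__main__":
    # The scenario from the post: FTT = 2 on a 4-host cluster.
    print(check_policy(2, 4))
    # FTT = 1 needs only 3 hosts, so a 4-host cluster passes (prints None).
    print(check_policy(1, 4))
```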

    • Correct, but you must realize that you will not get the desired “Failures To Tolerate” capability specified in the VM Storage Policy if you do that.

  1. If each host provides more capacity than the entire VSAN datastore consumes, shouldn’t you be able to set it up with n+1 nodes? For example, if all VMs together consume only 2TB of storage and each host has 3TB of internal capacity, one host could hold a complete copy of all the data if the other host(s) fail. CPU and RAM might be a problem, but data integrity would be intact.

    • You do not control how much storage the VSAN datastore consumes. All disks that are empty and marked as local will be claimed by VSAN when it is enabled on the cluster. You can only control availability through the “Failures To Tolerate” attribute, which configures RAID-1 on VMDKs across hosts in the cluster. This is an important distinction, because availability is defined per VM rather than per host.
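To show what per-VM availability means, the following Python sketch mirrors each VMDK as n + 1 RAID-1 copies. Tolerating n failures while keeping a full copy requires n + 1 copies. The sketch places those copies on distinct hosts, which is an assumption of this example. It then checks every combination of n failed hosts. The VM names, host names, and the naive "first hosts in the list" placement are hypothetical. The sketch models only data copies and does not model why VSAN requires 2n + 1 hosts rather than n + 1. VSAN still enforces the 2n + 1 rule regardless of how much free capacity each host has.

```python
from itertools import combinations
from typing import Dict, List


def place_mirrors(ftt: int, hosts: List[str]) -> List[str]:
    """Place ftt + 1 RAID-1 copies of a VMDK on distinct hosts (naive: first hosts listed)."""
    copies = ftt + 1
    if copies > len(hosts):
        raise ValueError(f"need {copies} distinct hosts for {copies} copies")
    return hosts[:copies]


def survives_any_n_failures(placement: List[str], hosts: List[str], n: int) -> bool:
    """True if every combination of n failed hosts leaves at least one copy intact."""
    return all(
        any(h not in failed for h in placement)
        for failed in combinations(hosts, n)
    )


if __name__ == "__main__":
    cluster = ["esx-01", "esx-02", "esx-03", "esx-04", "esx-05"]
    # Hypothetical VMs with different policies sharing the same cluster.
    policies: Dict[str, int] = {"web-vm": 1, "db-vm": 2}
    for vm, ftt in policies.items():
        placement = place_mirrors(ftt, cluster)
        ok = survives_any_n_failures(placement, cluster, ftt)
        print(f"{vm}: FTT={ftt}, copies on {placement}, tolerates {ftt} host failure(s): {ok}")
```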
