In this next post, I will examine some failure scenarios. I will concentrate on ESXi host failures, but suffice it to say that a disk or network failure can also have consequences for virtual machines running on VSAN. There are two host failure scenarios highlighted below which can impact a virtual machine running on VSAN:
- An ESXi host, on which the VM is not running but has some of its storage objects, suffers a failure
- An ESXi host, on which the VM is running, suffers a failure
Let’s look at these failures in more detail.
Let’s take the simplest configuration: a 3-node cluster with a VM deployed with the default policy of ‘FailuresToTolerate = 1’.
In the first failure scenario, assume that one ESXi host (node) in the VSAN cluster suffers a crash/failure. The ESXi host on which the virtual machine is running is unaffected, so the VM itself continues to run. But what if the ESXi host that failed held some of the VM’s storage objects? Not an issue, since there will be a full copy of the VM’s storage objects available elsewhere in the cluster. This is because all VMs deployed on VSAN have ‘FailuresToTolerate = 1’ by default, meaning the VM can tolerate at least one (host/disk/network) failure by virtue of having a mirror (RAID-1) configuration.
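To make the first scenario concrete, here is a minimal Python sketch of the mirror idea. It is a toy model, not VSAN code: the host names and both function names are made up for illustration, and it only tracks which hosts hold a full replica, ignoring every other placement detail.

```python
def surviving_replicas(replica_hosts, failed_hosts):
    """Return the hosts that still hold a full replica after the failures."""
    return sorted(set(replica_hosts) - set(failed_hosts))


def object_has_full_copy(replica_hosts, failed_hosts):
    """True if at least one full mirror copy is still on a healthy host."""
    return len(surviving_replicas(replica_hosts, failed_hosts)) > 0


# Hypothetical placement for FailuresToTolerate = 1: two mirror copies.
replica_hosts = ["esx-01", "esx-02"]

print(object_has_full_copy(replica_hosts, ["esx-02"]))            # True
print(object_has_full_copy(replica_hosts, ["esx-01", "esx-02"]))  # False
```

With two mirror copies, losing either host still leaves one full copy; losing both does not, which is why the policy is described as tolerating one failure.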
A reconstruction of the replica storage objects that resided on the failed node is started after a timeout period of 60 minutes (this will allow enough time for host reboots, short periods of maintenance, etc). Once the reconstruction of the storage objects is completed, the cluster directory service is updated with information about where the VM’s storage objects reside in the cluster. There is a video of this behaviour here.
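The delayed rebuild can be sketched as a simple timer check. This is only an illustration of the behaviour described above: the function name, the `host_returned` flag and the choice to start exactly at the 60-minute mark are my assumptions, not VSAN internals.

```python
from datetime import datetime, timedelta

# Timeout quoted above; treated here as a plain constant.
REBUILD_DELAY = timedelta(minutes=60)


def should_start_rebuild(failed_at, now, host_returned=False, delay=REBUILD_DELAY):
    """Decide whether reconstruction of the missing replicas should begin.

    failed_at     -- when the host was first seen as failed
    now           -- current time
    host_returned -- True if the host came back (reboot, short maintenance)
    """
    if host_returned:
        # The original replicas are reachable again, so no rebuild is needed.
        return False
    return now - failed_at >= delay


failed_at = datetime(2013, 9, 1, 12, 0)
print(should_start_rebuild(failed_at, failed_at + timedelta(minutes=15)))  # False
print(should_start_rebuild(failed_at, failed_at + timedelta(minutes=60)))  # True
```

The point of the delay is visible in the `host_returned` branch: a host that reboots within the window never triggers a full copy of its data.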
If we look at the second scenario, where the virtual machine was running on the failing ESXi host, then vSphere HA (which inter-operates with VSAN) will restart the virtual machine on a remaining host in the VSAN cluster, if configured to do so. Even if the ESXi host which failed also contained storage object replicas, there should still be at least one full mirror/replica of the virtual machine’s storage objects in the cluster. As before, a reconstruction of the storage objects that used to reside on the failed node is started after a timeout period. There is another video of this behaviour here.
VSAN maintains a bitmap of changed blocks for cases where components of an object are unable to synchronize due to a failure of a host, network or disk. This allows updates to VSAN objects composed of two or more components to be reconciled after a failure.
For example, in a distributed RAID-1 (mirrored) configuration, if a write is sent to nodes A and B for object X, but only A records the write before a cluster-wide power failure, then on recovery A and B will compare their logs for X, and A will deliver its copy of the write to B.
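The changed-block idea can be sketched in a few lines of Python. This is a toy two-way mirror under my own assumptions (the class names, the one-bitmap-per-replica layout and the `resync` method are illustrative only), not how VSAN actually stores or exchanges its logs.

```python
class Replica:
    def __init__(self, name, num_blocks):
        self.name = name
        self.blocks = [b""] * num_blocks
        self.online = True


class MirroredObject:
    """Two-way mirror that remembers which blocks a replica missed."""

    def __init__(self, num_blocks):
        self.replicas = {"A": Replica("A", num_blocks), "B": Replica("B", num_blocks)}
        # One bitmap per replica: True marks a block written while it was offline.
        self.missed = {name: [False] * num_blocks for name in self.replicas}

    def write(self, index, data):
        if not any(r.online for r in self.replicas.values()):
            raise RuntimeError("no replica available for write")
        for name, replica in self.replicas.items():
            if replica.online:
                replica.blocks[index] = data
            else:
                self.missed[name][index] = True

    def resync(self, stale_name):
        """Copy only the missed blocks from the healthy replica to the stale one."""
        fresh_name = "B" if stale_name == "A" else "A"
        stale, fresh = self.replicas[stale_name], self.replicas[fresh_name]
        copied = [i for i, dirty in enumerate(self.missed[stale_name]) if dirty]
        for i in copied:
            stale.blocks[i] = fresh.blocks[i]
            self.missed[stale_name][i] = False
        stale.online = True
        return copied


obj = MirroredObject(num_blocks=4)
obj.write(0, b"both")              # lands on A and B
obj.replicas["B"].online = False   # B becomes unavailable
obj.write(1, b"only-A")
obj.write(3, b"only-A-too")
print(obj.resync("B"))             # [1, 3]
```

Only the blocks flagged in the bitmap are copied on recovery, so a short outage does not force a full rebuild of the replica.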
vSphere HA Interoperability
vSphere HA is fully supported on a VSAN cluster to provide additional availability to virtual machines deployed in the cluster. However, a number of significant changes have been made to vSphere HA to ensure correct interoperability with VSAN. Notably, vSphere HA agents communicate over the VSAN network when the hosts participate in a VSAN cluster. The reasoning behind this is that VMware wants HA and VSAN nodes to be part of the same partition in the event of a network failure; this avoids conflicts that would arise if HA and VSAN were partitioned differently, with different partitions laying claim to the same object. However, vSphere HA continues to use the management network’s gateway for isolation detection.
Another noticeable difference with vSphere HA on VSAN is that the VSAN datastore cannot be used for datastore heartbeats. These heartbeats play a significant role in determining virtual machine ownership in the event of a vSphere HA cluster partition. This means that if partitioning occurs, vSphere HA cannot use datastore heartbeats to determine whether another partition can power on the virtual machines before this partition powers them off. Datastore heartbeating is very advantageous to vSphere HA when deployed on shared storage, as it allows some level of coordination between partitions. It is not available to VSAN deployments since there is no shared storage. If a VSAN cluster partitions, there is no way for hosts in one partition to access the local storage of hosts on the other side of the partition; thus there is no use for vSphere HA heartbeat datastores.
Note however that if VSAN hosts also have access to shared storage, either VMFS or NFS, then these datastores may still be used for vSphere HA heartbeats.
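As a rough illustration of that rule, the following Python sketch filters a host's datastores down to those that could serve as heartbeat datastores. The `Datastore` class, the type strings and the function name are hypothetical stand-ins; this is not the vSphere API, and the datastore names are invented.

```python
from dataclasses import dataclass

# Datastore types that can carry HA heartbeats, per the note above.
HEARTBEAT_CAPABLE_TYPES = {"vmfs", "nfs"}


@dataclass(frozen=True)
class Datastore:
    name: str
    kind: str  # e.g. "vsan", "vmfs", "nfs"


def heartbeat_candidates(datastores):
    """Return names of datastores eligible for HA heartbeating, in input order."""
    return [ds.name for ds in datastores if ds.kind.lower() in HEARTBEAT_CAPABLE_TYPES]


stores = [
    Datastore("vsanDatastore", "vsan"),
    Datastore("shared-vmfs-01", "VMFS"),
    Datastore("nfs-01", "nfs"),
]
print(heartbeat_candidates(stores))  # ['shared-vmfs-01', 'nfs-01']
```

A cluster with only the VSAN datastore would return an empty list, which matches the behaviour described above: no heartbeat datastores are available.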
vSphere HA needs to store the protection metadata for each virtual machine in the cluster. On traditional datastores, this is stored in a folder labeled ‘.vSphere-HA’ in the root directory of each datastore. In VSAN, this is done differently. Instead of being stored in the root directory of a datastore, the vSphere HA protection metadata is stored in the virtual machine’s namespace directory, along with the virtual machine’s configuration files. You can learn more about VM object layout on VSAN by reading this article on objects and components.
Check out all my VSAN posts here.