I had a few queries recently on how Virtual Volumes (VVols) worked with vSphere HA. In particular, I had a number of questions around whether or not VVol datastores could be used as a heartbeat datastore by vSphere HA. The answer is yes, the VVol datastore can be used for vSphere HA datastore heartbeating. If you want to see how, please read on.
I think these queries may have arisen due to the fact that we do not use datastore heartbeating with Virtual SAN (VSAN). Just by way of reminder, the master host in a vSphere HA cluster uses a heartbeat datastore when it can no longer communicate with a slave host over the management network. This allows the master to determine things like whether a slave host has failed and is down, or if it is in a network partition, or if it is network isolated. This then allows the master to make decisions about what to do with the VMs that reside on the slave host. If the slave host has stopped datastore heartbeating, it is considered to have failed and its virtual machines are restarted elsewhere in the cluster.
This isn’t possible in VSAN due to the fact that storage is local to a host, thus if a host is partitioned and it updates heartbeats on its local storage, there is no way for the other others in the VSAN cluster to see it. VVol datastore is shared storage, which is why datastore heartbeating works. Other hosts can see the updates on storage even if the host is off the network.
This is another nice new feature of Virtual SAN 6.0. It basically is a directive to VSAN to start re-balancing components belonging to virtual machine objects around all the hosts and all the disks in the cluster. Why might you want to do this? Well, it’s very simple. As VMs are deployed on the VSAN datastore, there are algorithms in place to place those components across the cluster in a balanced fashion. But what if a hosts was placed into maintenance mode, and you requested that the data on the host be evacuated prior to entering maintenance mode, and now you are bringing this node back into the cluster after maintenance? What about adding new disks or disk groups to an existing node in the cluster (scaling up)? What if you are introducing a new node to the cluster (scaling out)? The idea behind proactive re-balance is to allow VSAN to start consuming these newly introduced resources sooner rather than later.
One of the really nice new features of VSAN 6.0 is fault domains. Previously, there was very little control over where VSAN placed virtual machine components. In order to protect against something like a rack failure, you may have had to use a very high NumberOfFailuresToTolerate value, resulting in multiple copies of the VM data dispersed around the cluster. With VSAN 6.0, this is no longer a concern as hosts participating in the VSAN Cluster can be placed in different failure domains. This means that component placement will take place across failure domains and not just across hosts. Let’s look at this in action.
I’ve been hit up this week by a number of folks asking about “ATS Miscompare detected between test and set HB images” messages after upgrading to vSphere 5.5U2 and 6.0. The purpose of this post is to give you some background on why this might have started to happen.
In vSphere 5.5U2, we started using ATS for maintaining the heartbeat. Prior to this release, we only used ATS when the heartbeat state changed. For example, referring to the older blog, we would use ATS in the following cases:
Acquire a heartbeat
Clear a heartbeat
Replay a heartbeat
Reclaim a heartbeat
We did not use ATS for maintaining the ‘liveness’ of a heartbeat. This is the change that was introduced in 5.5U2 and which appears to have led to issues for certain storage arrays.
Before I begin, this isn’t really a feature of VSAN so to speak. In vSphere 6.0, you can also blink LEDs on disk drives without VSAN deployed. However, because of the scale up and scale out features in VSAN 6.0, where you can have very many disk drives and very many ESXi hosts, being able to identify a drive for replacement becomes very important.
So this is obviously a useful feature. And of course I wanted to test it out, see how it works, etc. In my 4 node cluster, I started to test this feature on some disks in each of the hosts. On 2 out of the 4 hosts, this worked fine. On the other 2, it did not. Eh? These were all identically configured hosts, all running 6.0 GA with the same controller and identical disks. In the ensuing investigation, this is what was found.
In some previous posts, I highlighted how VVols introduces the concept of “undo” format snapshots where the VM is always running on the base disk. I also mentioned that this has a direct impact on the way that we do snapshots on VMs that support VSS, the Microsoft Volume Shadow Copy Service. But before getting into the detail regarding how VVols is different, it’s worth spending some time understanding whats going on when VSS is called to quiesce applications when a traditional snapshot is taken. If you try to research this yourself, you’ll find that there is very little information describing what is going on. The best place I found this behaviour described is in the Designing Backup Solutions for VMware vSphere vStorage APIs for Data Protection 1.2:
Windows 2008 application level quiescing is performed using a hardware snapshot provider. After quiescing the virtual machine, the hardware snapshot provider creates two redo logs per disk: one for the live virtual machine writes and another for the VSS and writers in the guest to modify the disks after the snapshot operation as part of the quiescing operations. The snapshot configuration information reports this second redo log as part of the snapshot. This redo log represents the quiesced state of all the applications in the guest. This redo log must be opened for backup with VDDK 1.2.