A deeper-dive into Fault Tolerant VMs running on vSAN
After receiving a number of queries about vSphere Fault Tolerance on vSAN over the past couple of weeks, I decided to take a closer look at how Fault Tolerant VMs behave with different vSAN policies. I examined two policies in particular: one where the “failures to tolerate” (commonly referred to as FTT) is set to 0, and one where it is set to 1. The question is whether we could deploy VMs without any vSAN protection and rely on vSphere Fault Tolerance to protect them instead.
I know this has some overlapping terminology, but going forward, whenever I mention FT, I will mean vSphere Fault Tolerant VMs, and whenever I mention FTT, I will be referring to the vSAN policy setting called Number of Failures to Tolerate.
Let’s first have a look at FTT=1. One major change in this newest version of Fault Tolerance is that it instantiates not only a second copy of the VM’s compute but also of its storage. This means that a simple VM deployed on vSAN, which usually has 3 objects by default (a home namespace, a swap and a disk), now has 6 objects when FT is enabled. Here is a sample view of such a VM:
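To generate a view like this yourself, RVC can dump all of the objects backing a VM in one go. The snippet below is just a sketch from my lab – the cluster prompt and the VM inventory path are from my environment and will differ in yours.

# From RVC, navigated to the cluster folder of the datacenter.
# "../vms/FT-win7-on-vSAN" is the RVC inventory path to my test VM – adjust to suit.
/vcsa-06/CH-Datacenter/computers> vsan.vm_object_info ../vms/FT-win7-on-vSAN
# The output lists each object (home namespace, swap, vmdk), its policy and the
# component(s) backing it – 6 objects in total once FT has been enabled.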
Of course, each of these objects is composed of a set of RAID-1 mirrored components and is protected by vSAN. Thus, in the event of a host failure, the objects remain accessible. And from an FT perspective, this worked as expected when I introduced a PSOD on the “primary” ESXi host. The FT VM did not need to restart and remained available as the copy of the FT VM on the “secondary” ESXi host took over.
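To keep an eye on the cluster while running this kind of test, a couple of RVC commands are handy. Again, this is only a sketch – as in the object output further down, ‘0’ is a reference to my vSAN cluster.

# Check whether any vSAN objects have absent or inaccessible components after the PSOD
/vcsa-06/CH-Datacenter/computers> vsan.check_state 0
# Watch component rebuilds/resyncs once vSAN starts re-protecting the affected objects
/vcsa-06/CH-Datacenter/computers> vsan.resync_dashboard 0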
Next, I wanted to take a look at what might happen with a policy of FTT=0. As before, 6 objects are instantiated.
However, none of these objects have any vSAN protection. For this reason, I decided to take a closer look at the placement of the objects/components. The ESXi host on which the primary FT VM resided was host ‘h’, and the secondary was on host ‘g’.
/vcsa-06/CH-Datacenter/computers> vsan.object_info 0 771b105b-da09-e419-1bdf-246e962c2408
DOM Object: 771b105b-da09-e419-1bdf-246e962c2408 (v6, owner: esxi-dell-h.rainpole.com, proxy owner: None, policy: spbmProfileName = FTT=0, checksumDisabled = 0, stripeWidth = 1, iopsLimit = 0, SCSN = 8, hostFailuresToTolerate = 0, CSN = 13, proportionalCapacity = 0, spbmProfileId = 0f90316f-bf0e-4fac-9e31-51a21c981862, forceProvisioning = 0, spbmProfileGenerationNumber = 0, cacheReservation = 0)
  Component: 771b105b-ea05-801a-dcd4-246e962c2408 (state: ACTIVE (5), host: esxi-dell-g.rainpole.com, capacity: naa.500a07510f86d693, cache: naa.5001e82002675164, votes: 1, usage: 32.2 GB, proxy component: false)
  Extended attributes:
    Address space: 34359738368B (32.00 GB)
    Object class: vdisk
    Object path: /vmfs/volumes/vsan:52fae366e94edb86-c6633d0af03e5aec/6f1b105b-3a7b-99df-f415-246e962f4850/FT-win7-on-vSAN.vmdk
    Object capabilities: NONE

/vcsa-06/CH-Datacenter/computers> vsan.object_info 0 0555105b-10bc-8133-19a0-246e962f4910
DOM Object: 0555105b-10bc-8133-19a0-246e962f4910 (v6, owner: esxi-dell-g.rainpole.com, proxy owner: None, policy: spbmProfileName = FTT=0, checksumDisabled = 0, stripeWidth = 1, iopsLimit = 0, SCSN = 3, hostFailuresToTolerate = 0, CSN = 5, proportionalCapacity = [0, 100], spbmProfileId = 0f90316f-bf0e-4fac-9e31-51a21c981862, forceProvisioning = 0, spbmProfileGenerationNumber = 0, cacheReservation = 0)
  Component: 0555105b-4601-f933-dbe7-246e962f4910 (state: ACTIVE (5), host: esxi-dell-e.rainpole.com, capacity: naa.500a07510f86d6bb, cache: naa.5001e820026415f0, votes: 1, usage: 0.4 GB, proxy component: false)
  Extended attributes:
    Address space: 273804165120B (255.00 GB)
    Object class: vmnamespace
    Object path: /vmfs/volumes/vsan:52fae366e94edb86-c6633d0af03e5aec/FT-win7-on-vSAN
    Object capabilities: NONE

/vcsa-06/CH-Datacenter/computers> vsan.object_info 0 6f1b105b-3a7b-99df-f415-246e962f4850
DOM Object: 6f1b105b-3a7b-99df-f415-246e962f4850 (v6, owner: esxi-dell-h.rainpole.com, proxy owner: None, policy: spbmProfileName = FTT=0, checksumDisabled = 0, stripeWidth = 1, iopsLimit = 0, SCSN = 7, hostFailuresToTolerate = 0, CSN = 11, proportionalCapacity = [0, 100], spbmProfileId = 0f90316f-bf0e-4fac-9e31-51a21c981862, forceProvisioning = 0, spbmProfileGenerationNumber = 0, cacheReservation = 0)
  Component: 6f1b105b-ed96-27e0-71a9-246e962f4850 (state: ACTIVE (5), host: esxi-dell-g.rainpole.com, capacity: naa.500a07510f86d693, cache: naa.5001e82002675164, votes: 1, usage: 0.4 GB, proxy component: false)
  Extended attributes:
    Address space: 273804165120B (255.00 GB)
    Object class: vmnamespace
    Object path: /vmfs/volumes/vsan:52fae366e94edb86-c6633d0af03e5aec/FT-win7-on-vSAN_1
    Object capabilities: NONE

/vcsa-06/CH-Datacenter/computers> vsan.object_info 0 761b105b-41d1-363f-7865-246e962f4850
DOM Object: 761b105b-41d1-363f-7865-246e962f4850 (v6, owner: esxi-dell-h.rainpole.com, proxy owner: None, policy: spbmProfileName = FTT=0, checksumDisabled = 0, stripeWidth = 1, iopsLimit = 0, SCSN = 3, hostFailuresToTolerate = 0, CSN = 9, proportionalCapacity = 0, spbmProfileId = 0f90316f-bf0e-4fac-9e31-51a21c981862, forceProvisioning = 1, spbmProfileGenerationNumber = 0, cacheReservation = 0)
  Component: 761b105b-16cd-6940-1f70-246e962f4850 (state: ACTIVE (5), host: esxi-dell-f.rainpole.com, capacity: naa.500a07510f86d6b3, cache: naa.5001e82002664b00, votes: 1, usage: 0.0 GB, proxy component: false)
  Extended attributes:
    Address space: 0B (0.00 GB)
    Object class: vmswap
    Object path: /vmfs/volumes/vsan:52fae366e94edb86-c6633d0af03e5aec/6f1b105b-3a7b-99df-f415-246e962f4850/FT-win7-on-vSAN-243ec497.vswp
    Object capabilities: NONE

/vcsa-06/CH-Datacenter/computers> vsan.object_info 0 1f0e115b-8dcf-7799-75fd-246e962c2408
DOM Object: 1f0e115b-8dcf-7799-75fd-246e962c2408 (v6, owner: esxi-dell-g.rainpole.com, proxy owner: None, policy: spbmProfileId = 0f90316f-bf0e-4fac-9e31-51a21c981862, spbmProfileName = FTT=0, CSN = 2, checksumDisabled = 0, stripeWidth = 1, iopsLimit = 0, spbmProfileGenerationNumber = 0, hostFailuresToTolerate = 0, cacheReservation = 0, proportionalCapacity = 0, forceProvisioning = 1)
  Component: 1f0e115b-3231-559a-1919-246e962c2408 (state: ACTIVE (5), host: esxi-dell-h.rainpole.com, capacity: naa.500a07510f86d6bf, cache: naa.5001e8200264426c, votes: 1, usage: 0.0 GB, proxy component: false)
  Extended attributes:
    Address space: 0B (0.00 GB)
    Object class: vmswap
    Object path: /vmfs/volumes/vsan:52fae366e94edb86-c6633d0af03e5aec/0555105b-10bc-8133-19a0-246e962f4910/FT-win7-on-vSAN-80cd273b.vswp
    Object capabilities: NONE

/vcsa-06/CH-Datacenter/computers> vsan.object_info 0 8429115b-4a48-7ac5-98ad-246e962c2408
DOM Object: 8429115b-4a48-7ac5-98ad-246e962c2408 (v6, owner: esxi-dell-g.rainpole.com, proxy owner: None, policy: spbmProfileId = 0f90316f-bf0e-4fac-9e31-51a21c981862, spbmProfileName = FTT=0, CSN = 3, checksumDisabled = 0, stripeWidth = 1, iopsLimit = 0, spbmProfileGenerationNumber = 0, hostFailuresToTolerate = 0, cacheReservation = 0, proportionalCapacity = 0, forceProvisioning = 0)
  Component: 8429115b-70cd-f4c5-4411-246e962c2408 (state: ACTIVE (5), host: esxi-dell-h.rainpole.com, capacity: naa.500a07510f86d6bf, cache: naa.5001e8200264426c, votes: 1, usage: 32.2 GB, proxy component: false)
  Extended attributes:
    Address space: 34359738368B (32.00 GB)
    Object class: vdisk
    Object path: /vmfs/volumes/vsan:52fae366e94edb86-c6633d0af03e5aec/0555105b-10bc-8133-19a0-246e962f4910/FT-win7-on-vSAN.vmdk
    Object capabilities: NONE
/vcsa-06/CH-Datacenter/computers>
Thus, the layout of the objects/components can be summarized as follows:
- Primary ESXi host for FT VM – host h
- ESXi host for primary FT VM disk – host g
- ESXi host for primary FT VM namespace – host g
- ESXi host for primary FT VM swap – host f
- Secondary ESXi host for FT VM – host g
- ESXi host for secondary FT VM disk – host h
- ESXi host for secondary FT VM namespace – host e
- ESXi host for secondary FT VM swap – host h
You might be able to spot an issue with this placement of FTT=0 objects from the above list. If one of the hosts running either the primary or secondary FT VM’s compute fails, that failure also takes out vSAN components used by the surviving FT VM.
Take, for example, host h failing. The FT VM would switch to using the secondary on host g. However, the secondary FT VM on host g has a dependency on vSAN components residing on host h, which has just failed. In this case, the FT VM cannot stay running (which is exactly what I observed in my testing).
If we could organize the placement of vSAN objects/components so that there was no inter-dependency between the objects/components and the placement of the FT primary and secondary VMs, then the FT VM “might” be able to work with FTT=0 and survive a failure. However, we are very limited in what we can do around vSAN placement at the moment. And if a disk balance operation took place and components/objects were moved to new hosts, it is quite possible that a placement which worked previously would no longer work.
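If you did want to experiment with this, the placement can at least be verified before and after any rebalance activity. Something along these lines should do it – once again, ‘0’ refers to my cluster, and the object UUID is one of the FT VM’s objects from the output above.

# Is a proactive rebalance currently running on the cluster?
/vcsa-06/CH-Datacenter/computers> vsan.proactive_rebalance_info 0
# Re-check the component placement of a given object to see whether anything has moved
/vcsa-06/CH-Datacenter/computers> vsan.object_info 0 771b105b-da09-e419-1bdf-246e962c2408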
It is for these reasons that we recommend protecting Fault Tolerant VMs at the storage level with a minimum of FTT=1 if you wish to run these VMs on vSAN. This will provide the continuous protection that you want for these workloads.
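A quick way to verify that the FT VM is actually getting that level of protection is to dump its objects again and check the policy – every object should now report hostFailuresToTolerate = 1, with two mirrored components plus a witness rather than the single component seen in the FTT=0 output above. On recent vSAN releases the same information can also be pulled directly from an ESXi host, for example:

# Run on any ESXi host in the cluster – dumps every vSAN object along with its
# policy and component layout; check the FT VM's objects for hostFailuresToTolerate = 1
esxcli vsan debug object list --all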
One final note on something which comes up a lot. We do support FT VMs on a vSAN stretched cluster, but we do not support the FT VM being stretched across the cluster. In other words, you can deploy an FT VM to one of the sites only. More details can be found in this post.
[Update – June 13th, 2018] I was just informed that if you want the FT VM to be re-protected after a failure, you will need to consider the following options. Option #1 is to have a spare node in the cluster. For example, if you deployed the FT VM with FTT=1 and RAID-1 protection and then suffered a host failure, the missing components would be rebuilt on the spare node. Option #2 applies if you do not have an extra node in the cluster; in this case, you could re-protect the FT VM by using the ForceProvision setting in the policy.
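On Option #1, it is worth checking in advance whether the remaining hosts actually have the capacity headroom to rebuild the missing components. RVC has a command for exactly that – once again, ‘0’ refers to my cluster.

# Simulates a worst-case host failure and shows the cluster capacity picture afterwards,
# which is a good indicator of whether components could be rebuilt elsewhere
/vcsa-06/CH-Datacenter/computers> vsan.whatif_host_failures 0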
Cormac,
You’ve identified yet another use case for data locality in VSAN.
A host data object affinity rule (e.g. a host must contain a complete set of related data objects at FTT=0 equivalency) and a VM host affinity rule (e.g. a VM must reside on the host that satisfies its host data object affinity rule).
Yes indeed, Collin. The ‘data locality’ feature is available via RPQ in 6.7. I talked about it in my 6.7 post here – https://cormachogan.com/2018/04/17/whats-in-the-vsphere-and-vsan-6-7-release/. However, there are some caveats. For example, FT requires vSphere HA to be enabled, but with data locality we want to make sure the VM is never started on another host, even by HA (i.e. leave HA disabled) – so some things need to be examined in more detail.
However, for the moment, we are only focused on shared-nothing type applications for ‘data locality’ but I’ll talk to some folks about FT as well.