DRS and VM/Host Affinity Groups in VSAN Stretched Cluster

In a previous post, I talked about how vSphere HA is used extensively in VSAN Stretched Cluster. The primary purpose of vSphere HA is to restart virtual machines in the event of a failure. However, to ensure that the restarted virtual machines continue to perform optimally and continue to use a warmed cache, I mentioned that we need to use VM/Host affinity rules. In this post I want to discuss the role of DRS and VM/Host affinity rules in more detail, and how they are used in VSAN stretched cluster.

What are VM/Host affinity rules?

These VM/Host rules are configured on a cluster object in the vCenter inventory. Essentially, what the rules do is associate one or more virtual machines with one or more hosts. On power on, the VM should only be started on these hosts. On a failure, the VM is restarted on another host in the same VM/Host affinity group.

“Must” rules and “Should” rules

Next, let's talk about the difference between “must” rules and “should” rules in the context of VM/Host affinity rules. If we set a “must” rule, this will always tie a VM to one or more hosts. If all the hosts in that group fail, or if there are not enough resources available on the hosts in the group, the VM cannot be started. The “must” rule means the VM cannot run on a host that is not in the VM/Host affinity group. If we set a “should” rule, the VM is allowed to start on hosts that are not in the VM/Host affinity group, but only when there are no hosts/resources available in the VM/Host affinity group that the VM is associated with.
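The difference between the two rule types can be sketched as a small placement function. This is a simplified model for illustration only, not how vSphere HA or DRS is actually implemented; the function name `candidate_hosts` and its parameters are hypothetical.

```python
def candidate_hosts(rule_type, group_hosts, all_hosts, has_capacity):
    """Return the hosts a VM may be started on under a VM/Host affinity rule.

    rule_type    -- "must" or "should"
    group_hosts  -- set of hosts in the VM's VM/Host affinity group
    all_hosts    -- list of every host in the cluster
    has_capacity -- dict mapping host -> True if it is up and has resources
    (All names here are assumptions made for this sketch.)
    """
    if rule_type not in ("must", "should"):
        raise ValueError(f"unknown rule type: {rule_type}")

    # Hosts inside the affinity group are always tried first.
    in_group = [h for h in all_hosts
                if h in group_hosts and has_capacity.get(h, False)]
    if in_group:
        return in_group

    # No usable host in the group: a "must" rule leaves the VM powered off.
    if rule_type == "must":
        return []

    # A "should" rule falls back to any other usable host in the cluster.
    return [h for h in all_hosts if has_capacity.get(h, False)]
```

With every host in the VM's group down, the “must” variant returns an empty list, while the “should” variant returns the hosts outside the group.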

Which type of rule for VSAN stretched cluster?

The recommendation when setting VM/Host affinity rules in a VSAN stretched cluster is to use “should” rules and not to use “must” rules. The guidance is to create two VM/Host affinity groups: one group made up of VMs and hosts from one site, and the other group made up of VMs and hosts from the other site. With a “should” rule, if a VM needs to be restarted, the first attempt is always made to start the VM on the hosts that are part of the same VM/Host affinity group. However, if there is a lack of resources, or if there is a catastrophic site failure, a “should” rule will allow the VM to be restarted on the other site, in other words, on hosts that are not part of the same VM/Host affinity group. This is important behaviour when there is a complete site failure. The screenshot below shows where to set the “should” part of the VM/Host rule:

[Screenshot: DRS-grp1 - should rule]

Note that there is also a vSphere HA Rule setting when creating a VM/Host rule. This must also be set to a “should” rule, as shown below:

[Screenshot: DRS VM:Host Group - should rule]

DRS considerations in VSAN stretched cluster

Now let’s consider DRS in VSAN stretched cluster. The first DRS consideration relates to VM/Host affinity rules. DRS is needed for VM/Host affinity rules to work. If DRS is not enabled, the “should” rules are ignored. So if you want to use VM/Host affinity “should” rules, you will need DRS. DRS can be set up in fully automated or partially automated mode. Of course, you will need to make sure you have a vSphere edition which supports DRS, so this is also a consideration.

VM placement with VM/Host affinity rules and DRS

Next, I want to highlight something in the workflow that may not be obvious. In order to be part of an affinity rule, the VM must be created in advance. So the workflow would be to deploy your VMs, create the host groups, and then add the VMs and hosts to the VM/Host groups. You can now power on the VMs. There is no way at the current time to add the VMs to a VM/Host affinity group during the deployment. This is something we are working to improve.

This then leads to the predicament of whether or not the VM is deployed to the correct host. Not to worry, DRS will take care of that. If it is in fully automated mode, the VM will be migrated to the correct site when you attempt to power it on. If DRS is enabled in partially automated mode, you will not be able to power on the VM if it is located on the site to which it does not have affinity. You will need to manually migrate the VM to the correct site before it powers on.

If DRS is not enabled, the “should” rule is ignored and the VM can be run on any host on any site.
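The three power-on cases described above can be summarised in a short sketch. It is only a model of the behaviour as described in this post, not a call into any vSphere API; the function name, the mode strings and the returned tuple are all hypothetical.

```python
def power_on_outcome(drs_mode, current_site, preferred_site):
    """Model the power-on of a VM that has a "should" VM/Host affinity rule.

    drs_mode       -- "fully_automated", "partially_automated" or "disabled"
    current_site   -- site where the VM was deployed
    preferred_site -- site of the VM's VM/Host affinity group
    Returns a tuple (powered_on, site, note). All names are assumptions.
    """
    if drs_mode not in ("fully_automated", "partially_automated", "disabled"):
        raise ValueError(f"unknown DRS mode: {drs_mode}")

    # Without DRS the "should" rule is ignored: the VM runs where it is.
    if drs_mode == "disabled":
        return (True, current_site, "rule ignored")

    # The VM is already on the site it has affinity to.
    if current_site == preferred_site:
        return (True, current_site, "already on preferred site")

    # Fully automated: DRS moves the VM to the correct site at power-on.
    if drs_mode == "fully_automated":
        return (True, preferred_site, "moved by DRS")

    # Partially automated: power-on is blocked until a manual migration.
    return (False, current_site, "manual migration required")
```

For example, `power_on_outcome("partially_automated", "site-B", "site-A")` reports that the VM stays powered off on site-B until it is migrated by hand.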

DRS and full site failure in VSAN stretched cluster

One final consideration is what to do when there is a full site failure and all VMs have been restarted on the remaining site. Suppose the failed site now recovers, and all of its hosts are rebooted and back online, but the components are still rebuilding/resyncing. At this point, we may want to wait until the resynchronization is complete before bringing any virtual machines back to the recovered site. The reason for this is read locality. If the VMs were moved back to the recovering site right away, the components on that site would not be available until the resync/rebuild completes, so the VMs would have to do I/O over the inter-site link. This will impact the performance of the VMs (once the components are synced, the VMs will stop doing I/O to the remote site and use the local copy). It is for this reason that, after a full site failure, one should consider waiting for the components to fully resync before bringing the VMs back to the recovered site. If you are using DRS in fully automated mode, you should consider placing it in partially automated mode when a full site failure has occurred, to avoid VMs moving back while the resync is still in progress. When the failure is resolved and all components are resynced, place it back into fully automated mode.
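As a rough sketch, the guidance above boils down to a simple decision. The function and its inputs are hypothetical; in practice you would take the failure state and the number of components still resyncing from your own monitoring, and change the DRS automation level yourself.

```python
def recommended_drs_mode(site_failure_active, resyncing_components):
    """Suggest a DRS automation level around a full site failure.

    site_failure_active  -- True while one site is still down
    resyncing_components -- number of components still rebuilding/resyncing
    Both inputs are assumptions supplied by the caller for this sketch.
    """
    if resyncing_components < 0:
        raise ValueError("resyncing_components cannot be negative")

    # While the site is down or data is still resyncing, keep DRS from
    # moving VMs back automatically.
    if site_failure_active or resyncing_components > 0:
        return "partially_automated"

    # Failure resolved and everything resynced: resume full automation.
    return "fully_automated"
```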

4 comments
  1. Hi Cormac, thanks for the great article and making clear that DRS is basically “needed” when one wants to properly run a VSAN stretched cluster.
    One thing – “you will need to manually vMotion the VM to the correct site before it powers on” – it’s actually not a vMotion (migration) but a cold migration, as the VM is powered off. Customers always confuse this, so it might be good to change the text (sorry, that’s the Instructor in me talking ;))

  2. How does this work with fault domains? I’m looking at setting up a VSAN, and setting the fault domain for site A to be site B. As I understand, by doing that, when I set FTT=1, the data will be replicated to site B instead of to another node at site A. This is to cover the case where we lose the entire rack at Site A. The VMs will be able to reboot at Site B off of the replicated data at Site B.

    If I were to use VM/Host affinity groups, then wouldn’t I need to replicate to a second node at site A? Would that mean setting FTT=2, and it would replicate to a node at site A, and a node at site B? Maybe VM/Host affinity groups don’t work when using fault domains. Can you help me sort that out?
