vSphere HA settings for VSAN Stretched Cluster

As part of the enhancements to Virtual SAN 6.1, stretched cluster support was announced. To provide availability for virtual machines in a VSAN stretched cluster, vSphere HA needs to be configured. This allows VMs to be restarted on the same site (with affinity rules) when there is a host failure, or restarted on the remote site when there is a complete site failure. However, certain settings must be configured in a specific way, and they are fundamental to achieving high availability in a VSAN stretched cluster. In this post, I will call out the VMware recommended settings, and I will also explain why we are recommending that vSphere HA be configured in this way on a VSAN stretched cluster. By following this guidance, you can be sure that your virtual machines get restarted on the same site (maintaining read locality) when there is a component/host failure on one site. It will also ensure that the virtual machines fail over and restart on the remaining site in the event of a complete site failure.

Host Monitoring

vSphere HA should most definitely be turned on. This will make your virtual machines highly available in the VSAN stretched cluster. Host monitoring should also be enabled. This will allow hosts in the vSphere HA cluster to exchange heartbeats over the network, and ensure that all nodes continue to participate in the cluster, and are healthy.

Note: If there is a host failure, the virtual machines that reside on this host will be restarted on the remaining hosts in the cluster. However, the recommendation is to have vSphere HA restart the virtual machines on hosts on the same data site, and not have them restarted on the other data site. The reason for this is that if there was a failover to the other site, the virtual machines would read from the local copy of the data on the remote site and thus would have to rewarm their cache on that site, so there would be a temporary performance dip. We want to avoid this by keeping VMs on their selected site as much as possible, and thus using the local copy of the data and cache on that site. VM/Host affinity groups and rules should be created to achieve this, and these rules should be “soft”. Being “soft” or “should” rules means that all attempts are made to restart the VMs on the local site, but if that is not possible (e.g. full site failure), then the “soft” rules can be broken and the VMs may be restarted on the other site. I will delve into this in greater detail in a future post, as it plays an important role in a VSAN stretched cluster.
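
To make the “should” rule behaviour concrete, here is a minimal Python sketch of the placement decision described above. It is only an illustration of the idea, not how vSphere HA is implemented; the function name, host names and site names are all made up for the example.

```python
from typing import Dict, Optional


def pick_restart_host(vm_site: str, healthy_hosts: Dict[str, str]) -> Optional[str]:
    """Choose a restart host for a VM governed by a soft ("should") rule.

    healthy_hosts maps a host name to its site, and lists only the hosts
    that are still able to run VMs. All names here are hypothetical.
    """
    # First preference: a healthy host on the VM's own site, so the VM
    # keeps reading from its local copy of the data and its warm cache.
    same_site = sorted(h for h, site in healthy_hosts.items() if site == vm_site)
    if same_site:
        return same_site[0]
    # Soft rule: with no host left on the VM's site (full site failure),
    # the rule may be broken and any remaining host is acceptable.
    remaining = sorted(healthy_hosts)
    return remaining[0] if remaining else None
```

With a “must” (hard) rule, the second branch would not exist, and the VMs would stay down after a full site failure. That is why the rules need to be soft.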

Host Hardware Monitoring – VM Component Protection

VSAN does not support VMCP, VM Component Protection, at this time. Therefore this should be left unchecked.

Virtual Machine Monitoring

This monitors the heartbeats of the virtual machines, and restarts a virtual machine if its heartbeats are not received over a period of time. This setting is optional, and is left to the customer’s discretion. VMware supports having this feature either enabled or disabled.

Failure conditions and VM response

This is where the host isolation response is placed. Consider a situation where a network failure results in a host being isolated from the rest of the cluster. What do you wish to happen to those virtual machines that are on the isolated host? The VMware recommendation, when using vSphere HA in a VSAN stretched cluster, is to have the VMs powered off and restarted.

We are recommending this setting as it will take care of restarting virtual machines on the same site should a host get isolated, but it will also take care of restarting virtual machines on the other site should a complete site get isolated. The remaining failure conditions and responses, which mostly deal with APD & PDL events, can be left at the default setting, which is disabled. However, additional advanced settings are needed to ensure that host isolation response works correctly, such as ignoring the default gateway as an isolation response IP address and choosing isolation response IP addresses local to each site. These are discussed shortly.

Admission Control

VMware supports an active/active VSAN stretched cluster configuration, in other words, running virtual machines at both data sites. Given this, we feel that admission control should be configured in such a way that will allow the complete workload to run on one remaining site if there is a complete site failure. With that in mind, the recommendation is to set admission control to a percentage value of 50%. This will leave 50% of the cluster’s CPU and Memory resources free, and should ensure that one data site can run all the virtual machines in the event of a complete failure of the other site.

Now, having made that recommendation, customers can obviously change this and consume more than 50% of the CPU & Memory resources and they will be fully supported in doing so. But keep in mind that if there is a site failure, not all virtual machines may be able to restart on the remaining site due to a lack of resources. This is a call that customers will have to make in their respective environments.
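
The reasoning behind the 50% figure can be checked with some simple arithmetic. The Python sketch below is a simplification: it compares plain CPU and memory demand against each site's capacity, whereas vSphere HA admission control works from reservations. The `Site` class, its field names and the numbers in the usage example are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Site:
    name: str
    cpu_mhz: int        # total CPU capacity of the hosts on this site
    mem_gb: int         # total memory capacity of the hosts on this site
    used_cpu_mhz: int   # CPU demand of the VMs currently running here
    used_mem_gb: int    # memory demand of the VMs currently running here


def survives_site_failure(a: Site, b: Site) -> Dict[str, bool]:
    """For each site, report whether it could run the complete workload alone."""
    total_cpu = a.used_cpu_mhz + b.used_cpu_mhz
    total_mem = a.used_mem_gb + b.used_mem_gb
    return {
        site.name: total_cpu <= site.cpu_mhz and total_mem <= site.mem_gb
        for site in (a, b)
    }
```

For example, two equally sized sites that each consume less than half of their own resources can both absorb a failover:

```python
preferred = Site("preferred", 100_000, 1024, 40_000, 400)
secondary = Site("secondary", 100_000, 1024, 45_000, 500)
print(survives_site_failure(preferred, secondary))
# {'preferred': True, 'secondary': True}
```

Once the combined demand exceeds what a single site offers, the result flips to False, which is the situation where not all virtual machines can restart.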

Datastore for heartbeating

VSAN does not support heartbeat datastore functionality, so this needs to be disabled. There is no disable button for heartbeat datastores, so if there are VMFS volumes or NFS volumes presented to the hosts in the cluster, these datastores may be automatically chosen as heartbeat datastores. This could result in unpredictable behaviour in the VSAN stretched cluster, especially when it comes to failover events. Therefore customers need to ensure that heartbeat datastores are not in use. If you have datastores presented to the hosts other than the VSAN datastore, you should select the “Use datastores only from the specified list” option, and then not select any datastores from the list.

If there are no other datastores, and only the VSAN datastore, then this is not a concern and can be left at the default.

Advanced Options

There are a number of advanced options that need to be added to ensure that host isolation works correctly when vSphere HA is configured on a VSAN stretched cluster. In a VSAN stretched cluster, one of the isolation addresses should reside in the site 1 data center and the other should reside in the site 2 data center. This enables vSphere HA to validate complete network isolation in the case of a connection failure between sites. VMware recommends enabling host isolation response and specifying isolation response addresses that are on the VSAN network rather than the default gateway on the management network. Therefore the vSphere HA advanced setting das.usedefaultisolationaddress should be set to false.

As stated, VMware recommends specifying two isolation response addresses, and each of these addresses should be site specific. In other words, select an isolation response IP address from the preferred VSAN stretched cluster site and another isolation response IP address from the secondary VSAN stretched cluster site. The vSphere HA advanced setting used for setting the first isolation response IP address is das.isolationaddress0 and it should be set to an IP address on the VSAN network which resides on the first site. The vSphere HA advanced setting used for adding a second isolation response IP address is das.isolationaddress1 and this should be an IP address on the VSAN network that resides on the second site.
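
As a sanity check before entering these values, the following Python sketch builds the three advanced options and verifies that each isolation address really sits on the intended site's VSAN network. It assumes that each site has its own VSAN subnet, which will not be true if the VSAN network is stretched at layer 2. The function name and the example subnets are hypothetical; only the three das.* option names come from the text above.

```python
import ipaddress
from typing import Dict


def build_isolation_options(site1_addr: str, site1_vsan_net: str,
                            site2_addr: str, site2_vsan_net: str) -> Dict[str, str]:
    """Return the HA advanced options, rejecting addresses off the VSAN network."""
    checks = (
        ("site 1", site1_addr, site1_vsan_net),
        ("site 2", site2_addr, site2_vsan_net),
    )
    for label, addr, net in checks:
        # Assumption: one VSAN subnet per site, given in CIDR notation.
        if ipaddress.ip_address(addr) not in ipaddress.ip_network(net):
            raise ValueError(f"{addr} is not on the {label} VSAN network {net}")
    return {
        "das.usedefaultisolationaddress": "false",  # ignore the default gateway
        "das.isolationaddress0": site1_addr,        # VSAN network, site 1
        "das.isolationaddress1": site2_addr,        # VSAN network, site 2
    }
```

Calling `build_isolation_options("172.16.10.1", "172.16.10.0/24", "172.16.20.1", "172.16.20.0/24")` returns a dictionary that can be copied into the cluster's advanced options, while an address from the management network would raise a `ValueError`.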

Summary

Here is a summary of all the settings needed when enabling vSphere HA on top of a VSAN stretched cluster.

vSphere HA: Turn on
Host Monitoring: Enabled
Host Hardware Monitoring – VM Component Protection (“Protect against Storage Connectivity Loss”): Disabled (default)
Virtual Machine Monitoring: Customer preference (disabled by default)
Admission Control: Define failover capacity by reserving a percentage of cluster resources. Set to 50% for both CPU & Memory.
Host Isolation Response: Power off and restart VMs
Datastore Heartbeats: “Use datastores only from the specified list”, but do not select any datastores from the list. This disables datastore heartbeats.

Advanced Settings:

das.usedefaultisolationaddress: false
das.isolationaddress0: IP address on VSAN network on site 1
das.isolationaddress1: IP address on VSAN network on site 2
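
If you keep your cluster settings in a script or inventory file, the summary above can be turned into a simple checklist. The Python sketch below compares a dictionary of actual settings against the recommendations and reports any deviations. Apart from the three das.* options, the key names and value spellings are invented for this example and do not correspond to any vSphere API. Virtual Machine Monitoring is left out because it is a customer preference.

```python
from typing import Any, Dict, Tuple

# Recommended values from the summary. Key names other than the das.* options
# are hypothetical labels chosen for this sketch.
RECOMMENDED: Dict[str, Any] = {
    "vsphere_ha": "on",
    "host_monitoring": "enabled",
    "vmcp": "disabled",
    "admission_control_cpu_pct": 50,
    "admission_control_mem_pct": 50,
    "host_isolation_response": "power off and restart VMs",
    "heartbeat_datastores": [],
    "das.usedefaultisolationaddress": "false",
}

# These must be set, but the right value differs for every environment.
SITE_SPECIFIC = ("das.isolationaddress0", "das.isolationaddress1")


def find_deviations(actual: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Map each non-compliant setting to a (recommended, actual) pair."""
    deviations = {
        key: (want, actual.get(key))
        for key, want in RECOMMENDED.items()
        if actual.get(key) != want
    }
    for key in SITE_SPECIFIC:
        if not actual.get(key):
            deviations[key] = ("site-specific IP on the VSAN network", actual.get(key))
    return deviations
```

An empty result means that the settings match the recommendations in this post.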

The VSAN 6.1 Stretched Cluster Guide, which is now available, will cover these settings in more detail. Please refer to this guide if planning to implement a VSAN stretched cluster.

9 comments
  1. Hi Cormac,

    I have just finished reading your excellent VSAN Stretched Cluster guide and I had a few questions that hopefully you can help me with:

    1. Why does each 4KB write require 125Kb of bandwidth (nearly four times the amount of data)? How does this compare to VPLEX/MetroCluster (I know MetroCluster replicates the data twice)?

    2. How do you size for the Witness site link (i.e. for smaller deployments such as 100 VMs you may not need 50Mbps – I see ROBO only needs 1.5Mbps for up to 25 VMs)?

    3. Are there plans to move to a “passive” witness?

    Because the Witness is active (rather than being a passive monitoring service like with VPLEX/MetroCluster) this results in:
    “Significant” bandwidth being required
    Split-brains that result in only the preferred site being active
    Loss of data if the witness and one copy of the data is unavailable (even though a valid copy exists)

    4. Are there plans to support FTT>1 (i.e. with local Erasure Coding synchronously replicated to the second site)?
    5. Are there plans to replicate the Read Cache (this is something MetroCluster does)?
    6. Would you still need to use DRS Host Affinity Rules to “lock” VMs to their preferred site if All-Flash is used (i.e. there would not be a read cache)?
    7. Are the following supported with a VSAN Stretched Cluster:
    Oracle RAC
    Microsoft Failover Clustering

    Additional questions:

    1. Does (VSAN) vSphere Replication use copy-on-write/redo logs or the more efficient RoW Virsto snapshot engine (i.e. is that why the RPO can now be 5 minutes)?
    2. Is the re-sync/re-build process throttled to prioritise foreground IO (I do not think I have seen anything on your site to say either way)?

    Many thanks as always
    Mark

    • Hi Mark, nice to hear from you. Answers, where possible, below:

      1. (a) The assumption is that there is a 10Gbps link between sites. The calculation came from engineering. Let me see if I can get further info. (b) Not sure how this compares to other metrocluster implementations.
      2. There is some additional bandwidth information for ROBO due out soon.
      3. Can’t comment on futures I’m afraid
      4. For further details on erasure coding implementation, please sign up for the beta. I can’t comment publicly.
      5. Can’t comment on futures I’m afraid
      6. Good question. I think affinity rules will still be useful from a VM placement perspective.
      7. Not sure – I will need to ask.
      A1. vR uses a form of Change Block Tracking. I don’t believe the underlying format matters. So for v1 on-disk format, snapshots will be redo log, and for v2 on-disk format, any snapshots will be vsanSparse.
      A2. Yes. This has been improved in 6.x. This is not customer facing however and is an internal implementation detail.

      • Follow up on (7). I checked the launch collateral, and it states the following:

        Support for SMP-FT
        Support for Oracle RAC
        Support for Microsoft Failover Clustering

        So the answer is yes.

        • Are we 100% sure, your stretched cluster document states that FT is not supported?

          I suspect that none of the above are supported as it does not explicitly state that they are.

          • You are correct Mark – I think I’m misreading this launch material. These features are certainly supported on a VSAN 6.1 standard configuration.

            Let me reach out to a few folks and get a definitive answer on whether the clustering technologies from Oracle and Microsoft are supported on stretched VSAN.

          • OK – I got it confirmed. Both Oracle RAC and Microsoft Cluster (SQL AAG and Exchange DAG) are supported in VSAN stretched cluster. However, since VSAN does not have support for SCSI reservation, we cannot support the configurations that require shared disk like FCI.

            And yes, you are correct, there is no support for SMP-FT in VSAN stretched cluster.

    • Follow up on (1). After a discussion with some folks, the 125Kbps for a 4KB write is very conservative on our part. But what it means is that miscellaneous activity like cache warming, de-staging, garbage collection, metadata traffic and other related activities should not impact the I/O rate if this bandwidth figure is used in calculation.
