I’m currently neck-deep preparing for the next version of Virtual SAN to launch. As I prepare for all the new features that are coming (which I hope to be able to start sharing with you shortly), I’m still surprised by the misconceptions that persist around basic Virtual SAN functionality. The purpose of this post is to clear up a few of those.
NumberOfFailuresToTolerate = 0 if not specified
A number of folks have mentioned that if you do not specify NumberOfFailuresToTolerate (FTT for short) explicitly in the VM Storage Policy, then it is automatically set to 0. In other words, there is no protection for the virtual machine and only a single replica is deployed. This is incorrect. Even if FTT is not included in the policy, an FTT value of 1 is used, so there will always be a mirror copy of the VM’s data. The only way to deploy virtual machines that have no protection (i.e. FTT=0) is to explicitly state this in the policy. In all other cases, even when the default policy comes into play because you have not chosen a policy during deployment, the VM will get FTT=1.
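To make the relationship concrete, here is a toy Python sketch (not VSAN code; the function name is made up purely for illustration) of how an FTT value maps to the number of replicas and the minimum host count an object needs:

```python
def vsan_object_layout(ftt: int) -> dict:
    """Toy model of how NumberOfFailuresToTolerate shapes an object.

    With FTT = n, VSAN keeps n + 1 replicas of the data and needs
    2n + 1 hosts (replicas plus witness components) so that a
    majority of components survives any n host failures.
    """
    if ftt < 0:
        raise ValueError("FTT cannot be negative")
    return {
        "replicas": ftt + 1,       # full copies of the data
        "min_hosts": 2 * ftt + 1,  # hosts needed to keep quorum
        "protected": ftt > 0,      # FTT=0 means a single, unprotected copy
    }

# The effective default when a policy omits FTT is 1, not 0:
print(vsan_object_layout(1))
# {'replicas': 2, 'min_hosts': 3, 'protected': True}
```

Note that FTT=0 gives a single replica and no protection, which is exactly why it must be requested explicitly in the policy.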
You select thick or thin for the VMDK during deployment
Virtual SAN ALWAYS deploys virtual machines as thin, so long as (and this is important) a policy is chosen for the VM. When this rule is followed, disk provisioning is as per the VM Storage Policy, and the value of Object Space Reservation (OSR for short) is used to allocate a certain percentage of space for the VM (i.e. to define the thickness of the VM). By default OSR has a value of 0, so VMs will be thinly provisioned by default. The caveat to this is when no policy is chosen and the VM uses the default policy. In that case the VM deployment wizard offers administrators the ability to select a thin or thick format. I already wrote about this here. So long as you always select a policy when deploying VMs on VSAN, they will always be thin (unless Object Space Reservation is specified).
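The OSR arithmetic itself is trivial; this small Python sketch (illustrative only, the function is hypothetical) shows how the up-front reservation follows from the percentage:

```python
def reserved_space_gb(vmdk_size_gb: float, osr_percent: float = 0.0) -> float:
    """Space reserved up front for a VMDK under a given
    Object Space Reservation (OSR) percentage.

    OSR defaults to 0, so objects are thin: nothing is reserved
    up front and space is allocated on demand. OSR=100 behaves
    like thick provisioning, reserving the full size immediately.
    """
    if not 0 <= osr_percent <= 100:
        raise ValueError("OSR must be between 0 and 100")
    return vmdk_size_gb * osr_percent / 100.0

print(reserved_space_gb(40))        # 0.0  -> thin by default
print(reserved_space_gb(40, 100))   # 40.0 -> effectively thick
```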
Maintenance Mode behaviours
There are 3 maintenance modes associated with a host that is participating in a VSAN cluster:
- No data migration
- Ensure accessibility
- Full data migration
Let’s start with “No data migration”. This means that the host will enter maintenance mode without evacuating any data from the host. If all virtual machines have been deployed with FTT=1, then this is safe to use, as there will be a full copy of the data elsewhere in the cluster, and quorum will also be maintained with more than 50% of the components still accessible on the other hosts. However, you will need to vMotion the VMs (or use DRS) off the host. Where this becomes a concern is if there are VMs with FTT=0 running on this host. These VMs will become inaccessible (but we do not recommend running VMs with FTT=0 generally). There is of course another consideration. If there is already a failure in the cluster which is currently being remediated, then “No data migration” is not a safe option to choose. This is because the host that you are placing into maintenance mode may be the one with the “good” copy of the data. Placing the host into maintenance mode may also drop the number of accessible components for an object to 50% or less, at which point the object loses quorum. This is an important consideration.
Now let’s look at the “Ensure accessibility” option. This simply ensures that enough components remain available: at least one good, full copy of the data, and more than 50% of the components. Again, the concern is with VMs that have FTT=0. If VSAN detects any VMs with this policy on the host that is being placed into maintenance mode, then the component(s) will be rebuilt elsewhere in the cluster to make sure they remain accessible. For VMs with FTT=1 or greater, the likelihood is that no rebuilding will be necessary. Again, you will need to vMotion the VMs (or use DRS) off the host. When there is no failure in the cluster, there is no overhead when this option is chosen for VMs that have FTT=1. “Ensure accessibility” is preferred over “No data migration” for this reason.
Finally, “Full data migration” should be pretty self-explanatory. This means that all components on the host entering maintenance mode are rebuilt elsewhere in the cluster. When this completes, all VMs remain compliant with their VM storage policy settings, meaning that if there is another failure in the cluster, the VMs can continue to be available. The caveat here is that there must be enough hosts, capacity and flash in the cluster to accommodate the rebuild task. If there are not enough resources, then you will not be able to use this option when placing a host into maintenance mode, something people tend to forget.
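The quorum rule underlying all three maintenance mode options can be sketched as a toy check in Python (illustrative only; VSAN obviously does far more than this, and the function name is invented for the example):

```python
def object_stays_accessible(total_components: int,
                            components_lost: int,
                            full_copies_remaining: int) -> bool:
    """An object stays accessible only while more than 50% of its
    components are reachable AND at least one full, intact copy of
    the data remains somewhere in the cluster."""
    remaining = total_components - components_lost
    return remaining * 2 > total_components and full_copies_remaining >= 1

# FTT=1 object: 2 replicas + 1 witness. Losing one host is fine...
print(object_stays_accessible(3, 1, 1))   # True
# ...but "No data migration" during an existing failure can take out
# both the failed copy and the last good one:
print(object_stays_accessible(3, 2, 0))   # False
# FTT=0 object whose only component sits on the host entering
# maintenance mode:
print(object_stays_accessible(1, 1, 0))   # False
```

This is why “Ensure accessibility” is the safer middle ground: it rebuilds just enough components to keep every object on the right side of both conditions before the host goes down.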
I hope you find some of these explanations useful – they seem to come up time and again.
As mentioned in the introduction, I’m hoping to be able to share a bunch of cool new VSAN features and functionality with you in the not too distant future – watch this space.