Taking snapshots with vSAN with failures in the cluster

I was discussing the following situation with some of our field staff today. We know that snapshots inherit the same policy as the base VMDK, so if I deploy a VM as RAID-6, RAID-5, or RAID-1, its snapshots inherit the same configuration. However, if I have a host failure in a 6-node vSAN cluster running RAID-6 VMs, a 4-node cluster running RAID-5 VMs, or a 3-node cluster running RAID-1 VMs, and I then try to take a snapshot, vSAN refuses because there are no longer enough hosts in the cluster to honour the policy.

This is an example taken from a 4-node cluster with a RAID-5 VM, where I intentionally partitioned one of the nodes and then attempted to take a snapshot. I get a failure similar to the following.

What can I do to address this?

Fortunately, there is a work-around. One can create a new policy which is identical to the original policy of the VM/VMDK, but with one additional policy setting added: Force Provisioning set to Yes. You can now apply this new policy to the VM/VMDK that you wish to snapshot.
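As a rough illustration of what "identical, plus Force Provisioning" means, here is a minimal Python sketch. The `StoragePolicy` class and the `with_force_provisioning` helper are hypothetical stand-ins invented for this example; they are not part of any vSAN or vSphere API, and the real policy is created through the storage policy wizard in the vSphere client.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class StoragePolicy:
    """Hypothetical model of a vSAN storage policy (not a real API type)."""
    name: str
    raid_level: str                  # e.g. "RAID-1", "RAID-5", "RAID-6"
    failures_to_tolerate: int
    force_provisioning: bool = False


def with_force_provisioning(policy: StoragePolicy, new_name: str) -> StoragePolicy:
    """Return a copy of the policy that differs only in name and Force Provisioning."""
    return replace(policy, name=new_name, force_provisioning=True)


# The original RAID-5 policy and its force-provisioned twin.
original = StoragePolicy(name="RAID-5", raid_level="RAID-5", failures_to_tolerate=1)
forced = with_force_provisioning(original, "RAID-5-RP")

print(original)
print(forced)
```

Every other setting is carried over unchanged, so the existing objects keep their RAID-5 layout when the new policy is applied.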

The original policy would look something like this:

The new policy would look something like this:

You may now apply this policy to the VM, and propagate it to all the objects. Here the original policy is RAID-5, and the new policy, RAID-5-RP, with Force Provisioning enabled, has been selected.

After clicking the “Apply to all” button, the new policy is applied to all the objects:

After the policy has been changed, the VM is still using RAID-5 with one object still absent (due to the failure).

With the Force Provisioning policy applied, you can now go ahead and snapshot the VM/VMDK. The snapshot will be created as a RAID-0 object, not RAID-5.
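The behaviour described in this post can be summarised with a small Python model. This is only a sketch of the decision as I have described it, not how vSAN is implemented: the `snapshot_layout` function and the `MIN_HOSTS` table are names made up for the example, and the host counts are the minimums for the three policies mentioned earlier (RAID-1 and RAID-5 tolerating one failure, RAID-6 tolerating two).

```python
# Minimum number of hosts needed to place a new object with each policy.
MIN_HOSTS = {"RAID-1": 3, "RAID-5": 4, "RAID-6": 6}


def snapshot_layout(raid_level: str, healthy_hosts: int, force_provisioning: bool) -> str:
    """Return the layout a new snapshot object would get, or raise if it is refused."""
    if healthy_hosts >= MIN_HOSTS[raid_level]:
        # Enough hosts remain: the snapshot inherits the policy as usual.
        return raid_level
    if force_provisioning:
        # Policy cannot be met, but Force Provisioning allows a RAID-0 object.
        return "RAID-0"
    raise RuntimeError(
        f"{raid_level} needs {MIN_HOSTS[raid_level]} hosts, only {healthy_hosts} available"
    )


# 4-node cluster with one node partitioned, so 3 healthy hosts remain.
print(snapshot_layout("RAID-5", 3, force_provisioning=True))

try:
    snapshot_layout("RAID-5", 3, force_provisioning=False)
except RuntimeError as err:
    print("snapshot refused:", err)
```

Note that the RAID-0 snapshot has no redundancy of its own until a compliant policy can be satisfied again.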

Once the underlying fault has been rectified, the RAID-5 policy can be applied to the snapshot to make it highly available again.

