Erasure Coding and Quorum on vSAN

I was looking at the layout of RAID-5 object configuration the other day, and while these objects were deployed on vSAN with 4 components, something caught my eye. It wasn’t the fact that there were 4 components, which is what one would expect since we implement RAID-5 as a 3+1, i.e. 3 data segments and 1 parity segment. No, what caught my eye was that one of the components had a different vote count. Now, RAID-5 and RAID-6 erasure coding configurations are not the same as RAID-1. With RAID-1, we deploy multiple copies of the data depending on how many…

Another recovery from multiple failures in a vSAN stretched cluster

In a previous post related to multiple failures in a vSAN stretched cluster, we showed that if a failure caused the data components to be out of sync, the most recent copy of the data needs to recover before the object becomes accessible again. This is true even if there are a majority of objects available (e.g. old data copy and witness). This is to ensure that we do not recover the “STALE” copy of the data which might have out of date information. To briefly revisit the previous post,  the accessibility of the object when there are multiple failures…

Understanding recovery from multiple failures in a vSAN stretched cluster

Sometime back I wrote an article that described what happens when an object deployed on a vSAN datastore has a policy of Number of Failures to Tolerate set to 1 (FTT=1), and multiple failures are introduced. For simplicity, lets label the three components that make up our object with FTT=1 as A, B and W. A and B are data components and W is the witness component. Let’s now assume that we lose access to component A. Components B & W are still available, and the object (e.g. a VMDK) is still available. The state of these two components (B…

VSAN 6.0 Part 1 – New quorum mechanism

vSphere 6.0 released yesterday. It included the new version of Virtual SAN – 6.0. I now wish to start sharing some of the new features and functionality with you. One of things we always enforced with version 5.5 was the fact that when you deployed a VM with NumberOfFailuresToTolerate = 1, you always had at least 3 components: 1st copy of the data, 2nd copy of the data, and then a witness component for quorum. In version 5.5, for a VM to remain accessible, “one full copy of the data and more than 50% of components must be available”. We…