Why two stretched clusters cannot cross-host each other's witness appliances!
Sorry about the wordy title, but this is a question that has come up a number of times. The scenario: I have two data sites and I deploy two stretched clusters across those sites. Can each stretched cluster host the witness appliance for the other? The answer is no, and I will explain why in this post.
[Update]: I didn't make this clear in the initial post: this is about vSAN stretched cluster. While it might be applicable to other metro cluster solutions, I didn't consider this scenario on those deployment types. This post contains information purely about vSAN.
I also neglected to point out that for VMs to remain available on a vSAN stretched cluster, they follow the same availability rules as standard vSAN. That is, VMs are deployed with one copy of the data on site 1, a replica copy of the data on site 2, and a witness component on the witness appliance (typically at a third site). For an object to remain available, a full copy of the data AND greater than 50% of its components (votes) must be accessible. This means we can lose the witness and the VMs stay available, since they still have both copies of the data; or we can lose a data site and the VMs remain available, since they still have the other copy of the data plus the witness. This is called "quorum".
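To make the quorum rule concrete, here is a minimal Python sketch. The function name and the one-vote-per-component model are my own illustrative assumptions, not vSAN internals:

```python
def object_accessible(site1_up: bool, site2_up: bool, witness_up: bool) -> bool:
    """An object stays accessible only if a full data copy survives
    AND strictly more than 50% of its votes are reachable."""
    votes = [site1_up, site2_up, witness_up]   # one vote per component (assumed)
    full_data_copy = site1_up or site2_up      # a replica on either data site
    quorum = sum(votes) > len(votes) / 2       # strictly greater than 50%
    return full_data_copy and quorum

# Losing only the witness is fine: both data copies supply 2 of 3 votes.
assert object_accessible(site1_up=True, site2_up=True, witness_up=False)
```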
This diagram might make things a little easier to understand. In this scenario, there are 2 sites with 2 stretched cluster implementations. WA is the Witness for Stretched Cluster A (orange connections) and WB is the Witness for Stretched Cluster B (green connections). Both witnesses currently run on data site 1.
Let's now look at the failure scenarios, using diagrams.
OK, in this first one, we've got lucky. The site that failed is the one that did not contain our witnesses. We therefore still have quorum: a data copy on site 1, along with the witnesses. The VMs that were on site 1 remain running on site 1, and vSphere HA restarts the VMs from site 2 over on site 1. But remember, we got lucky!
So what if site 1 failed instead? Well, if site 1 failed, it is Game Over. Each cluster has now lost a copy of the data, and we've also lost both witnesses. The result is that all VMs on both stretched clusters would be unavailable/inaccessible.
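Plugging these two diagrams into the sketch from earlier (both witnesses running on site 1) reproduces the outcomes:

```python
# Both witnesses run on site 1, as in the diagrams above.

# First scenario (lucky): site 2 fails, but the witnesses survive on site 1,
# so each cluster still has a data copy plus its witness: 2 of 3 votes.
assert object_accessible(site1_up=True, site2_up=False, witness_up=True)

# Second scenario (Game Over): site 1 fails and takes both witnesses with
# it; each cluster is left with only 1 of 3 votes, so no quorum.
assert not object_accessible(site1_up=False, site2_up=True, witness_up=False)
```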
Well, how about this third scenario, where we place one witness on each site, each hosted on the other stretched cluster? In this case, when a site fails, all of the VMs on stretched cluster A fail as they can no longer reach quorum: they have lost a data copy and they have lost access to their witness. But this also means that WB, the witness for stretched cluster B, which was running as a VM on stretched cluster A, becomes inaccessible too. The knock-on effect is that all of the VMs on stretched cluster B fail as they can no longer reach quorum either: they have also lost a data copy, and now access to their witness as well.
Complete meltdown!
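The cascade can be chained through the same helper. Again, just a sketch: I'm assuming WA sat on the failed site and WB ran as a VM on stretched cluster A, as in the scenario above:

```python
# Cross-hosted witnesses, one placed on each site: WA is on the site
# that fails; WB runs as a VM on stretched cluster A.
site1_up, site2_up = True, False        # site 2 has just failed

wa_up = site2_up                        # WA was on site 2, so it is down
cluster_a_ok = object_accessible(site1_up, site2_up, witness_up=wa_up)
assert not cluster_a_ok                 # cluster A loses quorum

wb_up = cluster_a_ok                    # WB is a VM on cluster A: down too
cluster_b_ok = object_accessible(site1_up, site2_up, witness_up=wb_up)
assert not cluster_b_ok                 # cluster B loses quorum: meltdown
```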
So hopefully this gives you a good idea as to why we cannot support two stretched clusters across the same two sites cross-hosting each other's witness appliances. [Update] However, if you have 4 sites available, and the two vSAN stretched clusters are on their own unique sites, then check out this post for another possible supported topology for witness cross-hosting.
What about protecting the witness appliance with FT, and having the primary in one site and the secondary in the other site?
But if you have a split-brain, who gets quorum, since both sites have access to the witness?
Yep, I didn't think long enough about the whole problem. Thank you for your fast reply!