I have had this question a number of times now. Those of you familiar with VSAN will know that if a component goes absent for a period of 60 minutes (the default), then VSAN will begin rebuilding a new copy of the component elsewhere in the cluster (if resources allow it). The question then is, if the missing/absent/failed component recovers and becomes visible to VSAN once again, what happens? Will we throw away the component that was just created, or will we throw away the original component that recovered?
[Updated 29-Oct-2015] When I first published this article, the VSAN engineering team reached out to say that what I described was not 100% accurate. What I described was a future plan for the resync process. They then described to me how the process actually works.
- If a component goes absent, we will wait 60 minutes (the default grace period) before creating a new mirror for that component. If the absent component comes back within the 60 min grace period, VSAN will not create any new components but it may resync the recovered component if it’s not up-to-date.
- If the component has been absent for more than 60 minutes, VSAN will “fix” it by creating a new mirror of this component. In the meantime, VSAN marks the old absent one as “transient” (meaning we will get rid of it later). After the new component is fully resynced (so we have availability compliance), VSAN removes the old (absent) transient component.
- If the old absent component comes back during the resyncing period for the new component, VSAN will resync the recovered component as well and bring it back to an active state, but this old component will eventually be removed (even though it’s fully resynced) once the newly created component has been fully resynced.
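The three cases above can be sketched as a small decision function. This is only an illustration of the rules as described, not VSAN code: the names `classify`, `Outcome`, and the `rebuild_minutes` parameter (how long the new mirror takes to fully resync) are my own assumptions.

```python
from enum import Enum
from typing import Optional

GRACE_MINUTES = 60  # default grace period quoted in the post


class Outcome(Enum):
    # Hypothetical labels for the three cases in the list above.
    KEEP_OLD_NO_REBUILD = "old component kept (resynced if stale), no new component"
    OLD_RESYNCED_THEN_REMOVED = "old component resynced, then removed once the new one is in sync"
    OLD_REMOVED_WHILE_ABSENT = "old transient component removed once the new one is in sync"


def classify(returned_after: Optional[float], rebuild_minutes: float,
             grace: float = GRACE_MINUTES) -> Outcome:
    """Classify what happens to the old component.

    returned_after: minutes the component was absent before it came back,
                    or None if it never came back.
    rebuild_minutes: assumed time for the new mirror to fully resync.
    """
    # Case 1: back within the grace period, so no new component is created.
    if returned_after is not None and returned_after <= grace:
        return Outcome.KEEP_OLD_NO_REBUILD
    # The new mirror starts after the grace period and finishes at this point.
    rebuild_done = grace + rebuild_minutes
    # Case 3: back while the new mirror is still resyncing.
    if returned_after is not None and returned_after < rebuild_done:
        return Outcome.OLD_RESYNCED_THEN_REMOVED
    # Case 2: still absent when the new mirror is fully resynced.
    return Outcome.OLD_REMOVED_WHILE_ABSENT
```

For example, `classify(90, 45)` models a component that returns after 90 minutes while the new mirror still needs until minute 105: the old component is resynced but removed anyway.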
So what about the proposed improved mechanism? This is how I described it in the first version of the post. After the default 60 minute timeout, a new component is created and it starts resyncing. If the original component comes back, it remains in an absent state but it would start to resync as well. VSAN internally treats this as a resyncing component, similar to the new component that is already resyncing. Then whichever of the two components finishes resyncing first is kept, whilst the other component is cleaned up.
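The "first to finish wins" rule of the proposed mechanism can be sketched as follows. Again this is only a toy model under my own assumptions: the function name, the parameters, and the tie-break (the new component wins a tie) are hypothetical and do not come from VSAN itself.

```python
def proposed_winner(new_resync_minutes: float,
                    old_returned_after: float,
                    old_resync_minutes: float,
                    grace: float = 60) -> str:
    """Return which component is kept under the proposed mechanism.

    All times are minutes measured from the moment the component went absent.
    """
    if old_returned_after <= grace:
        # Within the grace period no new component exists, so there is no race.
        raise ValueError("original returned within the grace period")
    # The new component starts resyncing when the grace period expires.
    new_done = grace + new_resync_minutes
    # The original starts resyncing when it becomes visible again.
    old_done = old_returned_after + old_resync_minutes
    # Assumption: a tie goes to the new component.
    return "original" if old_done < new_done else "new"
```

With a new component that needs 120 minutes and an original that returns at minute 70 needing only 10 minutes of resync, `proposed_winner(120, 70, 10)` keeps the original and the new component is cleaned up.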
Let’s look at an example of this proposed mechanism. Here we have a 4 node VSAN cluster, with multiple VMs deployed. Let’s look at one VM in particular, which has two components (c1 and c2) and a witness:
Let’s assume that the host holding c1 has been shut down. Component c1 now enters an absent state. If 60 minutes pass and the component has not come back online, VSAN begins rebuilding the missing component elsewhere in the cluster (in other words, it builds another copy of c2). In this example I have tagged it as c3:
So far, so good. Now let’s say that whatever issue affected the first host has been resolved. This means that the host rejoins the cluster and the original component c1 is visible again. In this case, VSAN will resync c1 from c2, and will continue to build c3.
Whichever of these components synchronizes first is the one that is kept. The other component is then discarded. In this example, I have assumed that c3 finishes building sooner than c1 synchronizes:
However, the side effect of both components resyncing is that it creates additional network traffic. Ideally VSAN would determine which one will resynchronize first and only keep that one. We are also looking at optimizing some of this behavior going forward.
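To make the traffic cost concrete, here is a rough back-of-the-envelope sketch. It assumes we know the remaining data and a steady resync rate for each component at the moment the original comes back. Those inputs, and the function itself, are hypothetical; this is not how VSAN estimates anything today.

```python
def estimate_race(new_remaining_gb: float, new_rate_gb_min: float,
                  old_remaining_gb: float, old_rate_gb_min: float):
    """Predict the winner of the resync race and the traffic spent on the loser.

    Returns (winner, wasted_gb), where wasted_gb is the data the losing
    component would have transferred by the time the winner finishes.
    """
    new_eta = new_remaining_gb / new_rate_gb_min   # minutes until new is in sync
    old_eta = old_remaining_gb / old_rate_gb_min   # minutes until original is in sync
    if old_eta < new_eta:
        winner, finish = "original", old_eta
        loser_remaining, loser_rate = new_remaining_gb, new_rate_gb_min
    else:
        # Assumption: a tie goes to the new component.
        winner, finish = "new", new_eta
        loser_remaining, loser_rate = old_remaining_gb, old_rate_gb_min
    # The loser keeps resyncing until the winner is done, but never moves
    # more than it had left to transfer.
    wasted_gb = min(loser_remaining, loser_rate * finish)
    return winner, wasted_gb
```

If the winner could be predicted like this up front, the `wasted_gb` figure is the traffic that could be avoided by resyncing only that one component.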