Some time back I wrote about proactive rebalancing, a new feature of VSAN 6.0. However, I have had a number of queries recently about its functionality. The most common is that when the proactive rebalance operation is started, there doesn’t appear to be any rebuild/resync activity, even though the command output lists a number of disks that need to be rebalanced. (Rebalancing moves components between physical disks so that each disk is equally consumed.)
Disks will only be selected for rebalancing if the variance threshold is met. If no disks exceed the variance threshold, the command output states that there are no disks that need rebalancing, and thus no rebalancing takes place. The variance threshold itself is a little difficult to understand. This is a snippet from the RVC Command Reference Guide. VSAN checks the used capacity of each physical disk using this formula:
used_capacity_of_this_disk / this_disk_capacity
and compares it to the used capacity of the least used disk:
used_capacity_of_least_full_disk / least_full_disk_capacity
In other words, a disk qualifies for proactive rebalancing only if its fullness exceeds the fullness of the “least-full” disk in the VSAN cluster by more than the variance threshold. The rebalancing process also needs to wait until the time threshold is met; that is, the variance threshold must be exceeded continuously for this amount of time. This seems to be what catches many folks out, as they expect the rebalancing to start immediately. The time threshold value, by default, is 30 minutes. This may be changed by using the -i option to the proactive rebalance start command:
--time-threshold, -i : Threshold in seconds, that only when variance threshold continuously exceeds this threshold, corresponding disk will be involved to proactive rebalance, only be valid when option 'start' is specified
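To make the two conditions a little more concrete, here is a rough Python sketch of the check as I understand it from the description above. This is not VSAN's actual code. The `Disk` class, the function names and the example variance threshold passed in are all made up for illustration; only the fullness formula and the 30 minute (1800 second) default time threshold come from the text above.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    # Hypothetical stand-in for a physical disk; not a VSAN or RVC type.
    name: str
    used_gb: float
    capacity_gb: float

    @property
    def fullness(self) -> float:
        # used_capacity_of_this_disk / this_disk_capacity
        return self.used_gb / self.capacity_gb


def disks_over_variance(disks, variance_threshold):
    """Return disks whose fullness exceeds that of the least-full disk
    by more than the variance threshold."""
    least_full = min(d.fullness for d in disks)
    return [d for d in disks if d.fullness - least_full > variance_threshold]


def time_threshold_met(seconds_over_variance, time_threshold=1800):
    """A disk is only involved once it has stayed over the variance
    threshold continuously for the time threshold (assumed here to be
    inclusive; default 30 minutes)."""
    return seconds_over_variance >= time_threshold


# Example with made-up numbers and an assumed variance threshold of 0.3.
disks = [Disk("A", 800, 1000), Disk("B", 500, 1000), Disk("C", 300, 1000)]
candidates = disks_over_variance(disks, variance_threshold=0.3)
print([d.name for d in candidates])
print(time_threshold_met(600), time_threshold_met(1800))
```

In this example only disk A is far enough above the least-full disk (C) to be a candidate, and even then nothing would move until it had been over the threshold for the full 30 minutes.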
Of course, there are other obvious reasons for rebalancing not starting, such as there not being enough components to move around to make a difference, or the components being so large that moving them simply moves the imbalance to another node in the cluster.
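The large component case is easy to see with some simple arithmetic. The following Python snippet is only an illustration with invented numbers (two equal-sized disks and a hypothetical helper function); it does not reflect how VSAN actually decides which components to move.

```python
def fullness_after_move(src_used, dst_used, component, capacity):
    """Fullness of the source and destination disks after moving one
    component between them. Both disks are assumed to have the same
    capacity, purely to keep the example simple."""
    return (src_used - component) / capacity, (dst_used + component) / capacity


# Two 1000 GB disks: one with 800 GB used, the other with 200 GB used.
# Moving a single 600 GB component just swaps which disk is the full one.
print(fullness_after_move(800, 200, component=600, capacity=1000))

# Moving a 300 GB component instead would leave both disks evenly consumed.
print(fullness_after_move(800, 200, component=300, capacity=1000))
```

With the 600 GB component, the gap between the two disks is exactly the same after the move as before it, so there is nothing to be gained by moving it.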