Which policy changes can trigger a rebuild on vSAN?

Some time ago, I wrote about which policy changes can trigger a rebuild of an object. The topic came up again recently, as Duncan and I covered it in our VMworld 2017 session on the top 10 vSAN considerations. In the original post (which is over 3 years old now), I highlighted three changes that led to a rebuild of the object (or of components within the object): increasing the stripe width, growing the read cache reservation (relevant only to hybrid vSAN), and changing FTT when the read cache reservation is non-zero (again relevant only to hybrid vSAN). The other policy change that I highlighted was increasing the Object Space Reservation. Because of the queries we received, we did some further testing.

Changing the RAID Protection Mechanism

When I first wrote the article above in 2015, vSAN did not support RAID-5 or RAID-6; it only supported RAID-1 (mirroring) protection. Now that it does, changing a policy from RAID-1 to RAID-5 or RAID-6, or vice-versa, requires an object rebuild. The same is true if you go from RAID-5 to RAID-6, or vice-versa.

Increasing the Object Space Reservation

When testing this on the latest release of vSAN, I found that an object rebuild only takes place when the Object Space Reservation (OSR) of the object is 0 and a new policy with an OSR value greater than 0 is applied. At that point, we are essentially making the object thick rather than thin. Once the object is thick (i.e. it has an OSR value greater than 0), increasing the OSR value further did not initiate a new rebuild. Something may have changed in this behaviour since I last tested it, but this is how the latest release behaves. The output below is from a test I did in the lab, where we can see the new components with a non-zero OSR. These are in a state of RECONFIGURING. When the sync completes, the current ACTIVE components will be removed and the RECONFIGURING components will become ACTIVE.

/localhost/CH-DC/vms> vsan.vm_object_info 10
VM centos-swarm-master:
...
    DOM Object: 5d3f845a-7387-934a-219e-246e962f4910 (v6, owner: esxi-dell-e.rainpole.com, proxy owner: None, policy: CSN = 4, spbmProfileId = 5e72fea5-8391-4677-ba5e-14c357faa109, proportionalCapacity = 20, spbmProfileGenerationNumber = 0, hostFailuresToTolerate = 1, spbmProfileName = OSR=20%)
      RAID_1
        Component: 5d3f845a-43c6-9d4b-bf8f-246e962f4910 (state: ACTIVE (5), host: esxi-dell-e.rainpole.com, capacity: naa.500a07510f86d6bb, cache: naa.5001e820026415f0,
                                                         votes: 1, usage: 6.9 GB, proxy component: false)
        Component: 5d3f845a-d161-9f4b-3984-246e962f4910 (state: ACTIVE (5), host: esxi-dell-h.rainpole.com, capacity: naa.500a07510f86d6bf, cache: naa.5001e8200264426c,
                                                         votes: 1, usage: 6.9 GB, proxy component: false)
        Component: 8141845a-50fc-2cce-f012-246e962f4910 (state: RECONFIGURING (10), host: esxi-dell-g.rainpole.com, capacity: naa.500a07510f86d693, cache: naa.5001e82002675164,
                                                         dataToSync: 3.48 GB, votes: 3, usage: 3.7 GB, proxy component: false)
        Component: 8141845a-f925-2fce-f07c-246e962f4910 (state: RECONFIGURING (10), host: esxi-dell-e.rainpole.com, capacity: naa.500a07510f86d685, cache: naa.5001e820026415f0,
                                                         dataToSync: 3.93 GB, votes: 3, usage: 3.3 GB, proxy component: false)
      Witness: 8141845a-53ac-30ce-4899-246e962f4910 (state: ACTIVE (5), host: esxi-dell-h.rainpole.com, capacity: naa.500a07510f86d6bf, cache: naa.5001e8200264426c,
                                                     votes: 3, usage: 0.0 GB, proxy component: false)
/localhost/CH-DC/vms>
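
If you want to keep an eye on how much data is still outstanding during a reconfiguration like this, output of this shape is easy enough to scrape. What follows is only a minimal sketch in Python: the function name `pending_resync` is my own invention, the sample text is copied from the RVC output above, and the regular expressions assume the layout shown here, which may well differ in other vSAN releases.

```python
import re

# Lines copied from the vsan.vm_object_info output above (one ACTIVE component,
# the two RECONFIGURING components and the witness).
SAMPLE = """
        Component: 5d3f845a-43c6-9d4b-bf8f-246e962f4910 (state: ACTIVE (5), host: esxi-dell-e.rainpole.com, capacity: naa.500a07510f86d6bb, cache: naa.5001e820026415f0,
                                                         votes: 1, usage: 6.9 GB, proxy component: false)
        Component: 8141845a-50fc-2cce-f012-246e962f4910 (state: RECONFIGURING (10), host: esxi-dell-g.rainpole.com, capacity: naa.500a07510f86d693, cache: naa.5001e82002675164,
                                                         dataToSync: 3.48 GB, votes: 3, usage: 3.7 GB, proxy component: false)
        Component: 8141845a-f925-2fce-f07c-246e962f4910 (state: RECONFIGURING (10), host: esxi-dell-e.rainpole.com, capacity: naa.500a07510f86d685, cache: naa.5001e820026415f0,
                                                         dataToSync: 3.93 GB, votes: 3, usage: 3.3 GB, proxy component: false)
      Witness: 8141845a-53ac-30ce-4899-246e962f4910 (state: ACTIVE (5), host: esxi-dell-h.rainpole.com, capacity: naa.500a07510f86d6bf, cache: naa.5001e8200264426c,
                                                     votes: 3, usage: 0.0 GB, proxy component: false)
"""

# Start of every component or witness entry: UUID followed by its state.
HEAD_RE = re.compile(r"(?:Component|Witness): (\S+) \(state: (\w+)")
# Only entries that are still syncing carry a dataToSync field.
SYNC_RE = re.compile(r"dataToSync: ([\d.]+) GB")


def pending_resync(rvc_output):
    """Return (uuid, state, data_to_sync_gb) for each entry that reports dataToSync."""
    heads = list(HEAD_RE.finditer(rvc_output))
    rows = []
    for i, head in enumerate(heads):
        # An entry wraps over two lines, so take everything up to the next entry.
        end = heads[i + 1].start() if i + 1 < len(heads) else len(rvc_output)
        sync = SYNC_RE.search(rvc_output, head.start(), end)
        if sync:
            rows.append((head.group(1), head.group(2), float(sync.group(1))))
    return rows


rows = pending_resync(SAMPLE)
total_gb = sum(gb for _, _, gb in rows)
print(f"{len(rows)} components still syncing, {total_gb:.2f} GB outstanding")
```

Run against the sample, this picks out the two RECONFIGURING components and adds up their dataToSync values (3.48 GB and 3.93 GB).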

Enabling/Disabling Checksum

This is another feature which was not available when I did the initial testing. If checksum is already ENABLED, there is no resync or rebuild activity when it is DISABLED. However, if checksum is DISABLED, and a new policy with checksum ENABLED is applied to a VM/VMDK, then a rebuild of the components takes place. Again, I am not sure if this has always been the case, but this is how it behaves in the current release of vSAN.
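
As a rough summary of the behaviour described so far, the sketch below encodes the observations as a small Python function that compares an old and a new policy. The `Policy` class, its field names and `rebuild_reasons` are invented purely for illustration and are not part of any VMware API. The rules reflect only what I observed in this post and the original one, so treat them as a checklist rather than a specification. Note also that for the FTT rule I have assumed a non-zero read cache reservation in either the old or the new policy counts.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Policy:
    """Hypothetical, simplified view of a vSAN storage policy."""
    raid: str = "RAID-1"                  # "RAID-1", "RAID-5" or "RAID-6"
    ftt: int = 1                          # failures to tolerate
    stripe_width: int = 1
    osr_percent: int = 0                  # Object Space Reservation
    checksum_enabled: bool = True
    read_cache_reservation: float = 0.0   # percent, hybrid vSAN only


def rebuild_reasons(old: Policy, new: Policy) -> List[str]:
    """List the observed reasons why moving from old to new may trigger a rebuild."""
    reasons = []
    if old.raid != new.raid:
        reasons.append("RAID protection mechanism changed")
    if new.stripe_width > old.stripe_width:
        reasons.append("stripe width increased")
    # Only the thin-to-thick transition rebuilt; growing a non-zero OSR did not.
    if old.osr_percent == 0 and new.osr_percent > 0:
        reasons.append("OSR changed from 0 to non-zero")
    # Enabling checksum rebuilt components; disabling it did not.
    if not old.checksum_enabled and new.checksum_enabled:
        reasons.append("checksum enabled")
    if new.read_cache_reservation > old.read_cache_reservation:
        reasons.append("read cache reservation increased (hybrid)")
    # Assumption: a non-zero reservation in either policy counts here.
    has_reservation = old.read_cache_reservation > 0 or new.read_cache_reservation > 0
    if old.ftt != new.ftt and has_reservation:
        reasons.append("FTT changed with non-zero read cache reservation (hybrid)")
    return reasons


# Thin to thick: expected to rebuild.
print(rebuild_reasons(Policy(), Policy(osr_percent=20)))
# Already thick, OSR grown further: no rebuild observed.
print(rebuild_reasons(Policy(osr_percent=20), Policy(osr_percent=50)))
```

An empty list means none of the triggers covered here apply. It does not guarantee that vSAN will not move data for some other reason.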

The point of all this is to highlight that policy changes can be made on-the-fly to change your VM’s storage requirements, should your application need it. However, it should be understood that changing policies on-the-fly like this requires additional capacity on the vSAN datastore, and can also impact vSAN performance, since changes like this can generate a significant amount of rebuild/resync traffic on the vSAN network. This is especially true if you change a policy that impacts many VMs at the same time. I would urge vSAN users to consider policy changes a maintenance task, and plan accordingly.

12 Replies to “Which policy changes can trigger a rebuild on vSAN?”

  1. Could a vmdk expansion start a resync/rebuild process, or would this only happen in some specific situation, such as in the case of objects larger than 255GB?

    Or should this never happen, so that if a resync started after a vmdk expansion, it was for another reason, like unbalanced disk utilization?

    1. I just did a quick test, and grew a 40GB VMDK to 300GB. All that happened was that new components were added to each side of my RAID-1 mirror. The usage on the new components was 0GB and I observed no rebuild or resync activity. I agree that in a system that was heavily utilized, we might re-balance when a VMDK is grown.

  2. Thank you Cormac.
    I just asked this because I faced an issue at a customer recently, and he told me that everything started after a vmdk expansion. Still trying to understand what could have happened…

  3. We experienced the same kind of large resync issue after expanding VMDKs on a couple of very large database VMs. About 3 TB was added to two VMs, and vSAN started a resync that severely throttled the performance of those two VMs. We tried to bring the number of resyncing components down from 50 to a lower number, but nothing changed the performance. The resync has been very slow from the start: after 24 hours it is still showing 11 TB of data to resync, and it was about 25 TB when the resync started. VMware support is engaged and actively working on it.

    1. If you figure out the cause, please update us with a comment. I’ll see if I can reproduce the issue in my lab. Tx

  4. Hi Cormac, and thanks for your blog with its very useful information.

    We are in the stage of designing our new all-flash vSAN and I had a question about vSAN 6.6, PFTT and SFTT, but I couldn’t ask on that blog post because comments are closed, so if you could kindly enlighten me about this …

    I was wondering what would happen in the following scenario:
    stretched cluster (2 sites), PFTT=1 obviously and SFTT, let’s say, = 1.
    We have VM1 running on ESXi1 (which is part of the vSAN cluster with local storage), and we also have ESXi2, 3 and 4.
    If ESXi2 and 3 go down, will vSAN stop the VM running on ESXi1, or will vSAN retrieve data from the remote site so that the VM keeps running, eventually vMotion’ed to an ESXi host running in the remote site?

    THanks!

    1. This depends on whether the remaining hosts on the failing site can still provide access to the object. If the object is mirrored locally as well as remotely using RAID-1, then it is possible that the 2 remaining hosts can still run VM1’s compute and storage. So there is no need to go across the inter-connect for reads in that case. Writes will always go across anyway.

      If however your local protection was RAID-5, and you lost 2 hosts, then of course the RAID-5 is completely broken. However there is a possibility that the remaining 2 hosts can still run the compute for the VMs, so in that case VM1’s compute would be on site 1 but the only good storage would be on site 2. Reads (as well as writes) would then traverse the inter-site link.

      We are actually working closely with the DRS and HA team to deal with these sorts of scenarios. In other words, even when we do not have a full site failure, when does it make sense to trigger a HA event and have the VMs fail over to the other site?
      Also, when a site failure is resolved, can we detect the resync/rebuild activity and have automatic fail back when the sync is complete? Right now, these are operational considerations that you need to take into account.

      1. “However there is a possibility that the remaining 2 hosts can still run the compute for the VMs, so in that case VM1s compute would be on site 1 but the only good storage would be on site 2”

        Yes, the scenario is like this: RAID-5 local, RAID-1 to the remote site, VM running on a healthy host, but the local RAID-5 is down because of too many host failures. So, according to you, the VM will keep running on the local site with access to its storage on the remote site. But you state it is a “possibility”; has this not been tested yet?

        We are in the compute buying phase, so I will test every scenario like this one before going into production, and I will see how well (or not) vSAN handles this “possible” scenario 🙂

        Thanks

        1. I said “possibility” because I do not know how many VMs you plan on running at either site. So some VMs may stay running on the local site, others may have to be restarted on the remote site if there is a lack of available compute resources on the local site.

          BTW – you can only have identical local/secondary protection on both sites, so either RAID-1 at both sites or RAID-5 at both sites.
          Primary/cross-site protection is always RAID-1.

          1. Hi,

            Yes, that’s the plan: both sites with local “RAID-5” and “RAID-1” replication.
            I understand that compute resources are needed to run the VMs 😉
            I was just wondering about storage access when the local site’s storage goes down (say, 2/4 ESXi down), and what would happen to the VMs running on the 2 remaining nodes. You answered that they will keep running, accessing their storage on the remote site.

            Sorry, I think my English is not clear enough, but you have answered me 🙂

  5. Superb! I am glad to have answered your question. And I think your English is better than my French 🙂
