CormacHogan.com

Which policy changes can trigger a rebuild on vSAN?

Some time ago, I wrote about which policy changes can trigger a rebuild of an object. This came up again recently, as it was something that Duncan and I covered in our VMworld 2017 session on top 10 vSAN considerations. In the original post (which is over 3 years old now), I highlighted items like increasing the stripe width, growing the read cache reservation (relevant only to hybrid vSAN) and changing FTT when the read cache reservation is non-zero (again only relevant to hybrid vSAN) which led to a rebuild of the object (or components within the object). The other policy change that I highlighted was increasing Object Space Reservation. Because of the queries we received, we did some further testing.

Changing the RAID Protection Mechanism

When I first wrote the article above in 2015, vSAN did not support RAID-5 or RAID-6. It only supported RAID-1 (mirroring) protection. Now that both are available, changing a policy from RAID-1 to RAID-5 or RAID-6, or vice-versa, requires an object rebuild. The same is true if you go from RAID-5 to RAID-6, or vice-versa.

Increasing the Object Space Reservation

When I tested this on the latest release of vSAN, an object rebuild only took place when the Object Space Reservation (OSR) of the object was 0 and a new policy with an OSR value greater than 0 was applied. At this point, we are essentially making the object thick rather than thin. Once the object is thick (i.e. it has an OSR value greater than 0), increasing the OSR value further did not initiate a new rebuild. This behaviour may have changed since I last tested it, but it is what I observe in the latest release. The output below is from a test I did in the lab, where we can see the new components with a non-zero OSR. These are in a state of RECONFIGURING. The current ACTIVE components will be removed when the sync completes, and the RECONFIGURING components will become ACTIVE.

/localhost/CH-DC/vms> vsan.vm_object_info 10
VM centos-swarm-master:
...
    DOM Object: 5d3f845a-7387-934a-219e-246e962f4910 (v6, owner: esxi-dell-e.rainpole.com, proxy owner: None, policy: CSN = 4, spbmProfileId = 5e72fea5-8391-4677-ba5e-14c357faa109, proportionalCapacity = 20, spbmProfileGenerationNumber = 0, hostFailuresToTolerate = 1, spbmProfileName = OSR=20%)
      RAID_1
        Component: 5d3f845a-43c6-9d4b-bf8f-246e962f4910 (state: ACTIVE (5), host: esxi-dell-e.rainpole.com, capacity: naa.500a07510f86d6bb, cache: naa.5001e820026415f0,
                                                         votes: 1, usage: 6.9 GB, proxy component: false)
        Component: 5d3f845a-d161-9f4b-3984-246e962f4910 (state: ACTIVE (5), host: esxi-dell-h.rainpole.com, capacity: naa.500a07510f86d6bf, cache: naa.5001e8200264426c,
                                                         votes: 1, usage: 6.9 GB, proxy component: false)
        Component: 8141845a-50fc-2cce-f012-246e962f4910 (state: RECONFIGURING (10), host: esxi-dell-g.rainpole.com, capacity: naa.500a07510f86d693, cache: naa.5001e82002675164,
                                                         dataToSync: 3.48 GB, votes: 3, usage: 3.7 GB, proxy component: false)
        Component: 8141845a-f925-2fce-f07c-246e962f4910 (state: RECONFIGURING (10), host: esxi-dell-e.rainpole.com, capacity: naa.500a07510f86d685, cache: naa.5001e820026415f0,
                                                         dataToSync: 3.93 GB, votes: 3, usage: 3.3 GB, proxy component: false)
      Witness: 8141845a-53ac-30ce-4899-246e962f4910 (state: ACTIVE (5), host: esxi-dell-h.rainpole.com, capacity: naa.500a07510f86d6bf, cache: naa.5001e8200264426c,
                                                     votes: 3, usage: 0.0 GB, proxy component: false)
/localhost/CH-DC/vms>
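
If you want to keep an eye on a reconfiguration like this without reading every component line, the output can be summarised with a few lines of Python. This is only a rough sketch: the function name `summarize_resync` is my own invention, it assumes the text layout shown above, and it assumes every `dataToSync` value is reported in GB, which may not hold for very small or very large objects.

```python
import re

# Matches the state field of a component or witness, e.g. "state: RECONFIGURING (10)".
STATE_RE = re.compile(r"state: (\w+) \(\d+\)")
# Matches the resync field, e.g. "dataToSync: 3.48 GB" (assumption: always in GB).
SYNC_RE = re.compile(r"dataToSync: ([\d.]+) GB")


def summarize_resync(rvc_output: str) -> dict:
    """Count component states and total the outstanding dataToSync (hypothetical helper)."""
    states = {}
    for state in STATE_RE.findall(rvc_output):
        states[state] = states.get(state, 0) + 1
    data_to_sync_gb = sum(float(value) for value in SYNC_RE.findall(rvc_output))
    return {"states": states, "data_to_sync_gb": round(data_to_sync_gb, 2)}


if __name__ == "__main__":
    # Abbreviated copy of the lab output above: one ACTIVE and two RECONFIGURING components.
    sample = """
    Component: 5d3f845a-43c6-9d4b-bf8f-246e962f4910 (state: ACTIVE (5), host: esxi-dell-e.rainpole.com,
        votes: 1, usage: 6.9 GB, proxy component: false)
    Component: 8141845a-50fc-2cce-f012-246e962f4910 (state: RECONFIGURING (10), host: esxi-dell-g.rainpole.com,
        dataToSync: 3.48 GB, votes: 3, usage: 3.7 GB, proxy component: false)
    Component: 8141845a-f925-2fce-f07c-246e962f4910 (state: RECONFIGURING (10), host: esxi-dell-e.rainpole.com,
        dataToSync: 3.93 GB, votes: 3, usage: 3.3 GB, proxy component: false)
    """
    print(summarize_resync(sample))
```

Note that the witness also carries a `state:` field, so when the full output is passed in, it is counted alongside the data components.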

Enabling/Disabling Checksum

This is another feature which was not available when I did the initial testing. If checksum is already ENABLED, there is no resync or rebuild activity when it is DISABLED. However, if checksum is DISABLED, and a new policy with checksum ENABLED is applied to a VM/VMDK, then a rebuild of the components takes place. Again, I am not sure if this has always been the case, but this is how it behaves in the current release of vSAN.
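
To pull the cases covered so far into one place, here is a small Python sketch that compares an old and a new policy and lists the changes that, going by my testing, would lead to a rebuild. Treat it as a summary of the observations in this post and the original one, not as anything vSAN itself exposes: the `Policy` class, its field names and the `rebuild_reasons` function are all hypothetical, and the behaviour may differ in other releases. For the FTT case I have assumed that it is the read cache reservation of the currently applied policy that matters.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical, simplified view of a vSAN storage policy."""
    raid: str = "RAID-1"
    stripe_width: int = 1
    ftt: int = 1
    osr_percent: int = 0                 # Object Space Reservation
    read_cache_reservation: float = 0.0  # percent, hybrid vSAN only
    checksum_enabled: bool = True


def rebuild_reasons(old: Policy, new: Policy, hybrid: bool = False) -> list[str]:
    """Return the reasons a change from old to new would trigger a rebuild, per this post."""
    reasons = []
    if new.raid != old.raid:
        reasons.append("RAID protection mechanism changed")
    if new.stripe_width > old.stripe_width:
        reasons.append("stripe width increased")
    # Only the thin-to-thick transition rebuilds; growing a non-zero OSR does not.
    if old.osr_percent == 0 and new.osr_percent > 0:
        reasons.append("Object Space Reservation raised from 0")
    # Disabling checksum caused no resync; enabling it did.
    if not old.checksum_enabled and new.checksum_enabled:
        reasons.append("checksum enabled")
    if hybrid:
        if new.read_cache_reservation > old.read_cache_reservation:
            reasons.append("read cache reservation increased")
        # Assumption: the reservation on the currently applied policy is what counts.
        if new.ftt != old.ftt and old.read_cache_reservation > 0:
            reasons.append("FTT changed with non-zero read cache reservation")
    return reasons


if __name__ == "__main__":
    thin = Policy()
    thick = Policy(osr_percent=20)
    print(rebuild_reasons(thin, thick))                    # thin to thick
    print(rebuild_reasons(thick, Policy(osr_percent=50)))  # thick to thicker
```

An empty list means none of the cases above apply, which is what the second call returns, since the object is already thick.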

The point of all this is to highlight that policy changes can be made on-the-fly to change your VM’s storage requirements, should your application need it. However, it should be understood that changing policies on-the-fly like this requires additional capacity on the vSAN datastore, and can also impact vSAN performance, since changes like this can generate a significant amount of rebuild/resync traffic on the vSAN network. This is especially true if you change a policy that impacts many VMs at the same time. I would urge vSAN users to consider policy changes a maintenance task, and plan accordingly.
