This is something I noticed in my own lab after applying the most recent disk format (version 13) to my vSAN 7.0U1 environment. I already described the new Capacity Management features in vSAN 7.0U1 in a previous post. One of these features is the new capacity reserve, which dramatically lowers the slack space requirements. These requirements are now controlled by two new parameters, Operations Reserve and Host Rebuild Reserve. Slack space is the term we used for the amount of space that needed to be set aside for operations such as the reconfiguration of objects after a policy change, and the rebuilding of objects after a failure.
However, in order to be able to implement the new, lower guidance on slack space, a reconfigure of any large objects that exist on the system must first be undertaken. These are typically objects that are comprised of aggregated components when their required size (e.g. 1TB) is greater than the maximum supported size of a single component (i.e. greater than 255GB). Below is an example of the type of warning that is displayed in the health checks when such an object is discovered on the vSAN datastore after upgrading to disk format v13.
As you can see, there are 3 objects that require a reconfiguration. I’ve included the Info health check field to provide more context.
So why is this new object format necessary? Consider that we have a very large RAID-1 object, e.g. a 1TB VMDK. Since this exceeds the maximum size of a component (255GB), vSAN will need to aggregate perhaps 4 to 5 components to meet the required size of this object. Now consider that a customer wishes to change the Storage Policy associated with this object from RAID-1 to RAID-5 or RAID-6. This operation may be done “on the fly” without impact to the workload, but with the old format the whole of the object would need to be worked on simultaneously. Thus, in the worst case, this could require up to 1TB of slack space to be available. For this reason, we introduced a new object format which allows the individual components to be worked on independently of the object as a whole. The result is that far less slack space is required to complete the operation.
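To make the arithmetic above concrete, here is a minimal Python sketch. The 255GB component limit and the 1TB example come from the text above; the function names are hypothetical, and the "one component at a time" estimate is a deliberately simplified assumption of mine, not a description of how vSAN actually schedules the work.

```python
import math

# Maximum size of a single vSAN component, as quoted in this post.
MAX_COMPONENT_GB = 255


def components_needed(object_size_gb, max_component_gb=MAX_COMPONENT_GB):
    """Minimum number of components needed to hold one replica of an object."""
    return math.ceil(object_size_gb / max_component_gb)


def worst_case_transient_gb(object_size_gb, per_component,
                            max_component_gb=MAX_COMPONENT_GB):
    """Rough worst-case extra space needed while a policy change is in flight.

    Simplified assumption (not vSAN's real algorithm):
      - old format: the whole object is rewritten at once, so the full
        object size may be needed;
      - new format: components are handled one at a time, so at most one
        component's worth of space is needed at any moment.
    """
    if per_component:
        return min(object_size_gb, max_component_gb)
    return object_size_gb


if __name__ == "__main__":
    for size_gb in (1000, 1024):  # "1TB" in decimal and binary units
        print(size_gb, "GB object ->", components_needed(size_gb), "components")
    print("old format worst case:", worst_case_transient_gb(1024, False), "GB")
    print("new format worst case:", worst_case_transient_gb(1024, True), "GB")
```

Depending on whether 1TB is taken as 1000GB or 1024GB, this gives 4 or 5 components, which is where the "4 to 5" figure comes from. Under the simplified model, the worst-case transient space drops from the full object size to a single component's size.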
In my example, vSAN is providing the storage infrastructure to VMware Cloud Foundation (VCF). The large objects are VMDKs that are being consumed by the SDDC Manager in VCF. Simply click “Change Object Format” to initiate the conversion. The format change will be visible in the Resyncing Objects view, as shown below:
After the reconfiguration, we can examine the new layout of one of the objects, in this case Hard Disk 2. The VMDK in question has been converted to a new layout. Rather than a single RAID-1 with multiple, aggregated components, it now has a concatenation of 2 RAID-1 mirrors, each with two components. In the case of some future policy change or rebuild activity, the vSAN rebuild algorithms can work on the individual components in this object rather than on the object as a whole, reducing the slack space requirements dramatically.
Once the rebuild is complete, you will be able to use the new Capacity Reserve feature to set space reservations for both operations (such as policy changes) and host rebuilds (in case of a complete host failure). I’ve added the various UI views here for completeness. Note that the administrator is automatically presented with guidelines on how much capacity would be needed for the operations reserve and the host rebuild reserve. They begin to change colour/shade as each option is selected.
The final item to highlight is that the operations reserve and the host rebuild reserve also appear in the Capacity Overview view:
I hope this provides an appreciation as to why it is important to reconfigure any large objects after updating the disk format to v13. It will allow you to take advantage of the new, lower slack space requirements.