vSphere 6.5 p01 – Important patch for users of Automated UNMAP
VMware has just announced the release of vSphere 6.5 p01 (Patch ESXi-6.5.0-20170304001-standard). While there are a number of different issues addressed in the patch, there is one in particular that I wanted to bring to your attention. Automated UNMAP is a feature that we introduced in vSphere 6.5. This patch contains a fix for some odd behaviour seen with the new Automated UNMAP feature. The issue has only been observed with certain Guest OSes, certain filesystems, and certain block size formats. KB article 2148987 for the patch describes it as follows:
Tools in the guest operating system might send unmap requests that are not aligned to the VMFS unmap granularity. Such requests are not passed to the storage array for space reclamation. As a result, you might not be able to free space on the storage array.
It would seem that when a Windows NTFS filesystem is formatted with 4KB blocks, Automated UNMAP does not work. However, if NTFS is formatted with a larger block size, say 32KB or 64KB, then Automated UNMAP works just fine. After investigating this internally, the issue appears to be related to the alignment of the UNMAP requests that the Guest OS sends down. These have start offsets that are not aligned on a 1MB boundary, which is a requirement for Automated UNMAP to work. For VMFS to process an UNMAP request, it has to arrive 1MB aligned and in 1MB multiples. Even though the NTFS partition in the Guest OS is aligned correctly, the UNMAP requests themselves are not aligned, so we cannot do anything with them.
Our engineering team also observed that when some of the filesystem's internal files grow to a certain size, the starting clusters available for allocation are not aligned on 1MB boundaries. When subsequent file truncate/trim requests come in, the corresponding UNMAP requests are not aligned properly.
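To make the alignment requirement concrete, here is a minimal sketch (in Python, purely for illustration) of the check described above. It assumes 512-byte sectors and a 1MB UNMAP granularity; the function name and values are mine, not anything taken from VMFS or the patch.

SECTOR_SIZE = 512
UNMAP_GRANULARITY = 1024 * 1024  # 1MB granularity, as described above

def is_unmap_aligned(start_lba, num_lbas):
    # A request can only be processed if both its start offset and its
    # length fall on the 1MB granularity.
    start_bytes = start_lba * SECTOR_SIZE
    length_bytes = num_lbas * SECTOR_SIZE
    return start_bytes % UNMAP_GRANULARITY == 0 and length_bytes % UNMAP_GRANULARITY == 0

print(is_unmap_aligned(2048, 2048))    # True  - 1MB request starting on a 1MB boundary
print(is_unmap_aligned(4097, 40960))   # False - start offset is off the 1MB boundary

A 4KB-cluster NTFS volume can legitimately issue the second kind of request, which is why it was silently dropped prior to this patch.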
While investigations continue into why NTFS behaves this way, we have provided an interim solution in vSphere 6.5 p01. When a Guest OS sends an UNMAP request whose starting or ending block offset is not aligned to the configured UNMAP granularity, VMFS will now UNMAP as many of the 1MB blocks in the request as possible and zero out the misaligned portions (which should only be the misaligned beginning of the UNMAP request, the misaligned end, or both).
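If it helps to visualise the new behaviour, the following rough sketch (again Python, for illustration only; this is not VMFS code, and zero_range/unmap_range are hypothetical placeholders) shows the idea of zeroing the misaligned head and tail of a request and UNMAPing only the 1MB-aligned middle.

MB = 1024 * 1024

def zero_range(offset, length):
    # Placeholder: in reality this would be a zeroing write to the region.
    print("zero  %d bytes at offset %d" % (length, offset))

def unmap_range(offset, length):
    # Placeholder: in reality this is the UNMAP passed down for reclamation.
    print("unmap %d bytes at offset %d" % (length, offset))

def handle_guest_unmap(start, length):
    # start/length in bytes; granularity assumed to be 1MB.
    end = start + length
    aligned_start = ((start + MB - 1) // MB) * MB   # round start up to 1MB
    aligned_end = (end // MB) * MB                  # round end down to 1MB

    if aligned_start >= aligned_end:
        # No whole 1MB-aligned block inside the request: zero the lot.
        zero_range(start, length)
        return

    if start < aligned_start:
        zero_range(start, aligned_start - start)    # misaligned beginning
    unmap_range(aligned_start, aligned_end - aligned_start)
    if end > aligned_end:
        zero_range(aligned_end, end - aligned_end)  # misaligned end

handle_guest_unmap(4097 * 512, 40960 * 512)   # the sg_unmap example further down

Before p01, a request like the one above would simply have been ignored; with the patch, the 1MB-aligned portion in the middle is still reclaimed.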
If testing this for yourself, you can use something like the “Optimize Drives” utility on Windows to send SCSI UNMAP commands to reclaim storage, e.g.
defrag.exe /O [/G] E:
Note that /G is not supported on some Windows versions. On Linux, tools like fstrim or sg_unmap can be used, e.g.
# sg_unmap -l 4097 -n 40960 -v /dev/sdb
    unmap cdb: 42 00 00 00 00 00 00 00 18 00
[Update – July 2017]. Another issue has been uncovered with the new automatic UNMAP implementation in 6.5 p01. Certain versions of Windows Guest OS running in a VM may appear unresponsive if UNMAP is used. Further details can be found in KB article 2150591. This issue is addressed in vSphere 6.5 U1.
Good to know. What fingerprints can we look for in logs to show us when the UNMAP requests are not aligned?
Don’t know about “fingerprints”, but if the VMDK does not shrink, the UNMAP has failed. You can do your own test with a few NTFS partitions formatted with different block sizes and see if you can observe anything in the logs.
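For what it’s worth, one way to do that check is to compare the allocated size of the thin -flat.vmdk before and after triggering UNMAP in the guest. A minimal sketch in Python (the path is a placeholder, and I’m assuming st_blocks is reported correctly for the file, so treat this as a starting point rather than gospel):

import os

def allocated_mb(path):
    # st_blocks is counted in 512-byte units on POSIX-style filesystems.
    return os.stat(path).st_blocks * 512 / (1024 * 1024)

vmdk = "/vmfs/volumes/datastore1/winvm/winvm-flat.vmdk"   # placeholder path
print("%s: %.1f MB allocated" % (vmdk, allocated_mb(vmdk)))

Run it (or simply check the file with du -h) before and after the guest-side trim; if the number does not drop, the UNMAP did not make it through.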
You can watch for the UNMAP commands to appear in esxtop.
Cormac, from my initial read of this and the associated documentation, my question is: why wasn’t this found in testing? Hasn’t 4KB been the default NTFS block size since ages ago (e.g. any default Windows installation, or a drive formatted with default options)? Surely this configuration represents the largest percentage of OS/filesystem/block size combinations deployed on vSphere?
No answer for that one, Sam.
Why are UNMAP operations such a difficult thing to do? Judging from the various vSphere releases, it seems this continues to be a difficult problem to solve?