VAAI UNMAP Performance Considerations
I was involved in some conversations recently on how the VAAI UNMAP command behaves, and which characteristics affect its performance. For those of you who do not know, UNMAP is our mechanism for reclaiming dead or stranded space from thinly provisioned VMFS volumes. Prior to this capability, the ESXi host had no way of informing the storage array that space previously consumed by a particular VM or file was no longer in use. This meant that the array thought more space was being consumed than was actually the case. UNMAP, part of the vSphere APIs for Array Integration, enables administrators to overcome this challenge by telling the array that these blocks on a thin provisioned volume are no longer in use and can be reclaimed.
This conversation began with a brief chat in which one of our partners noticed that increasing the block count per iteration of the reclaim command makes a huge difference to the time taken to reclaim. VMFS allocates blocks in multiples of 1MB, and this is also the unit for the block count parameter passed to the reclaim command. The number of blocks given as input is the number of VMFS blocks, and on a VMFS5 volume the block size is 1MB. The command in question is esxcli storage vmfs unmap with the -n option to specify the number of blocks.
We had a chat with some of our UNMAP experts. They agreed that the block count does indeed play a significant role in performance.
UNMAP works by allocating the specified number of VMFS blocks to a temporary unmap file per iteration, issuing an UNMAP on all the blocks in that iteration, and then moving on to the next set of blocks, until the entire volume is covered. So the more blocks you specify, the more blocks (and hence more physical space) can be covered per iteration, and therefore the less work for UNMAP. Conversely, if the block count to reclaim is small, more UNMAP iterations are required to release the same amount of space.
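The relationship between block count and iteration count is simple arithmetic. A rough sketch (the 100GB figure is just a hypothetical amount of reclaimable space, not from the post):

```python
import math

def reclaim_iterations(free_blocks, reclaim_unit):
    # Each iteration allocates `reclaim_unit` VMFS blocks to the temporary
    # unmap file, issues UNMAP for them, and moves to the next set.
    return math.ceil(free_blocks / reclaim_unit)

# Hypothetical: 100 GB of reclaimable space on a VMFS5 volume (1 MB blocks)
free_blocks = 100 * 1024

print(reclaim_iterations(free_blocks, 200))   # -> 512 iterations
print(reclaim_iterations(free_blocks, 1000))  # -> 103 iterations
```

A five-fold increase in the per-iteration block count cuts the number of iterations (and the temp-file allocation overhead that goes with each) by roughly the same factor.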
Let’s cover that in some more detail. Each UNMAP command can specify a maximum of 100 segment descriptors, each of which contains the starting LBA and the number of contiguous 512 byte blocks starting from that LBA. For a given number of VMFS blocks per iteration (consider 200 for example), the best case is that all those VMFS blocks are contiguous on disk, resulting in a single UNMAP command with a single segment descriptor that covers the entire 200MB of contiguous space.
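To see why one descriptor can cover so much, it helps to express VMFS blocks in the 512-byte units a segment descriptor actually uses:

```python
VMFS_BLOCK = 1024 * 1024  # VMFS5 block size: 1 MB
SECTOR = 512              # segment descriptors count 512-byte blocks

# 200 contiguous VMFS blocks expressed as one segment descriptor:
vmfs_blocks = 200
sectors = vmfs_blocks * VMFS_BLOCK // SECTOR
print(sectors)  # -> 409600 contiguous 512-byte blocks from one starting LBA
```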
In the worst case scenario, and this is going into some specifics of VMFS, out of a “cluster” of file blocks, every alternate block is allocated. This is the worst form of fragmentation, so each VMFS cluster results in a single UNMAP command with 100 segment descriptors, each of which covers an area of alternate 1MB blocks. The number of UNMAP commands generated in this case depends on the number of VMFS blocks given as input. So for example, if you specified 1000 blocks, and they are all fragmented, they would cover at least 10 VMFS clusters and at most 1000 VMFS clusters, corresponding to 10 UNMAP commands with 100 segment descriptors each, or 1000 UNMAP commands with a single segment descriptor each. However, this is a worst-case scenario and highly unlikely. The VMFS resource manager always tries to allocate a contiguous range of blocks to files on disk (thin, thick, eagerzeroed thick) as discussed in this older but still informative blog post I wrote some time back on the vSphere blog.
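The best and worst cases above can be sketched as a small calculation, using the 100-descriptor-per-command limit from the post:

```python
import math

DESCRIPTORS_PER_CMD = 100  # max segment descriptors per UNMAP command

def commands_needed(extents, packed=True):
    """UNMAP commands needed to cover `extents` discontiguous ranges.
    packed=True assumes descriptors fill each command to the 100 limit;
    packed=False assumes one descriptor per command (fragmentation spread
    across many VMFS clusters, as in the post's worst case)."""
    if packed:
        return math.ceil(extents / DESCRIPTORS_PER_CMD)
    return extents

blocks = 1000  # VMFS blocks given as input to the reclaim

print(commands_needed(1))                    # fully contiguous: 1 command
print(commands_needed(blocks))               # worst case, packed: 10 commands
print(commands_needed(blocks, packed=False)) # worst case, unpacked: 1000 commands
```

The spread between 1 and 1000 SCSI commands for the same 1000 blocks is why fragmentation, not just block count, drives reclaim time.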
If there is little to no fragmentation, the bigger the number of blocks specified, the smaller the number of UNMAP commands we have to generate in order to cover the entire reclaimable area. If the space being reclaimed is not contiguous, then we may see additional UNMAP commands being issued, which will in all likelihood increase the time to reclaim the space.
Hopefully this goes some way towards explaining why you might observe different performance behaviour with various UNMAP commands, especially when larger block counts are used.
10 Replies to “VAAI UNMAP Performance Considerations”
Cormac, the entire UNMAP at the datastore level is a temporary hack. VMware should implement full sparse_SE functionality at the VM vdisk level like you are doing for Horizon VIEW and linked clones; this would make UNMAP at the datastore level completely redundant. The fact that competitive products are offering this does not help our joint customers. The problem has been there for ages, and in this era, when the world is moving to All Flash Arrays, every unused block is a loss for our customers.
I think you’re missing some scenarios, Itzik. Consider the deletion of a VM on a datastore. How does in-guest unmapping help you to reclaim that dead space? Consider a vMotion operation. How do you reclaim the stranded space via in-guest unmapping when the VM has been moved to another datastore?
Those points aside, we do understand that there is a requirement for this feature. We have a fling that helps with certain guest OSes – https://labs.vmware.com/flings/guest-reclaim. There are also 3rd party products which do it – http://cormachogan.com/2013/05/01/raxco-introduces-perfectstorage/
However, I’ve already fed our twitter conversation back to our storage PMs. We realize that automatic UNMAP at the volume level, and guest OS level reclaims, are items customers and partners desperately want.
A few points: I’m fully aware of UNMAP at the datastore level. However, this is a fairly easy thing to do compared with the in-guest case – we are talking about maybe 100 volumes vs. thousands of VMs (or maybe even more). Plus there is the more common case of files that are deleted within the VMs (happens every day, all the time) vs. deleting VMs from the datastores.
I’m already recommending RAXCO PS to our customers and in fact even wrote a blog post about it here, but some customers are not comfortable with paying for something that should be / is given for free.
Thanks again for passing on the feedback to your PMs!
Hi Cormac, was this in relation to the (rather long) discussion on the LinkedIn HP storage group?
I am seeing latency issues while UNMAP is running. It might make sense if you are indeed seeing a highly fragmented datastore, but I doubt that is the underlying issue.
As you know, VMFS tries to avoid that, and under normal operations the volume should not be that fragmented.
Again and again I am seeing storage systems cringe once we run merely a couple of parallel VAAI operations. I feel this has become a checkbox feature that is not properly implemented on a significant number of storage systems.
Perhaps partner engineering should consider broadening the certification criteria to trigger more of these cases?
BTW: Is the default 100 for vmkfstools under vSphere 5.1?
Haven’t looked yet, but it’s not related to this known behaviour by any chance? http://blogs.vmware.com/vsphere/2012/09/vaai-offloads-and-kavg-latency.html
I think it is related, I am not sure if SIOC measures the same way though.
As this is a standard SCSI T10 command and not a plugin, I am not sure if the VAAI filter driver is involved at all?
This seems incorrect; I have not dug further into it:
Regardless of what happens on the host/VM, I do observe higher service times on the controllers, something HP has indicated should not happen:
I wrote a little piece on it here: http://erwinvanlonden.net/2011/10/scsi-unmap-and-performance-implications/
This was a while ago but shows the implications when not all things are considered.
A while ago I ran SCSI UNMAP in some lab tests across a half dozen different arrays, and the results were shockingly bad. The vSphere 5.0 UNMAP fiasco demonstrated that, outside of maybe HDS (Hu made some cryptic comments in blog posts about UNMAP being a challenge), no one is doing serious performance testing as part of validating storage features.
While VMware is generally pretty good at validating things, the recent issue with the H310 being a recommended VSAN controller reminded me that the hardware partners can’t be trusted to do their own performance validations, and for the sake of our customers we have to do our own.
Can somebody here please explain to me the differences in behaviour between the solution used in vSphere 5.0 U3 and the later enhancements (esxcli storage vmfs unmap)?
We are running VMware ESXi 5.0 U3 hosts that connect to 3PAR F400 arrays, where I repeatedly have to use the “vmkfstools -y” option introduced with vSphere 5.0 Update 1 to reclaim dead space on the Virtual Volumes. I am still trying to understand what it exactly does, to be able to assess the performance implications on our ESXi hosts as well as on the 3PAR F400 storage arrays.
I took part in the LinkedIn discussion referenced here by Bjorn Anders, but I am still nowhere close to fully understanding the process.