Heads Up! UNMAP considerations when reclaiming more than 2TB
Thanks to our friends over at EMC (shout out to Itzik), we’ve recently been made aware of a limitation in our UNMAP mechanism in ESXi 5.0 & 5.1. It would appear that if you attempt to reclaim more than 2TB of dead space in a single operation, the UNMAP primitive does not handle this very well. The current thinking is that this is due to the 2TB (- 512 byte) file size limit on VMFS-5. When the space to reclaim is above this size, we cannot create the very large temporary balloon file (part of the UNMAP process), and the command spews the following errors:
Could not truncate file .vmfsBalloonrYq2EH to 45.4 TB (File too large).
Could not truncate file .vmfsBalloonrYq2EH to 45.4 TB (File too large).
Could not truncate file .vmfsBalloonrYq2EH to 45.4 TB (File too large).
Could not truncate file .vmfsBalloonrYq2EH to 45.4 TB (File too large).
Could not truncate file .vmfsBalloonrYq2EH to 45.4 TB (File too large).
Could not truncate file .vmfsBalloonrYq2EH to 45.4 TB (File too large).
Could not truncate file .vmfsBalloonrYq2EH to 45.4 TB (File too large).
Could not truncate file .vmfsBalloonrYq2EH to 45.4 TB (File too large).
Could not truncate file .vmfsBalloonrYq2EH to 45.4 TB (File too large).
Could not truncate file .vmfsBalloonrYq2EH to 45.4 TB (File too large).
Eventually the UNMAP process will reduce the amount of space that it tries to reclaim, get down to the 2TB threshold, and start working. However, this can take some time, especially if you are trying to reclaim a large amount of dead space. Therefore, caution must be used when running the vmkfstools -y “%” command. The % value supplied as an argument to this command represents the percentage of free space on the volume to reclaim as dead space. We advise you to work out how much free space is on the volume, and then ensure that the % value is set so that it does not represent more than 2TB of space.
For example, if the free space was shown to be 30TB and you thought that there was 3TB worth of dead space to reclaim on the volume, setting the percentage value to 10% in vmkfstools would result in a 3TB balloon file. This exceeds the 2TB limit and may cause the issue described above. Therefore it would be best to run the command twice, each time with a 5% setting. This way only 1.5TB of space is reclaimed at any one time, and the temporary balloon file is limited to 1.5TB in size.
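The arithmetic in the example above can be sketched in a few lines of Python. This is only a rough helper, not anything that ships with ESXi: the function names are made up for illustration, it assumes 1TB means 2^40 bytes, and it assumes the balloon file is simply the given percentage of the free space on the volume.

```python
TB = 1024 ** 4  # assumption: 1TB taken as 2^40 bytes

# The VMFS-5 file size limit quoted in the post: 2TB minus 512 bytes.
VMFS5_MAX_FILE_BYTES = 2 * TB - 512


def balloon_bytes(free_bytes: int, percent: int) -> int:
    """Approximate balloon file size for a given % of free space (hypothetical helper)."""
    return free_bytes * percent // 100


def max_safe_percent(free_bytes: int, limit_bytes: int = VMFS5_MAX_FILE_BYTES) -> int:
    """Largest whole-number % whose balloon file stays within the file size limit."""
    if free_bytes <= 0:
        raise ValueError("free_bytes must be positive")
    # Integer arithmetic avoids floating point rounding on large byte counts.
    return min(100, limit_bytes * 100 // free_bytes)


if __name__ == "__main__":
    free = 30 * TB
    print(balloon_bytes(free, 10) > VMFS5_MAX_FILE_BYTES)  # True: 3TB is over the limit
    print(balloon_bytes(free, 5) / TB)                     # 1.5
    print(max_safe_percent(free))                          # 6
```

With 30TB free, anything up to 6% keeps the balloon file under the limit, so the 5% used in the example leaves a little headroom.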
Same story on a 3PAR 7200 as on every VAAI-capable storage array. It looks like the balloon file is created on the same blocks over and over again, so on an 8TB VMFS LUN I would have to be lucky to zero out the unused and dirty blocks where the 25% balloon file is created. I guess that to solve it in multiple steps, the balloon files would have to remain in place until the last balloon file is created.
Yes, it’s not array specific. It is the behaviour of the UNMAP algorithm. Internally we reproduced it on Dell Compellent, and it was initially reported on EMC XtremIO. All I can say is that the algorithms are being improved in the next release, but I can’t say when that will be at this time.
Do you know if the .vmfsBalloon process calls dd if=/dev/zero? A lot of arrays (XIV/3PAR/HDS) can reclaim zeros naturally (some even inline), and this would allow the command to work even on arrays with older firmware that does not support UNMAP, or on arrays that have performance problems running it.
A recent 5.1 patch set (http://kb.vmware.com/kb/2053408) includes a fix for the 2TB limit: vmkfstools now creates multiple 2TB files and then frees them all.