Guest OS space reuse on vSAN

This post came about after a brief discussion with my pal, Lee Dilworth. Although the outcome of this test really has nothing to do with vSAN, the behaviour was observed on certain Guest OSes running on vSAN. I guess the first thing that needs to be made clear is that there is no support for in-guest UNMAP (or TRIM) for VMs running on vSAN at this time, although it is something we are examining very closely. With this in mind, we had feedback that a test being run as part of a proof-of-concept was showing some very different results depending on the Guest OS. With Windows-based VMs, the test did not appear to increase the amount of space consumed on the vSAN datastore, whereas the same test with Linux-based VMs did. The test simply copied a bunch of files to a folder or directory, deleted the files, and then copied them once more back to the same folder or directory in the Guest. I decided to try it out in my own lab to see this behaviour for myself.
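I don't have the customer's exact test script, but the pattern is easy to reproduce. Here is a minimal Python sketch of the copy/delete/copy cycle (the paths and data set are hypothetical; the vSAN capacity view is checked manually between each step):

```python
import shutil

SRC = "/data/testfiles"      # hypothetical: ~800MB of source files
DST = "/mnt/testvmdk/copy"   # folder on the thin-provisioned VMDK

def copy_delete_copy():
    # Pass 1: copy the files onto the VMDK-backed filesystem.
    shutil.copytree(SRC, DST)
    # Delete everything that was just copied.
    shutil.rmtree(DST)
    # Pass 2: copy the same files back to the same location.
    # The question is whether the guest filesystem reuses the blocks
    # freed by the delete, or allocates fresh ones (growing the thin
    # VMDK underneath).
    shutil.copytree(SRC, DST)

if __name__ == "__main__":
    copy_delete_copy()
```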

I started with a Windows 2012R2 VM. This is how things looked before I started experimenting:

At this point, there was 386.7GB of space consumed on the vSAN datastore. On my Windows VM, I created a VMDK on the vSAN datastore (which of course is thin provisioned by default), and then proceeded to copy around 800MB of data to it. After the copy completed, I refreshed the capacity view, and saw that there was now 388.37GB of space consumed on the vSAN datastore. This is an increase of around 1.6GB, which makes sense considering the VMDK is implemented as a RAID-1: the 800MB written in the Guest is mirrored, so it is consumed twice on the datastore.
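As a sanity check, the expected growth is simply the guest write size multiplied by the number of data replicas (a trivial sketch; a two-replica RAID-1 is assumed and witness overhead is ignored):

```python
def expected_datastore_growth_gb(guest_write_gb, replicas=2):
    # A RAID-1 object keeps two full data replicas, so each guest
    # write is consumed twice on the datastore. Witness components
    # are tiny and ignored here.
    return guest_write_gb * replicas

print(expected_datastore_growth_gb(0.8))  # 1.6 -- matching the ~1.6GB observed
```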

I now go ahead and delete the folder contents. What I noticed, and what one would expect, is that the amount of space consumed does not decrease (as we still do not have in-Guest UNMAP/TRIM support). So even after deleting the data, and running the "optimizer" on the volume within the Windows 2012R2 Guest, the space consumed remains the same.
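For reference, I used the standard Windows volume optimizer; the same retrim can be kicked off from the command line with defrag. A sketch of that step, assuming it runs inside the Windows Guest with admin rights and that the VMDK is mounted as drive E: (the drive letter is hypothetical):

```python
import subprocess

# Trigger a retrim on volume E: ("defrag <volume> /L" asks Windows to
# send TRIM/UNMAP for all free space on the volume). On vSAN today this
# has no effect on datastore consumption, since in-guest UNMAP is not
# passed through to the underlying storage.
subprocess.run(["defrag", "E:", "/L"], check=True)
```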

My final test is to copy the data back to the same location, and see if the amount of space consumed increases once more, or whether the blocks that backed the previous files will get reused. After doing another data copy to the VMDK, I checked the space usage one final time.

And what I noticed is that there was a slight increase, but not much. After copying the same 800MB of data (1.6GB for the whole RAID-1), there was only a minor increase of around 20MB of new data (40MB once mirrored across the RAID-1) in space consumed. So this is a good indicator that the Guest is reusing blocks, rather than grabbing a whole new set of blocks and unnecessarily growing the VMDK.

Now the customer did notice an anomaly when trying the same test in a Linux Guest OS with an EXT4 filesystem. For some reason, that test did not appear to reuse blocks. Instead, it seemed to grow the VMDK with every iteration of the test. This appears to be a known issue with EXT4, as highlighted here, and this patch description gives further detail. Again, it seems to be EXT4 specific, and does not impact EXT3, XFS or other Linux filesystems.
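One way to observe this behaviour from inside the Linux Guest is to ask the kernel where a file physically lives, using the FIBMAP ioctl, before and after a delete/rewrite. A rough sketch (Linux only, needs root, and the path is hypothetical; allocator behaviour will vary with filesystem state, so treat this as an experiment rather than a definitive test):

```python
import fcntl
import os
import struct

FIBMAP = 1  # from <linux/fs.h>: maps a logical file block to a physical block

def first_physical_block(path):
    """Return the physical block number backing logical block 0 of path."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # The ioctl argument is an int holding the logical block number;
        # the kernel replaces it with the physical block number.
        buf = struct.pack("i", 0)
        return struct.unpack("i", fcntl.ioctl(fd, FIBMAP, buf))[0]
    finally:
        os.close(fd)

def write_file(path, size=1 << 20):
    with open(path, "wb") as f:
        f.write(os.urandom(size))
        f.flush()
        os.fsync(f.fileno())  # force EXT4 delayed allocation to resolve

write_file("/mnt/test/file")
before = first_physical_block("/mnt/test/file")
os.remove("/mnt/test/file")
write_file("/mnt/test/file")
after = first_physical_block("/mnt/test/file")

# If 'after' differs from 'before', the filesystem allocated new blocks
# instead of reusing the freed ones -- exactly what makes a thin VMDK grow.
print(before, after)
```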

While this behaviour might not be specific to vSAN, the inability to reclaim dead space from within the Guest means that an EXT4 volume will continue to consume additional space on the vSAN datastore.

And remember that this is just for in-guest scenarios. If you delete VMs, objects or files on the vSAN datastore, that space can immediately be reclaimed and reused.