A few weeks ago, my good pal Cody Hosterman over at Pure Storage was experimenting with VAAI (the vSphere APIs for Array Integration) and discovered that he could successfully UNMAP (reclaim) blocks directly from a Guest OS in vSphere 6.0. Cody wrote about his findings here. Effectively, if you have deleted files within a Guest OS, and your VM is thinly provisioned, you can tell the array through this VAAI primitive that you are no longer using these blocks. This allows the array to reclaim them for other uses. I know a lot of you have been waiting for this functionality for some time. However, Cody had a bunch of questions and reached out to me to see if I could provide some answers. After conversing with a number of engineers and product managers here at VMware, here are some of the answers to the questions that Cody asked.
I was involved in some conversations recently on how the VAAI UNMAP command behaves, and which characteristics affect its performance. For those of you who do not know, UNMAP is our mechanism for reclaiming dead or stranded space from thinly provisioned VMFS volumes. Prior to this capability, the ESXi host had no way of informing the storage array that the space previously consumed by a particular VM or file was no longer in use. This meant that the array thought that more space was being consumed than was actually the case. UNMAP, part of the vSphere APIs for Array Integration, enables administrators to overcome this challenge by telling the array that these blocks on a thin provisioned volume are no longer in use and that they can be reclaimed.
We just got notification about a potential issue with the VAAI UNMAP primitive when used on EMC VMAX storage systems with Enginuity version 5876.159.102 or later. It seems that during an ESXi reboot, or during a device ATTACH operation, the ESXi host may report corruption. The following is an overview of the details found in EMC KB 184320. Other symptoms include vCenter operations on virtual machines failing to complete, and the following errors might be found in the VMkernel logs:
WARNING: Res3: 6131: Invalid clusterNum: expected 2624, read 0 [type 1] Invalid totalResources 0 (cluster 0).[type 1] Invalid nextFreeIdx 0 (cluster 0).
WARNING: Res3: 3155: Volume aaaaaaaa-bbbbbbbb-cccc-dddddddddddd ("datastore1") might be damaged on the disk. Resource cluster metadata corruption has been detected
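To check a host for these symptoms, a saved copy of the VMkernel log can be scanned for the two warning signatures shown above. The following is only a minimal sketch: the function name `find_corruption_warnings` and the regular expressions are my own illustration, derived purely from the two sample lines, and the exact message wording may differ between ESXi builds.

```python
import re
from typing import Iterable, List

# Signatures derived from the two sample VMkernel warnings quoted above.
# Assumption: real log lines keep the same wording; adjust if they do not.
PATTERNS = [
    re.compile(r'Res3: \d+: Invalid clusterNum: expected \d+, read \d+'),
    re.compile(r'Res3: \d+: Volume \S+ \("[^"]+"\) might be damaged on the disk'),
]


def find_corruption_warnings(lines: Iterable[str]) -> List[str]:
    """Return the log lines that match either resource-cluster warning."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(pattern.search(line) for pattern in PATTERNS)
    ]
```

Passing an open log file (or any iterable of lines) returns only the matching lines, which can then be compared against the details in the EMC KB article.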
Continuing on the series of vSphere 5.5 Storage Enhancements, we now come to a feature that is close to many people’s hearts. The vSphere Storage API for Array Integration (VAAI) UNMAP primitive reclaims dead or stranded space on a thinly provisioned VMFS volume, something that we could not do before this primitive came into existence. However, it has a long and somewhat checkered history. Let me share the timeline with you before I get into what improvements we made in vSphere 5.5.
This is possibly the most exciting new storage feature in the vSphere 5.1 release. Space Efficient Sparse Virtual Disks (or SE Sparse Disks for short) were designed to alleviate two issues. Let’s describe these issues first of all.
Problem Statement #1 – Let’s take a Guest OS running on a linked clone (a View desktop, if you will), and this Guest OS issues a 4KB write. The vmfsSparse disk (which is the format used by traditional linked clones) has a block allocation unit size of 512 bytes. In other words, this Guest OS is backed by 512-byte blocks. Depending on the applications deployed in the Guest OS, a worst case scenario is that these 512-byte blocks may not be contiguous on the VMDK, and thus may not be contiguous on the VMFS or NFS datastore. This could lead to multiple writes taking place on the back-end storage array for a single Guest OS write. Another side effect is that the partition created in the Guest OS may also be misaligned (because of the very small allocation unit size), again causing multiple writes to take place on the array for a single Guest OS write. Finally, this 512-byte block allocation unit size may not match the block size preference of the storage array, leading to additional overhead in handling these smaller, partial writes.
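The arithmetic behind this worst case is easy to sketch. The snippet below is only an illustration of the numbers quoted above (a 4KB guest write against 512-byte allocation units); the helper name `grains_touched` is hypothetical and does not correspond to anything in ESXi.

```python
GRAIN_BYTES = 512          # vmfsSparse block allocation unit (from the text)
GUEST_WRITE_BYTES = 4096   # the 4KB Guest OS write used in the example


def grains_touched(offset: int, length: int, grain: int = GRAIN_BYTES) -> int:
    """Count the allocation units overlapped by a write at [offset, offset + length)."""
    if length <= 0:
        return 0
    first = offset // grain
    last = (offset + length - 1) // grain
    return last - first + 1


# A 4KB write lands on eight 512-byte grains; if none of them are contiguous
# on the datastore, that is up to eight back-end writes for one guest write.
on_512_byte_grains = grains_touched(0, GUEST_WRITE_BYTES)

# Against a 4KB allocation unit the same write touches one unit when aligned,
# and two when the partition is offset by a single 512-byte sector.
aligned_4k = grains_touched(0, GUEST_WRITE_BYTES, grain=4096)
misaligned_4k = grains_touched(512, GUEST_WRITE_BYTES, grain=4096)

print(on_512_byte_grains, aligned_4k, misaligned_4k)
```

The misaligned case is the partial-write problem described above: a single guest write that straddles two allocation units.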
Problem Statement #2 – The major space inefficiency issue of allocating as yet unused blocks in the Guest OS filesystem/database has basically been addressed by Thin Provisioning. However, another major space efficiency issue still exists – the issue of reclaiming Stale/Stranded data from within a Guest OS. While VMware has addressed this at the datastore level with the VAAI UNMAP primitive, it is still an issue from within the Guest OS. This is particularly problematic with VMware View Desktops deployed on linked clones. These desktops start off very small in size, but over a period of time they will grow and may end up being as big as the base disk (again, worst case scenario). This then requires administrative intervention to reduce the size of the desktops.
Now that we understand the main issues, let’s see how the new SE Sparse Disk format helps to address them.
Addressing Issue #1 – By default the grain size/block allocation unit size for Virtual Machine disks on ESX is 4KB. The vmfsSparse format, used by snapshots and linked clones, has a grain size of 512 bytes, or 1 sector. The vmfsSparse format gets 16MB chunks at a time from VMFS, but then allocates them 512 bytes at a time. This is the root cause of many of the performance/alignment complaints that we currently get with linked clones/snapshots, and what we are addressing with SE Sparse Disks.
With the introduction of SE Sparse disks, the grain size/block allocation unit size is now tuneable and can be set based on the preferences of a particular storage array or application. Note however that this full tuning capability will not be exposed in vSphere 5.1.
Addressing Issue #2 – One of the major features of the new SE Sparse Disk is its ability to reclaim previously used space within the Guest OS. This stale data is data that was previously written to, but is currently in unaddressed blocks in a file system/database. Customers used to have to carry out some very manual processes to reclaim this stranded space in the past, using a combination of Guest OS tools and vSphere technologies (e.g. sdelete followed by Storage vMotion).
There are two steps involved in the space reclamation feature. The first step is the wipe operation, which scans the Guest OS looking for stranded space and reorganizes the Virtual Machine Disk to free up a contiguous area of free space.
The second step is the shrink operation, which initiates either a SCSI UNMAP operation (block devices) or an RPC truncate (NFS) to delete the contiguous area of free space at the end of the VMDK, reducing its size, and then telling the storage array that it can now reclaim that area of free space.
The Wipe operation is initiated by an API call to the VMware Tools running in the Guest OS. This will allow the task to be scheduled out of hours so that there is no impact on the desktops. This initiates a scan of the filesystem looking for unused filesystem blocks.
When we know which blocks are free, we get the vSCSI layer to reorganise the SE Sparse Disk by moving blocks from the end of the SE Sparse disk to unallocated blocks at the beginning of the SE sparse disk. The SE Sparse disk metadata contains a bitmap where 1 bit represents a 4KB block and indicates if the block is allocated or unallocated.
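The reorganisation step can be pictured with a small model. This is only a conceptual sketch, not the on-disk SE Sparse format: the disk is modelled as a Python list of 4KB blocks, the bitmap as a list of booleans, and the function name `compact` is hypothetical. A real implementation would also have to update the mapping from guest addresses to grains, which is omitted here.

```python
from typing import List, Optional, Tuple

BLOCK_BYTES = 4096  # one bitmap bit per 4KB block, as described above


def compact(bitmap: List[bool], blocks: List[Optional[bytes]]) -> List[Tuple[int, int]]:
    """Move allocated blocks from the end into free slots at the start.

    Mutates bitmap and blocks in place and returns the (source, destination)
    block indexes of every move that was made.
    """
    moves: List[Tuple[int, int]] = []
    lo, hi = 0, len(bitmap) - 1
    while True:
        while lo < hi and bitmap[lo]:       # find the next free slot from the front
            lo += 1
        while lo < hi and not bitmap[hi]:   # find the last allocated block
            hi -= 1
        if lo >= hi:
            break
        blocks[lo], blocks[hi] = blocks[hi], None
        bitmap[lo], bitmap[hi] = True, False
        moves.append((hi, lo))
    return moves
```

After the call, every allocated block sits ahead of every free block, so all of the free space forms a single run at the end of the disk.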
When there is a contiguous range of free space at the end of the SE Sparse Disk, a SCSI UNMAP command is sent to reclaim those blocks, and truncate/shrink the SE sparse disk. Note that this is the same UNMAP primitive which we introduced in VAAI improvements in vSphere 5.0, so this will cause overhead on the storage arrays and could have a significant impact on performance for some storage arrays, just like dead space reclamation for VMFS-5 deployed on Thin Provisioned LUNs. This is why the recommendation is to run this reclaim feature out of hours or during a maintenance window.
During the shrink operation, allocated blocks at the end of the SE Sparse disk are moved to unallocated space at the beginning of the disk. This will leave a contiguous unallocated section at the end of the SE Sparse disk which can be truncated during the shrink operation.
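Once the free space is contiguous at the end, working out what can be truncated is a matter of walking the bitmap backwards. The sketch below assumes the same simplified model of one boolean per 4KB block; `trailing_free_range` is a hypothetical helper, and the actual UNMAP or truncate request is issued by the vSCSI layer rather than by code like this.

```python
from typing import List, Tuple

BLOCK_BYTES = 4096  # size represented by one bitmap bit


def trailing_free_range(bitmap: List[bool]) -> Tuple[int, int]:
    """Return (first_free_block, reclaimable_bytes) for the free tail of the disk."""
    end = len(bitmap)
    while end > 0 and not bitmap[end - 1]:
        end -= 1
    return end, (len(bitmap) - end) * BLOCK_BYTES


# Three allocated blocks followed by three free ones: the disk can be
# truncated at block 3, releasing 3 * 4KB back to the datastore or array.
first_free, reclaimable = trailing_free_range([True, True, True, False, False, False])
```

If the last block is still allocated, the function reports zero reclaimable bytes, which is why the blocks have to be moved towards the start of the disk before the shrink can achieve anything.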
Note that the Virtual Machines require HWv9 to handle the SCSI UNMAP command in the Guest OS – earlier versions will not know how to handle this command.
There is a very specific use case for SE Sparse Disks in vSphere 5.1. The scope of SE Sparse Disks in vSphere 5.1 has been restricted to a VMware View use case when VMware View Composer uses “Linked Clones” for the roll-out of desktops.
VMware View desktops will also benefit from the new 4KB grain size, as it addresses the partial write and alignment issues experienced by some storage arrays when the 512-byte grain size found in the vmfsSparse format is used by linked clones.
SE Sparse Disks also give far better space efficiency to desktops deployed on this virtual disk format since it has the ability to reclaim stranded space from within the Guest OS.
Get notification of these blogs postings and more VMware Storage information by following me on Twitter: @CormacJHogan
Regular readers of my VMware Storage Blog will be no stranger to Nimble Storage. I’ve blogged about them on a number of occasions. I first came across them at a user group meeting in the UK & I also wrote an article about them when they certified on VMware’s Rapid Desktop Program for VDI.
Nimble Storage have been in touch with me again to share details about their new 2.0 storage architecture. After a very interesting and informative chat with Wen Yu of Nimble, I’m delighted to be able to share these new enhancements with you, in this first post on my new blog site.
Nimble Storage’s new enhancements can be categorized into two areas. The first of these is a new scale out architecture and the second is further integration with vSphere.
Scale to Fit
Scale to Fit architecture is how Nimble Storage describe their new elastic scaling feature. It basically allows customers to scale out their storage on a particular dimension, be it capacity or performance. This new architecture allows customers to start with a small footprint, and then to scale performance and capacity. This can be done without having to migrate any data and without any Virtual Machine/application downtime. The great advantage of this of course is that it avoids over-provisioning of storage up front, keeping initial costs down. When additional performance or capacity is needed, customers only need to grow on that dimension. This means that customers don’t pay for additional performance if they only need capacity, and vice-versa.
vSphere Integration Features
There are 3 new vSphere integration features to call out in this new release.
- Nimble Storage have a new Storage Replication Adapter (SRA) for integrating with VMware Site Recovery Manager (SRM). Business Continuance and Disaster Recovery are essential features for any enterprise class storage array, and it is great to see that Nimble now offer full integration with VMware’s BC/DR flagship product.
- There are a number of additional VAAI offload primitives supported. The first of these is Hardware Assisted Locking (ATS) which enables ESXi hosts to offload VMFS volume locks to the Nimble storage array. The second is the UNMAP primitive, which enables VMFS volumes built on thin provisioned disks to do space reclamation after storage vMotion or VM deletion. If I remember correctly from previous conversations with Nimble, they already support the WRITE_SAME primitive.
- This last feature is the one I am most excited about. Nimble Storage now offer their own Path Selection Plugin (PSP) into the Pluggable Storage Architecture of the VMkernel. This optimized multipathing plugin will load balance I/O, and provide linear performance scalability with a single Nimble storage array or multiple storage arrays in a scale-out cluster. The PSP is called Nimble_PSP_Directed.
Nimble Storage are a sponsor at VMworld 2012. You’ll find them at booth 306 at the US conference this year.