Migrations and Virtual Volumes – Deep Dive
Recently I published an article on Virtual Volumes (VVols) where I touched on a comparison between how migrations typically worked with VAAI and how they now work with VVols. Since then, I have had some really interesting discussions with some of our VVol leads, and I thought the detail worth sharing here as I haven’t seen it covered anywhere else. This is rather a long discussion, as there are a lot of different permutations of migration that can take place, and different states that the virtual machine can be in. We’re focused solely on VVols here, so although various scenarios are listed, I highlight which one we are actually considering.
Caution – this is a very long blog post, and while it could have been separated out into multiple different posts, I eventually decided it best to keep it all in one place. Grab a coffee – you have been warned!
Virtual machine states
Let’s begin by looking at the different states that a virtual machine can have when it comes to migration. There are 4 cases to consider altogether. These are:
- powered on Storage vMotion without snapshots
- powered on Storage vMotion with snapshots
- powered off (cold) migration without snapshots
- powered off (cold) migration with snapshots
As you will see, there are some different behaviors to take into account when the VM has snapshots, and when the VM does not have snapshots.
Migration options
There are also a number of different migration scenarios which could be considered:
- Migrating a VVol based VM between storage containers on the same array (VVol -> VVol)
- Migrating a VVol based VM between storage containers on different arrays (VVol -> VVol)
- Migrating a VVol based VM between a storage container and a traditional LUN/volume on the same array (VVol -> VMFS/NFS/RDM)
- Migrating a VVol based VM between a storage container and a traditional LUN/volume on a different array (VVol -> VMFS/NFS/RDM)
- Migrating a traditional VM (VM as a set of files) between traditional LUNs/volumes on the same traditional storage array (VMFS/NFS/RDM -> VMFS/NFS/RDM)
- Migrating a traditional VM between traditional LUNs/volumes on different traditional storage arrays (VMFS/NFS/RDM -> VMFS/NFS/RDM)
In this post, I’m only going to be looking at one particular use case, the first one: “Migrating a VVol based VM between storage containers on the same array”.
This can occur in two ways. The first is when a customer surfaces up multiple storage containers, each behind a different VVol datastore. Each storage container may offer a different class of storage; for example, one pool of storage might offer deduplication while another offers replication or flash. In this way, a customer can use the vSphere client to migrate VMs between VVol datastores just as they would move VMs between traditional datastores. Here is what this might look like, where there are multiple classes (A, B, and C), each mapped to its own storage container, which is in turn mapped to its own VVol datastore in vSphere:
It can also happen when a customer changes a VM Storage Policy and the current storage container no longer satisfies the requirements in the policy. If a VM needs to consume a new data service or a new class of storage that isn’t available on the source storage container/VVol datastore, you can migrate it to another one that does offer what you are looking for.
There is of course another way for a VM to get new capabilities without the need for a Storage vMotion operation. Let’s say that a customer surfaces up a single VVol datastore, behind which there is a single storage container, but this container has multiple storage classes/capabilities associated with it on the array. The customer deploys a VVol with a particular VM Storage Policy. This places the VVol on the storage container to meet the requirements. The customer then changes the VM Storage Policy associated with a VVol, and the current storage class no longer satisfies the requirements of the VVol. However the storage container has other classes which can satisfy the policy requirement. There may be no need to migrate the VVol as the same storage container can satisfy the requirement through a different class of storage, and the array may be capable of moving the VM automatically to the appropriate pool. Here is what this might look like, where there are multiple pools of storage (A,B, and C) with different capabilities (dedupe, replication, flash) mapped to a single storage container which is in turn mapped to a single VVol datastore in vSphere:
To recap, a storage container maps 1:1 to a VVol datastore in vCenter and on whichever ESXi hosts the vSphere admin decides to present it to. Since a storage container can present a range of capabilities, changing the storage policy associated with a VM may or may not necessitate a move to a different datastore (storage container at the back end). If the same storage container can satisfy the new policy, the array can quietly make whatever adjustments are needed in the background. If, on the other hand, the newly assigned policy cannot be satisfied by the current storage container, a storage migration to a compatible datastore may be needed.
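For completeness, here is roughly what kicking off such a VVol-to-VVol migration looks like from the vSphere API side. This is a minimal pyVmomi sketch, not a supported recipe: the vCenter address, credentials, VM name, datastore name and profile ID are all placeholders for your own environment, and whether the move is offloaded to the array is decided by ESXi and the VASA Provider, not by anything in this code.

```python
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim
import ssl

# Lab-only: skip certificate verification for the sake of the example
ctx = ssl._create_unverified_context()
si = SmartConnect(host="vcenter.local", user="administrator@vsphere.local",
                  pwd="password", sslContext=ctx)
content = si.RetrieveContent()

def find_obj(vimtype, name):
    """Return the first inventory object of the given type with this name (or None)."""
    view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
    try:
        return next((o for o in view.view if o.name == name), None)
    finally:
        view.DestroyView()

vm = find_obj(vim.VirtualMachine, "app01")            # placeholder VM name
target_ds = find_obj(vim.Datastore, "VVol-ClassB")    # placeholder VVol datastore name

spec = vim.vm.RelocateSpec()
spec.datastore = target_ds
# Optionally carry the new VM Storage Policy with the move (profile ID is a placeholder)
spec.profile = [vim.vm.DefinedProfileSpec(profileId="your-policy-profile-id")]

# vCenter/ESXi and the array decide between full VASA offload, VAAI XCOPY or the
# software datamover - the caller simply issues a normal relocate task.
WaitForTask(vm.RelocateVM_Task(spec))
Disconnect(si)
```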
Offload mechanisms
When we consider VAAI and VVols, across both block storage and NAS storage, there are in essence three “classes” of offloads. These are:
- Can we do a full hardware offload (VVol migration via the new VASA Primitives)?
- If that doesn’t work, can we do a host orchestrated hardware offload via the datamover on the hosts (using VAAI – vSphere API for Array Integration – primitives like XCOPY, etc)?
- And when we migrate a VVol, can we achieve any “space efficiency optimization” (ability to determine blocks in use via VASA bitmap APIs)?
Just to elaborate on “host orchestrated hardware offload”: if the storage array does not support the full VASA primitives for VVols, but does support the VAAI XCOPY primitive for example, the ESXi host instructs the array to use that offload mechanism for the data transfer. If XCOPY isn’t supported, or isn’t supported between the two storage containers, there is no offloading, so what we effectively end up performing is a full host-based copy of the VM, followed by deletion of the original VM.
It should also be noted that the datamover supports VAAI offload using XCOPY for migrations from VMFS *TO* VVols but not *FROM* VVols to VMFS. This may change in the future, but it is the behaviour in vSphere 6.0.
The last point, space efficiency optimization, is not really an offload but it is an important optimization. The VAAI-NAS primitive “Full File Clone” can only be used on powered off virtual machines. When performing a Storage vMotion of a powered on VM on file based storage, the ESXi host cannot use any of the VAAI primitives and has to fall back to the datamover for this operation. Previously, the complete VMDK had to be processed, scanning it for used blocks, and the used blocks were then migrated, which was not very efficient. The new “space efficiency optimization” provides the ability to track used/changed blocks via the VASA bitmap APIs, so only those blocks need to be read and copied.
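To make that difference concrete, here is a small illustrative sketch of what a bitmap-driven copy looks like. The vasa.allocated_bitmap() call and the copy_fn helper are hypothetical stand-ins for the VASA bitmap query and the underlying data movement (XCOPY on block, a host read/write on NAS); this is not a real SDK, just the shape of the optimization: ask the array which regions are in use, then copy only those instead of streaming the whole VMDK.

```python
CHUNK = 1 * 1024 * 1024  # copy granularity in bytes, chosen only for the example

def migrate_vvol_space_efficient(vasa, src_vvol, dst_vvol, copy_fn):
    """Copy only the allocated regions of src_vvol into dst_vvol.

    vasa.allocated_bitmap(src_vvol) is assumed to yield (offset, length) extents
    the array reports as in use; copy_fn(src, dst, offset, length) performs the
    actual data movement. Both are hypothetical placeholders.
    """
    copied = 0
    for offset, length in vasa.allocated_bitmap(src_vvol):
        moved = 0
        while moved < length:
            # Move each used extent in manageable chunks
            step = min(CHUNK, length - moved)
            copy_fn(src_vvol, dst_vvol, offset + moved, step)
            moved += step
        copied += length
    return copied  # bytes actually transferred, typically far less than provisioned size
```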
These are the steps that a migration operation goes through with VVols:
- Attempt VASA APIs offloads first
- If that fails (perhaps the array does not support the VASA APIs), the operation falls back to a host orchestrated clone, and we attempt to use the hardware (VAAI) data mover
- If hardware offloads (VAAI primitives) are available, these are used
- If hardware offloads (VAAI primitives) are unavailable (perhaps the array does not support VAAI or VAAI-NAS plugin isn’t available), we fall back to using the software data mover/host based copy process
Note that VVol -> VVol migrations using the cloneVirtualVolume VASA API will be limited to VVol datastores managed by the same VASA Provider (VP). There is no offloading a clone or migration operation between two storage containers managed by different VPs (even if they’re the same vendor, model, etc.). If migration using the cloneVirtualVolume VASA API is not possible, the fallback is to use the datamover with VAAI acceleration or complete (albeit space efficient) host-based copy as described above.
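Putting the fallback order and the single-VASA-Provider restriction together, the decision logic looks roughly like the sketch below. The capability object and the two copy callables are hypothetical placeholders passed in as arguments; the real selection happens inside ESXi, not in user code.

```python
def choose_and_run_migration(vvol, src_container, dst_container, caps,
                             vasa_clone, datamover_copy):
    """Pick the most capable mechanism available and run the migration.

    caps           - hypothetical object answering the capability questions below
    vasa_clone     - callable performing a full-offload cloneVirtualVolume
    datamover_copy - callable performing a host datamover copy, with or without
                     VAAI XCOPY acceleration
    """
    # Full hardware offload requires both containers to sit behind the same VASA
    # Provider, and a provider that implements cloneVirtualVolume.
    if caps.same_vasa_provider(src_container, dst_container) and caps.supports_vasa_clone:
        return vasa_clone(vvol, dst_container)

    # Host orchestrated offload: the datamover drives the copy but pushes the
    # data movement down to the array with XCOPY where both sides support it.
    if caps.supports_xcopy(src_container, dst_container):
        return datamover_copy(vvol, dst_container, offload="xcopy")

    # Last resort: a full host-based copy (still space efficient thanks to the
    # VASA bitmap APIs), after which the source is deleted.
    return datamover_copy(vvol, dst_container, offload=None)
```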
Let’s take each of the different cases, and then see which class is relevant.
1. Powered On Storage vMotion without snapshots
For a powered on VM without snapshots, the Storage vMotion driver coordinates the migration. The driver will use the data mover to move sections of the current running virtual machine. The data mover will employ “host orchestrated hardware offloads” when it can. Let’s take each of the different storage types in turn.
If the VM is a block VVol, the following operations occur:
- VASA APIs will be used to determine a bitmap of only the relevant blocks to migrate (Space efficiency optimization)
- The VAAI primitive XCOPY, if supported (which it should be), will be used to migrate the VM (what we are referring to in this post as “host orchestrated offload”)
If the VM is a NAS VVol, the following operations occur:
- VASA APIs will be used to determine a bitmap of only the relevant blocks to migrate (Space efficiency optimization)
- The software datamover will be used to migrate the running VM. This is the same as VAAI-NAS, where a running VM on NFS cannot be offloaded. What is interesting here is that there is no VAAI primitive for moving the current running point of the VM on NFS (there never has been), nor can the VASA APIs be used to move the running VM itself.
2. Powered On Storage vMotion with snapshots
For a powered on VM with snapshots, the migration of the snapshots is done first, then the Storage vMotion driver will use the data mover to move the running VM (there is a sketch of this sequence after the lists below).
If the VM is a block VVol, the following operations occur:
- VASA APIs will be used to determine a bitmap of only the relevant blocks to migrate (Space efficiency optimization)
- Additional VASA APIs, cloneVirtualVolume and copyDiffsToVirtualVolume, will be used to migrate all snapshots (Full hardware offload)
- The VAAI primitive XCOPY will be used to migrate the running VM (host orchestrated offload). What is interesting is that the VASA APIs cannot be used to move the running VM, only the snapshots
If the VM is a VVol on NAS Storage, the following operations occur:
- VASA APIs will be used to determine a bitmap for only the relevant blocks to migrate (Space efficiency optimization)
- Additional VASA APIs, cloneVirtualVolume and copyDiffsToVirtualVolume, will be used to migrate all snapshots (Full hardware offload)
- The software datamover will be used to migrate the running point of the VM. Once again, note that there is no VAAI-NAS primitive for moving the current running point of the VM (there never has been), nor can the VASA APIs be used to move the running VM itself.
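To make the ordering in case 2 concrete, here is a hypothetical orchestration sketch. The vasa.* calls stand in for the cloneVirtualVolume, copyDiffsToVirtualVolume and bitmap operations that the ESXi host issues to the VASA Provider; none of these are calls you would make yourself, and datamover_copy is a placeholder for the host datamover. The VM descriptor attributes (base_vvol, snapshot_vvols, running_vvol) are likewise assumptions made purely for illustration.

```python
def storage_vmotion_with_snapshots(vm, dst_container, vasa, datamover_copy, is_block_vvol):
    """Case 2 sequence: migrate the snapshot chain first, then the running point."""
    # 1. Snapshots first: clone the base VVol to the destination container, then
    #    replay each snapshot's delta on top of it with copyDiffsToVirtualVolume.
    new_base = vasa.clone_virtual_volume(vm.base_vvol, dst_container)
    prev = new_base
    for snap in vm.snapshot_vvols:                    # oldest to newest
        prev = vasa.copy_diffs_to_virtual_volume(snap, prev, dst_container)

    # 2. Running point last: the VASA clone APIs are not used here. The Storage
    #    vMotion driver hands it to the datamover, which uses the VASA bitmap to
    #    copy only used blocks, with XCOPY on block VVols or a plain host copy on NAS.
    bitmap = vasa.allocated_bitmap(vm.running_vvol)
    offload = "xcopy" if is_block_vvol else None
    return datamover_copy(vm.running_vvol, dst_container, bitmap, offload=offload)
```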
3. Powered off cold migration without snapshots
For a powered off VM, the Storage vMotion driver is not in the picture. So, effectively a cold migration of a powered off VM is a logical move (clone the VM and then delete the source).
If the VM is a block VVol, the following operations occur:
- VASA APIs will be used to determine a bitmap for only the relevant blocks to migrate (Space efficiency optimization)
- The cloneVirtualVolume VASA API will be used to migrate the current running point of the VM (Full hardware offload).
If the VM is a NAS VVol, then it behaves the same as a block VVol and the following operations occur:
- VASA APIs will be used to determine a bitmap of only the relevant blocks to migrate (Space efficiency optimization)
- The cloneVirtualVolume VASA API will be used to migrate the current running point of the VM (Full hardware offload).
4. Powered off cold migration with snapshots
This is pretty much the same general idea as previously mentioned, but now we look at migrating VMs that also have snapshots.
If the VM is a block VVol, the following operations occur:
- VASA APIs will be used to determine a bitmap of only the relevant blocks to migrate (Space efficiency optimization)
- The cloneVirtualVolume VASA API will be used to migrate the current running point of the VM + snapshots (Full hardware offload)
If the VM is a NAS VVol, then it behaves the same as a block VVol and the following operations occur:
- VASA APIs will be used to determine a bitmap of only the relevant blocks to migrate (Space efficiency optimization)
- Additional VASA APIs, cloneVirtualVolume and copyDiffsToVirtualVolume, will be used to migrate all snapshots (Full hardware offload)
- The cloneVirtualVolume VASA API will be used to migrate the current running point of the VM (Full hardware offload)
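Before wrapping up, here is the behaviour from the four cases condensed into a single lookup table, expressed as Python data purely as a recap of the descriptions above. It applies to VVol to VVol moves on the same array in vSphere 6.0 and is not an API of any kind.

```python
# (power state, has snapshots, storage type) -> mechanisms used
MIGRATION_BEHAVIOUR = {
    ("on",  False, "block"): ["VASA bitmap", "VAAI XCOPY (host orchestrated)"],
    ("on",  False, "nas"):   ["VASA bitmap", "software datamover"],
    ("on",  True,  "block"): ["VASA bitmap",
                              "cloneVirtualVolume + copyDiffsToVirtualVolume (snapshots)",
                              "VAAI XCOPY for the running point"],
    ("on",  True,  "nas"):   ["VASA bitmap",
                              "cloneVirtualVolume + copyDiffsToVirtualVolume (snapshots)",
                              "software datamover for the running point"],
    ("off", False, "block"): ["VASA bitmap", "cloneVirtualVolume (full offload)"],
    ("off", False, "nas"):   ["VASA bitmap", "cloneVirtualVolume (full offload)"],
    ("off", True,  "block"): ["VASA bitmap",
                              "cloneVirtualVolume (running point + snapshots, full offload)"],
    ("off", True,  "nas"):   ["VASA bitmap",
                              "cloneVirtualVolume + copyDiffsToVirtualVolume (snapshots)",
                              "cloneVirtualVolume (running point)"],
}
```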
Conclusion
As you can see, depending on the state of the VM (powered on or powered off, with or without snapshots), on whether the array supports VVols, and finally on whether the array is block or NAS, a number of different things may happen. There are optimizations for some situations, and not for others. Note that the system automatically uses the most efficient mechanism(s) available in all combinations.
Very interesting, so the cloneVirtualVolume VASA API will tell the array to copy a VVol in one command? What other VVol VASA API operations are there?
Also, in case 2, “Powered On Storage vMotion with snapshots” for a block VVol, doesn’t the VAAI copy of the running VM VVol need to be done BEFORE the diffs are copied with the copyDiffsToVirtualVolume VASA API? If the snapshot VVols are real array based snapshots, they have pointers to the source volume for the unchanged blocks.
There are many – my understanding is that there will be a VVol programming guide coming out to coincide with GA, and this will have all the command details.
Regarding your second question, this is how the functionality was described to me. There must be a mechanism to keep the references to the snapshots, even when they are array based.
Hi Cormac,
I have to say this is all beginning to worry me.
I have been aware of VVOLs for the last 3-4 years so it has clearly taken VMware and its partners a huge amount of time to bring the technology to market – I assume because it is a radical new way to manage storage.
My expectation was that we would get granular VM based policy management much like VSAN with:
1. Soft attributes – for example de-dupe, compression, QoS, auto-tier tier %, snapshot schedule and remote replication schedule
2. Hard attributes – for example disk type if single tier and RAID level
The soft attributes should be able to be changed on the fly without the need to move the VVOL, the hard attributes would require a move to change them as you suggest above.
Over time the hard attributes should disappear as the arrays all become more virtualised and software defined (i.e. through the use of erasure coding).
Instead what we end up with is a slightly more sophisticated version of the Profile-Driven Storage introduced in vSphere 5.
The bottom line is you should not need to move a VM to change the attributes of it (even if this is automated by the array) – surely that cannot have been the design goal.
Is this all about favouring VSAN over VVOLs so that the implementation of VVOLs has been held back to ensure that VSAN has the edge?
As mentioned previously it is also strange that VAAI is not included with the VVOLs license, when you add this to the bizarre decisions that have been made with regard to EVO:RAIL (please see http://blog.snsltd.co.uk/vmware-evorail-or-vsan-which-makes-the-most-sense/) you begin to wonder what is going on at VMware.
VMware (or maybe they are being over influenced by their parent EMC) just seem to be making some poor decisions of late – it all seems to smack of the vRAM tax all over again.
Just my opinion and I would be keen to understand yours.
Many thanks
Mark
Hi Mark,
Thanks for leaving the comment – I wouldn’t worry so much.
My understanding is that this level of policy management is coming – the issue is that I don’t have access to many newer storage arrays to see how our storage array partners have implemented VVols.
I think we’ll have to wait and see post vSphere 6.0 GA. I’m hoping we will see some implementation guides and use cases for their particular implementations. Let’s not get ahead of ourselves.