VAAI-NAS, vCloud Director and Cloning Offload Strangeness
I was in a conversation with one of my pals over at Tintri last week (Fintan), and he observed some strange behaviour when provisioning VMs from a catalog in vCloud Director (vCD). When he disabled Fast Provisioning, he expected that provisioning further VMs from the catalog would still be offloaded via the VAAI-NAS plugin. All the ESXi hosts have the VAAI-NAS plugin from Tintri installed. However, the provisioning/cloning operation was not being offloaded to the array, and the ESXi hosts' resources were being used for the operation instead. Deployments of VMs from the catalog were taking minutes rather than seconds. What was going on?
If you are not familiar with Fast Provisioning, I wrote an article on the vSphere storage blog around this some time back. In a nutshell, Fast Provisioning allows vCD to use linked clones for the provisioning of new vApps/VMs. Without Fast Provisioning enabled, VMs are provisioned as full clones. As the name suggests, the provisioning of linked clones is much faster than that of full clones.
During the investigation, it was found that if the catalog the VM was being provisioned from had previously been used for Fast Provisioning, the clone operation was not offloaded once Fast Provisioning was subsequently disabled.
As part of the testing, a new catalog was created and Fast Provisioning was not enabled. Now when a VM is provisioned from this catalog, the full clone operation is offloaded to the array, and the provisioning is almost instantaneous.
So what was the difference between a catalog where Fast Provisioning was enabled and then disabled, and a catalog where Fast Provisioning was never enabled? It appears to be the following: when Fast Provisioning is enabled, a snapshot of the VMs in the catalog is taken, but it is not removed when Fast Provisioning is disabled. This seems to be the crux of the issue. Once this snapshot is cleared down (consolidation via vCenter seems to do it), full clones are once again offloaded to the array when a VM is provisioned from the catalog.
Bottom line – if you have a vCD catalog that has been used for Fast Provisioning, and you decide to disable it, full clones of VMs during provisioning are not offloaded. To offload full clones, you will need to either consolidate any snapshots that are left on the VMs from the Fast Provisioning setup, or create a new catalog (without Fast Provisioning enabled) to provision from.
Also, although this was found by the guys over at Tintri, I suspect this is not a Tintri-specific issue but may be encountered with other VAAI-NAS implementations. If you are using VAAI-NAS from other storage partners, and also use vCD, I’d be interested in hearing from you if you can verify this behaviour.
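If you want to check which catalog VMs still carry a leftover snapshot, the idea can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not a tested tool: the function names are hypothetical, and it assumes each VM object exposes a `name` attribute and a `snapshot` attribute that is `None` when the VM has no snapshots (which is how pyVmomi's VirtualMachine objects present it). The stand-in objects below simply mimic that shape so the logic can be run without a vCenter.

```python
from types import SimpleNamespace


def has_leftover_snapshot(vm):
    """Return True if the VM object reports any snapshot.

    Assumption: `vm.snapshot` is None when the VM has no snapshots,
    as with pyVmomi VirtualMachine objects.
    """
    return getattr(vm, "snapshot", None) is not None


def vms_blocking_offload(catalog_vms):
    """Return the names of catalog VMs that still have a snapshot.

    Based on the behaviour described above, these are the VMs whose
    full clones would likely not be offloaded until the snapshot is
    consolidated.
    """
    return [vm.name for vm in catalog_vms if has_leftover_snapshot(vm)]


if __name__ == "__main__":
    # Hypothetical stand-ins for VMs retrieved from vCenter.
    catalog = [
        SimpleNamespace(name="catalog-vm-new", snapshot=None),
        SimpleNamespace(name="catalog-vm-old", snapshot=object()),
    ]
    print(vms_blocking_offload(catalog))  # prints ['catalog-vm-old']
```

Any VM the sketch reports would then be a candidate for consolidation via vCenter, as described above.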
CH – take a look at VMware SR 13395042510. This helps detail our experience with NetApp & vCD 5.1. I wasn’t the one on our side to work the case, but I _believe_ that VMware was going to file a bug against it. The net result was that VMware needed to send the instructions to the storage via VAAI to resolve the matter.
CH – I mined my e-mail tonight. The latest I have is: “There was a PR opened, but it was vague and seemed to be against 5.5.1 and not 5.1” I can be found at @itbycrayon itbycrayon.com
Thanks Jim – I’ll check it out. Thanks.
Hi Cormac,
I am using NetApp via NFS with vCD and VAAI and would like to make some comments based on my experience:
1. VAAI-offloaded cloning of VMs via vCenter (not vCD), with VAAI properly configured, only works for VMs without snapshots.
If a VM has a snapshot, the clone operation falls back to a regular (ESX host based) clone operation.
2. From my testing a while back with vCD, I found that vCD is actually making customized calls to vCenter for cloning operations, depending on which settings are set (e.g. VAAI enabled/disabled, fast provisioning on or off,…).
vCD does not simply call standard vCenter cloning operations and let vCenter take it from there.
An example of this is that vCD-triggered VM snapshots can be VAAI-offloaded, but a VM snapshot triggered directly via vCenter cannot.
I am actually running the same NFS setup as the described use case (NFS, VAAI enabled on the datastore and fast provisioning enabled on the org VDC), but I have never changed these settings after initially spending a lot of time and research finding the most useful combination for me.
Therefore I cannot comment directly on your findings, but I think what was experienced is most likely a combination of my points 1. and 2.
Cheers,
JC
Thank you JC – that is very useful information. I appreciate you taking the time to send me this.
I can confirm the same behavior with NetApp and NAS plugin v21.
Is there a reason you would disable fast provisioning with Tintri storage? In that case vCloud Director is unaware of the fast clone ability and might do inefficient clones across volumes (as is the case with NetApp).
I guess if you wanted to do a full clone instead of a linked clone. You’d still want to offload them to the array, but leaving the snapshot behind from FP prevents it. Looks like it is definitely a vCD issue then – thanks Tomas.
The issue is that if your catalog is on one volume and you are deploying the vApps to another, all the clones will be full, traditional, non-accelerated (slow) clones. With VAAI FP, one shadow VM would be created and then all clones would be fast VAAI hardware linked clones. On the other hand, VAAI linked clones cannot be SVMotioned or consolidated.
Hi Tomas,
I love that this conversation just started by itself. I was looking for it some time ago, but never got to it.
What I would like with VAAI enabled is the shadow VM capability of FP, but without the disk chain links, which IMHO don’t make any sense when hardware link-cloning via VAAI.
That might then fix the inability to consolidate and maybe even (I doubt it though) allow SVMotion.
Cheers,
JC
Tomas, was your experience with NetApp in 7-mode or cDOT? In our lab, we are running cDOT with QoS and could not reproduce the problem, but we might not be able to send enough write IO from our VMware environment. We tried to starve the NFS clients by using the QoS features, to no avail. But we also may have more NVRAM on the lab heads (I’ll need to verify).
AFAIK this is a limitation of both.
https://library.netapp.com/ecmdocs/ECMP1511537/html/man1/na_clone.1.html
“… the destination file can either be same as the source file/LUN or can be a different file/LUN within the same volume where the source file/LUN exists.”
Cormac,
Did you end up with any insights after seeing the posts & the SRs? Anyone at VMware end up getting a solution internally?
Thanks,
Jim
It is still being actively researched, Jim, although I’m hearing anecdotally that someone got it to work with vCD 5.5.2 / vSphere 5.5U2. A test is going on internally right now, but if you have an environment to try it out on with those builds, it would be helpful.