Losing the VASA Provider and/or vCenter Server in VVols
With the release of vSphere 6.0 earlier this year, VMware introduced the eagerly anticipated VVols or Virtual Volumes. As we see more and more traction around VVols, a specific question has come up a number of times already. The question is basically: “What happens to VVols if I lose my VASA Provider or my vCenter Server, or indeed both of these components? Will I still have access to my devices?”.
Let’s start with the VASA provider. This is the component that sits between the array and the vCenter Server/ESXi hosts. It provides the out-of-band communication path between the ESXi host and the storage array for VM/VVol tasks, as well as surfacing the array capabilities to the vCenter Server. These capabilities allow administrators to create VM storage policies, which are then selected during the virtual machine provisioning process.
The VASA provider can take different forms; some array vendors have it embedded in the storage controller while others run it in a virtual appliance. In the context of VVols, the VASA provider should ideally be stateless. In other words, if it goes away for some reason, this should have no impact on your VM I/O. Yes, it will impact the ability to do certain operations (such as the creation of new VMs, creation of snapshots, etc.) since the communication path between the ESXi hosts and the array has been impacted, but losing the VASA provider should not impact the currently running VMs.
One option is to configure the VASA provider in an HA mode, so that one VASA provider acts as the primary and the other(s) act as backups. This is what we have done with VSAN, which also uses a VASA provider. This would be a good question to ask your storage vendor: can your VASA provider be set up in an HA configuration? If not, then you would need to deploy a new VASA provider should you lose the existing one.
Now, earlier in this post I said that “ideally” the VASA provider should be stateless. There should be no need to store anything on the VASA provider. However, that is not to say that some storage array partners have not put something stateful into their VASA provider, which would mean it is no longer stateless. To the best of my knowledge, VMware has not provided strict guidelines on how the VASA provider should be implemented; we have left this up to the storage array partners. This is yet another VVol question to ask your storage vendor: “If I lose my VASA provider, will it impact my existing VVols?”
The other query that I have been getting a lot is in relation to vCenter Server availability. The VVol information that is stored by vCenter server relates to the VM storage policies that the VMs are using. Remember these are the policies created by the administrator based on the capabilities surfaced up by the array. Again, if vCenter goes away, this has no impact on the running VMs. The policy information associated with the VM continues to be used by the VMs. The issue is that if you do not have a vCenter server backup and you deploy a new vCenter, then it will not have the policy information for your running VMs. So these policies will need to be recreated, which is possible, though it might be a little tedious/manual. Again, we have ways of doing this for VSAN, and I suspect it is the same for VVols (though I admit I have not tried this scenario). This is why it is a good idea to back up your vCenter Server.
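One way to make that recreation less tedious is to keep a plain record of your policy definitions outside vCenter. The following is only a minimal sketch of the idea, assuming the policies have already been written down as simple name-to-rules dictionaries; the policy names, rule keys and both helper functions are hypothetical and do not call any VMware API.

```python
import json


def export_policies(policies):
    """Serialise policy definitions to a JSON string for safekeeping."""
    return json.dumps(policies, indent=2, sort_keys=True)


def missing_policies(saved_json, current_names):
    """Return the saved policies that a rebuilt vCenter does not yet have.

    saved_json    -- string produced earlier by export_policies()
    current_names -- names of the policies that exist in the new vCenter
    """
    saved = json.loads(saved_json)
    return {
        name: rules
        for name, rules in saved.items()
        if name not in current_names
    }


# Hypothetical policy definitions, recorded while vCenter was healthy.
recorded = export_policies({
    "Gold": {"replication": True, "snapshot_interval_hours": 1},
    "Silver": {"replication": False, "snapshot_interval_hours": 24},
})

# After a rebuild only "Gold" has been recreated so far.
todo = missing_policies(recorded, {"Gold"})
```

Here `todo` holds the definitions that still have to be recreated by hand in the new vCenter Server. A proper vCenter Server backup remains the better option; this only narrows down the manual work.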
Note that failures in either of these components should not lead to data loss in VVols. I/O to VVols does not depend on vCenter or the VASA Provider. These are components needed for configuration and management, but should not lead to any sort of data loss issues if they are not present. The paradigm with the VVol architecture is that the storage is the source of truth, whilst the VASA provider is your guide. We would hope that the storage vendors follow this paradigm with their respective implementations, but just like the VAAI implementations of the past, that will probably vary from vendor to vendor.
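The split between the data path and the management path can be sketched as a small model. This is purely illustrative: the operation names and the grouping of operations by the component they depend on are my reading of the behaviour described in this post for an "ideal" stateless implementation, not a statement about any particular vendor's VASA provider.

```python
from dataclasses import dataclass

# Assumed grouping of operations by the component they depend on.
DATA_PLANE = {"vm_io"}
NEEDS_VASA = {"create_vm", "create_snapshot", "power_on_vm"}
NEEDS_VCENTER = {"create_storage_policy", "assign_storage_policy"}


@dataclass
class Environment:
    vasa_provider_up: bool = True
    vcenter_up: bool = True

    def can_perform(self, operation: str) -> bool:
        """Return True if the operation is expected to work in this state."""
        if operation in DATA_PLANE:
            # I/O flows between the ESXi host and the array directly,
            # so it depends on neither management component.
            return True
        if operation in NEEDS_VASA:
            return self.vasa_provider_up
        if operation in NEEDS_VCENTER:
            return self.vcenter_up
        raise ValueError(f"unknown operation: {operation}")


def report(env: Environment) -> dict:
    """Map every known operation to its expected availability."""
    all_ops = sorted(DATA_PLANE | NEEDS_VASA | NEEDS_VCENTER)
    return {op: env.can_perform(op) for op in all_ops}


# Both management components lost: running VMs keep doing I/O,
# but nothing on the management side works until they are restored.
status = report(Environment(vasa_provider_up=False, vcenter_up=False))
```

In this model `status["vm_io"]` stays true in every failure combination, which is the point of the paragraph above. A stateful VASA provider implementation could break that assumption, so it is worth confirming with your vendor.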
I was involved in an interesting discussion with customers (on the Veeam forums) who have done some testing of this scenario with VVols. They have reported back the following. One customer, using a DELL EQL 6110S array (FW 8.1.0), installed a vCenter server, the DELL VASA provider, created some containers/VVol datastores, and deployed some VMs/VVols. He then deleted the vCenter server AND the VASA provider. Next, he installed a brand new vCenter Server and VASA provider, added the ESXi hosts (which had the containers/VVol datastores) to the new vCenter, and finally connected the VASA provider to the vCenter server. As per the customer, he then rebooted vCenter server once more and the ESXi hosts could see all the previously created VVols, and vCenter could see and manage all of the VMs. Obviously the policies would need to be created in this scenario.
I also read about a customer who was using the beta version of vCenter server 6.0 to test VVols on his HP 3PAR array. When the GA version of vCenter server 6.0 became available, he deployed the brand new version of vCenter server 6.0, and was able to see all of the VVols and VMs.
I am aware of a warning provided by NetApp. They are recommending that the vCenter Server and the VASA provider be updated regularly. They go on to state that “If the vCenter Server or VASA Provider server goes down, you risk losing the entire VVOLs environment.” This suggests that there is something stateful being stored in either the vCenter Server, the VASA provider, or both. I would recommend discussing this in detail with your NetApp reps if you are planning on using VVols with NetApp.
If you are doing any testing with other storage in this area, I’d really like to hear about your experiences in the comments section of this post.
Great read Cormac.
VVol changes a lot in the storage stack – and that is an understatement, and we are in the early days of it. But with VMware licensing it at Standard we are going to see a lot more of it. Big thanks to Luca@Veeam and joergr for their discussions and real world testing of the Dell EqualLogic implementation.
For those of you who have not put their toes in the VVol waters, the VMware HOL has at least two relevant labs: the Dell Storage lab, which includes a VVol module based on EqualLogic http://labs.hol.vmware.com/HOL/catalogs/lab/2058 and the VMware SDS lab http://labs.hol.vmware.com/HOL/catalogs/lab/2112 which has a good look at Storage Policy-Based Management across VVol and VSAN.
Great read and many thanks, Cormac! Also many thanks to David! The experiment was fascinating. I am sure VVOLs will play an important role in the future of vSphere storage.
Thanks for making people aware of how this works. As far as HPE 3PAR goes, we are one of the only vendors that I’m aware of that has embedded the VASA Provider within our array instead of running it as an external entity. As a result we don’t need to really provide any HA mechanism as the array would have to be down for the VASA Provider to be down. We also have developed 3PAR OS CLI commands to interact with the VASA Provider on the array so you can start/stop it if needed and see the status of it.
Hi Cormac,
Great post and very timely. I have run quite a few technical design workshops and webex sessions with customers and your points are all valid considerations. VVol changes things up in terms of making previously passive objects into mission-critical components.
We support VVol on the entire Gx00 (G200/400/600/800/1000) range as well as on our recently launched All Flash Arrays (F400/600/800). These systems all run the same code base and use the same management tools (Hitachi Command Suite, HCS). I only mention this as it will become very important when further features and capability profiles are added to HCS/array microcode in the future, so these can be surfaced quickly to vCenter via the VASA provider, from any platform within the range. It is also important when managing VVol within the Hitachi context.
To help inform the community with some documents which are not just about Hitachi VVol implementation, here are a couple of really good resources:
This week we published the latest guidance for VVol deployment on Hitachi systems: bit.ly/1PREedW
We previously published a full blown planning guide to help customers understand how SPBM works: https://www.hds.com/assets/pdf/storage-policy-based-management-with-vvol-block-storage-on-hitachi-vsp-g1000.pdf
Hitachi has a stateless VASA provider (VP) implementation. If the VP dies, I/O is unaffected. Unlike HP, we don’t deploy the VASA provider within the microcode of the array. I can’t speak 100% for product management on why the choice was made, but I can say that even this week I spoke to customers with many HDS arrays (these are large scale, with thousands of VMs and many Block arrays and HNAS systems), and decoupling the VASA provider from the array provides a single point of management (a single storage provider in vCenter). That makes it simpler to manage but does mean protecting it is now critical, as otherwise you cannot do normal things like power on VMs, take a snapshot, etc.
I believe there are merits to each approach and who can say if we might change in the future. For now, you configure a VASA provider which talks to Hitachi Command Suite (HCS) via HCS API to perform VVol calls. Each of these components must be running to perform VVol operations. Within the VASA provider, customers can decide whether to hide or show any system under HCS management.
V3.2 of the Hitachi VASA provider has been released and we support a single VASA provider vehicle for Block and File storage systems. This means we manage the binary support for VVol File and Block together, and ship this as a single “product”. Right now this is two OVAs shipped within one bundle, one for File and one for Block, but the management has been completely unified within HCS. This means you can have a workload running in a VM automatically placed using SPBM across File and Block systems using a single VP and HCS combination. That’s quite powerful from a workload placement perspective, but more important from a manageability perspective.
Protecting the VASA provider (and the HCS instance) is a real consideration as you pointed out, and as per the first document above, customers should deploy this on VMFS to ensure it is out-of-band for VVol. Right now Hitachi suggests using Windows Failover Cluster for HCS and vSphere HA for the VASA provider. I can’t divulge our roadmap but expect that to change from an availability and recoverability perspective in the not too distant future. We have also clearly stated that a storage container will span arrays in the future which will provide redundancy from an endpoint perspective i.e. if one array is unavailable you can still place workloads to another pool, once SPBM rulesets are satisfied.
Finally, recoverability of VVol resources is a key consideration. Up to now, if you lost HCS you could refresh the config from your arrays, and maybe back up the HCS database once a day. With VVol this element of the stack becomes critical as HCS does hold metadata, so this will change the way people need to think about designing this layer. I believe you cannot talk about HCS and VP SPOFs in isolation and not consider the vCenter SPOF. This is especially true with vSphere 6 and PSC considerations on top, and we see many customers looking at VVol as a foundation for Enterprise Private Cloud / Storage-as-a-service deployments, due to the on-demand creation and destruction of objects and operational simplicity.
I see customers now questioning the availability of the entire stack, including Hitachi components, and considering all failure scenarios. They are right to do this to ensure recoverability in the event of a failure of any component within the stack.
Hope this helps !
Great article, Cormac!
We at Nimble Storage decided to deploy and maintain the VASA provider within our array controller. As our arrays are running >5×9’s within the field, it also means the integration required for VVOLs is now 5×9 as well (and easy to upgrade with the 1-click NimbleOS upgrade).
Important conversation, Cormac.
The SolidFire VASA provider resides on the system as well. Our VASA provider code can run on any of the nodes in our cluster, providing extremely high availability and removing the need for additional configuration to maintain availability. Can’t share much on details around our VVols implementation here at present. Our upcoming release will be a great representation of all that is possible with VVols for both availability and highly granular control.
Hi Cormac,
The NetApp warning about the loss of vCenter is related to the loss of storage policy configuration information specifically. We’ve reached out to them with a request to clarify this statement as it does sound scary as currently written.
Thanks Ben – would be nice to have some clarification from them.
Peter from NetApp here. I would like to provide some clarification on the questions about our VASA provider. First, we are off-box for VASA for several reasons, not the least of which is that we have the ability to scale out to multiple arrays and platforms managed by a single VP instance. This becomes very important as customers move to multi-datacenter deployments and want ease of management. Second, we regret that the wording in the doc is overstated, and we will be refining it. What it should say is that if the VASA provider becomes unavailable, then at that point you will be unable to provision VMs, unable to do vMotion/svMotion management of them, etc., similar to the use cases that Cormac lists in the blog. The management plane has become unavailable, but the storage and current I/O have not.
There are two scenarios for VASA provider unavailability: The first is if the VP goes down temporarily, but comes back up intact. In that case, while the VP is down, it will not be possible to manage VMs on VVols (create, power on, migrate (manual or automated by DRS), edit settings). In fact, the VMs will go greyed-out “inaccessible” in italics in the vSphere Web Client. However, already-running VMs will stay running until some other event occurs that takes VMs down. Once the VP comes back on line, the VMs displayed in the vSphere Web Client will return to normal and management operations will resume. The second case is where the VP is destroyed/deleted completely. We are confident in the resiliency of our VASA provider when deployed according to best practices around protecting the VASA Provider to ensure that this doesn’t happen. That being said, in the current implementation, it requires significant effort to recover from this if people don’t follow our practices. We are working on making the VASA Provider much easier to recover in such a case as well as providing additional resiliency for a few edge cases that we’ve discovered.
As a final thought, keep in mind that VVols is version 1.0 right now. While NetApp is leading the charge with the best VASA provider out there, there’s still room for all vendors to improve and help move the technology forward.
David Glynn raised a good tip on labs. It’s not in the catalog anymore (not sure why), but NetApp has shown a HOL for VVols at both VMworld 2013 and 2014. (We switched over to tier-1 application recovery lab in 2015). However, we currently offer VVols in our own Lab On Demand.