Regular readers will be aware that I have been spending a lot of my time on Cloud Native Storage topics these days, whether it is bubbling up how Kubernetes clusters are consuming vSphere storage through our new CNS feature in vSphere 6.7U3, or using Velero for backups, restores and application mobility. However, something I have been passionate about for quite a number of years now is our Virtual Volumes (vVols) feature. While it has been rather quiet over the past couple of years, I was thrilled to see us present a tech preview at VMworld 2019 of Site Recovery Manager (SRM) orchestrating test fail-over, fail-over and fail-back of virtual machine workloads running on vVols. I finally found some time to look at this in more detail, so I wanted to share it with you in this post.
As it has been some time since I visited these topics on this blog, it is probably worth providing an overview of the products involved.
What is Site Recovery Manager (SRM)?
SRM orchestrates test fail-over, full disaster recovery (DR) fail-over, fail-back and planned migration of virtual machine workloads from a protected vSphere ‘site’ managed by one vCenter server to a recovery vSphere ‘site’ managed by a completely different vCenter server. Testing allows organizations to be confident that the configurations of both sites are in sync, should a real fail-over be required in a DR event. It also gives organizations a good idea of the Recovery Time Objective (RTO) that is achievable in the event of a DR.
Of course, there is also the actual fail-over between sites, which is done with the click of a button in SRM. Once you have defined what you are protecting in a ‘Protection Group’, SRM executes the ‘Recovery Plan’, which does the following during a fail-over:
- Stops replication at the protected site
- Attempts to shut down VMs at the protected site
- Promotes the replicated copy of the data at the recovery site
- Attaches the promoted storage to the ESXi hosts at the recovery site
- Powers on virtual machines at the recovery site and modifies their IP addresses as needed
- Generates a report about the recovery process
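To make the ordering of those fail-over steps concrete, here is a minimal Python sketch. It is only a model under stated assumptions: `Site` and `run_recovery_plan` are hypothetical names invented for illustration and are not part of any SRM API. The one design point worth noticing is that the shutdown step is only an *attempt*, because in a real DR event the protected site may be unreachable.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical, simplified view of one SRM site (not an SRM API)."""
    name: str
    reachable: bool = True
    replication_active: bool = True
    storage_promoted: bool = False
    storage_attached: bool = False
    powered_on_vms: list = field(default_factory=list)


def run_recovery_plan(protected: Site, recovery: Site, vms: list, ip_map: dict) -> list:
    """Walk the fail-over steps in the order listed above and return a report."""
    report = []
    # 1. Stop replication at the protected site.
    protected.replication_active = False
    report.append(f"replication stopped at {protected.name}")
    # 2. Attempt to shut down VMs at the protected site; in a real DR event
    #    the site may be unreachable, so this step must not block recovery.
    if protected.reachable:
        protected.powered_on_vms = [v for v in protected.powered_on_vms if v not in vms]
        report.append(f"VMs shut down at {protected.name}")
    else:
        report.append(f"{protected.name} unreachable, shutdown skipped")
    # 3. Promote the replicated copy of the data at the recovery site.
    recovery.storage_promoted = True
    report.append(f"replica promoted at {recovery.name}")
    # 4. Attach the promoted storage to the recovery site's ESXi hosts.
    recovery.storage_attached = True
    report.append(f"storage attached at {recovery.name}")
    # 5. Power on VMs, re-addressing any VM that has an entry in ip_map.
    for vm in vms:
        recovery.powered_on_vms.append(vm)
        ip = ip_map.get(vm, "original IP")
        report.append(f"{vm} powered on at {recovery.name} with {ip}")
    # 6. The accumulated list serves as the recovery report.
    return report
```

For example, `run_recovery_plan(Site("site-a", powered_on_vms=["web01"]), Site("site-b"), ["web01"], {"web01": "10.1.0.5"})` returns a report whose last line records `web01` powered on at `site-b` with its new address.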
SRM can manage bi-directional migrations, and can also be used to fail-back to the original protected site. In a fail-back event, SRM does the following:
- Applications, now running at the recovery site, are ‘re-protected’ by reversing the direction of replication back to the original protected site
- The original recovery plan is executed in the reverse direction so admins do not need to create a new recovery plan for fail-back
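The key idea behind fail-back is that nothing new is authored: re-protect simply swaps the roles of the two sites, and the existing recovery plan then runs in the opposite direction. The sketch below models only that role swap. `ReplicationPair`, `RecoveryPlan` and `reprotect` are hypothetical names used for illustration, not SRM objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationPair:
    source: str  # site currently running the VMs and sending replicated writes
    target: str  # site receiving the replicated writes


@dataclass(frozen=True)
class RecoveryPlan:
    name: str
    pair: ReplicationPair

    def direction(self) -> str:
        return f"{self.pair.source} -> {self.pair.target}"


def reprotect(plan: RecoveryPlan) -> RecoveryPlan:
    """Hypothetical re-protect step: after a fail-over the VMs run at the old
    target, so replication is reversed to point back at the original site.
    The plan keeps its name and contents; only the direction flips."""
    reversed_pair = ReplicationPair(source=plan.pair.target, target=plan.pair.source)
    return RecoveryPlan(plan.name, reversed_pair)
```

Applying `reprotect` twice returns a plan equal to the original, which mirrors a full fail-over and fail-back cycle ending with the workloads back at the original protected site.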
What are vSphere Virtual Volumes (vVols)?
In a nutshell, vVols allow virtual machines to be represented as first-class citizens on a storage array. In other words, each of the constituent parts of a virtual machine is represented by a unique object on the storage array. This also means that array-based features, e.g. snapshot, clone and replication, can be applied at a per-VM or even per-VMDK granularity. Virtual machines consume these array-based features through Storage Policy Based Management (SPBM). SPBM allows us to select a particular set of array-based features by placing them into a policy. When a VM is provisioned, or at any point during the life-cycle of the VM, attaching a policy allows the VM to consume the native array-based features defined in that policy.
The bi-directional communication between the vSphere infrastructure and the array is achieved through a VASA provider (VASA is short for vSphere APIs for Storage Awareness). The provider presents the array’s capabilities to vSphere so that policies can be built, and it is also how the vSphere infrastructure requests the array to create virtual volumes.
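Conceptually, the VASA provider advertises what each vVol datastore can do, and SPBM matches a policy's rules against those advertised capabilities to decide where a VM can be placed. The following sketch models only that matching step. The capability names, the `ADVERTISED` table and `compatible_datastores` are all hypothetical; real capability names are defined by each array vendor's VASA provider, and this is not the SPBM API.

```python
# Hypothetical capabilities that a VASA provider might advertise for each
# vVol datastore; real capability names are defined by each array vendor.
ADVERTISED = {
    "vvol-ds-gold": {"replication": True, "snapshot": True},
    "vvol-ds-silver": {"replication": False, "snapshot": True},
}


def compatible_datastores(policy: dict, advertised: dict) -> list:
    """Return the datastores whose advertised capabilities satisfy every rule in the policy."""
    return sorted(
        ds
        for ds, caps in advertised.items()
        if all(caps.get(rule) == wanted for rule, wanted in policy.items())
    )
```

With these assumed values, a policy requiring `{"replication": True}` is only compatible with `vvol-ds-gold`, while a snapshot-only policy matches both datastores.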
These are the sorts of virtual machine objects that would be backed by vVols on a vVol capable array.
- CONFIG – Stores the VM’s vmx, nvram and log files
- DATA – VMDKs – virtual machine disks (base, snapshot deltas)
- SWAP – virtual machine swap files
- MEM – virtual machine snapshot memory
- Other – vSphere solution-specific type
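Because every part of a VM becomes its own array object, even a small VM maps to several vVols. The rough sketch below counts them per type, following the list above. The counting rules are simplifying assumptions for illustration only: one DATA vVol per disk plus one delta per disk per snapshot, a SWAP vVol only while the VM is powered on, and one MEM vVol per snapshot that captured memory. `VVolType` and `vvol_inventory` are hypothetical names.

```python
from collections import Counter
from enum import Enum


class VVolType(Enum):
    CONFIG = "vmx, nvram and log files"
    DATA = "virtual disks (base and snapshot deltas)"
    SWAP = "virtual machine swap file"
    MEM = "snapshot memory"
    OTHER = "vSphere solution-specific"


def vvol_inventory(disks: int, snapshots: int, memory_snapshots: int, powered_on: bool) -> Counter:
    """Rough, hypothetical count of vVol objects for one VM under the assumptions above."""
    if memory_snapshots > snapshots:
        raise ValueError("memory snapshots cannot exceed total snapshots")
    inventory = Counter()
    inventory[VVolType.CONFIG] = 1
    # Each disk has a base object plus one delta per snapshot (simplified).
    inventory[VVolType.DATA] = disks * (1 + snapshots)
    # The swap object only exists while the VM is running.
    if powered_on:
        inventory[VVolType.SWAP] = 1
    inventory[VVolType.MEM] = memory_snapshots
    return inventory
```

Under these assumptions, a powered-on VM with two disks and one memory snapshot comes to seven objects, each of which the array can snapshot, clone or replicate individually.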
SRM and vVols integration (Tech Preview)
vVols has the concept of Replication Groups. You can think of a Replication Group as a group of replicated storage devices that provides atomic fail-over for an application; in the vVols case, the Replication Group defines the set of vVols that are replicated and failed over together. The vVols in a group are replicated with write-order fidelity, meaning writes are applied at the recovery site in exactly the same order in which they were generated at the protected site. This ensures that, at any time, the recovery site holds (at the very least) a crash-consistent copy of the data on the protected site.
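Why does write-order fidelity give crash consistency? If the recovery site applies writes strictly in order, its state after any number of replicated writes is identical to the protected site's state at some earlier instant, which is exactly what a crash would have left behind. The sketch below checks that property by brute force. The `(block, data)` write model and the function names are hypothetical simplifications, not how any array implements replication.

```python
def apply_writes(writes: list) -> dict:
    """Apply (block, data) writes in order and return the resulting block state."""
    state = {}
    for block, data in writes:
        state[block] = data
    return state


def recovery_states(writes: list) -> list:
    """State of the recovery site after each in-order replicated write arrives."""
    return [apply_writes(writes[:n]) for n in range(len(writes) + 1)]


def is_crash_consistent(recovery_state: dict, writes: list) -> bool:
    """True if the state matches the protected site's state at some earlier point in time."""
    return any(recovery_state == apply_writes(writes[:n]) for n in range(len(writes) + 1))
```

With `writes = [("log", "begin txn"), ("data", "new row"), ("log", "commit")]`, every in-order recovery state passes the check. If replication instead delivered the commit record before the data write, the replica would show `{"log": "commit"}` with no row, a state the protected site never passed through, so an application restarted from it could see a committed transaction whose data is missing.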
The tech preview shows Site Recovery Manager adding protection and orchestration of vVol Replication Groups.
Now, this is the most interesting part: this is all configured via SPBM once again. And once you have a policy configured with a vVol Replication Group (that is included as part of the SRM Protection Group), all a VI-admin needs to do is to associate a vVol policy with a VM, for example when you migrate it to a vVol array. Once you choose the Replication Group (which maps to a replication configuration on the storage array), that VM is placed in the correct configuration on the array and becomes immediately replicated and protected. Here is a screenshot provided by my good pal, Cato Grace, showing such a scenario.
Here we are simply moving a VM to the vVol datastore on the array and selecting an appropriate vVol policy. Because that policy has replication configured, we can then pick an appropriate Replication Group from the drop-down list. Different groups may offer different Recovery Point Objectives (RPO) or Recovery Time Objectives (RTO). Very cool indeed!
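The chain that makes protection automatic is: the policy offers Replication Groups, choosing a group places the VM in it on the array, and the SRM Protection Group already contains that Replication Group. The sketch below models that chain. All names (`ReplicationGroup`, `VvolPolicy`, `ProtectionGroup`, `assign_policy`) and the RPO values are hypothetical illustrations, not SRM, SPBM or VASA APIs.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicationGroup:
    name: str
    rpo_minutes: int
    members: set = field(default_factory=set)  # VMs placed in this group on the array


@dataclass
class VvolPolicy:
    name: str
    replication_groups: list  # groups the array offers for this policy


@dataclass
class ProtectionGroup:
    name: str
    replication_groups: list

    def protects(self, vm: str) -> bool:
        """A VM is protected if it sits in any Replication Group this Protection Group includes."""
        return any(vm in rg.members for rg in self.replication_groups)


def assign_policy(vm: str, policy: VvolPolicy, group_name: str) -> ReplicationGroup:
    """Attach the policy and choose a Replication Group, placing the VM in that group."""
    for rg in policy.replication_groups:
        if rg.name == group_name:
            rg.members.add(vm)
            return rg
    raise ValueError(f"{group_name!r} is not offered by policy {policy.name!r}")
```

Nothing in `assign_policy` touches the Protection Group: a VM becomes protected purely because the group it was placed in is already part of one, which is the behaviour described above.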
Fail-over, fail-back and planned migration can now be done on this VM through SRM. Automatic protection by simply assigning a policy to a VM!
For more information, please check out the recording of HCI2894BU – Tech Preview of SRM and vVols at VMworld 2019 in San Francisco.
Please note that this is a tech preview and, as such, there is no guidance given about which future version of vSphere will include these products/features. Also, there is no commitment or obligation that technical preview features will become generally available.