Compare and Contrast – VSAN and VVols
Earlier this month I had the opportunity to meet with a number of VMware customers in both Singapore and in the UAE. Most of the sessions were enablement and education-type sessions, where there was a lot of white-boarding of VSAN (VMware’s hyper-converged infrastructure product) and Virtual Volumes (VVols – Software Defined Storage or SDS for the storage arrays). This wasn’t a sales session; I’m not in sales. The objective of these sessions was simply to educate. I guess when you are immersed in this stuff 24×7, it’s easy to fall into the trap of believing that everyone is well versed in this technology, and that’s simply not the case.
With both virtualization teams and storage teams in the room at the same time, it was important to show the building blocks of each approach, as well as to compare and contrast the advantages of each storage solution over the other. As I repeatedly delivered the same session, I thought it might be useful to share my thoughts with a broader audience, in the form of this blog post.
Let’s start with the building blocks for VVols (far more detail can be found in this earlier post). There are six distinct components; a short conceptual sketch of how they fit together follows the list.
- VASA Provider (VP). Responsible for surfacing up capabilities of underlying storage to vSphere and out of band communication for VVol operations (create, delete, snapshot, etc).
- Storage Policy Based Management (SPBM). Responsible for defining VM storage requirements via a policy, based on capabilities surfaced up by VP.
- Storage Container. Aggregated, abstracted pool of storage on the array.
- VVol Datastore. A representation of the Storage Container in vSphere.
- VVol(s). A new way of representing the virtual machine, which traditionally has been represented as a set of files.
- Protocol Endpoint (PE). Provides the I/O (data) path between the ESXi host and the VVols on the array.
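To make these relationships a little more concrete, here is a minimal conceptual sketch in Python. To be clear, this is not real VASA or SPBM code – the class names, capability names and identifiers are all made up for illustration. It simply models how the VASA Provider surfaces capabilities from a Storage Container and handles out-of-band VVol operations, while I/O to the resulting VVols flows in-band through a Protocol Endpoint. The VVol datastore is then simply vSphere’s view of that container.

```python
# Conceptual sketch only -- these classes are illustrative, not a real VMware API.

class StorageContainer:
    """Aggregated, abstracted pool of storage on the array."""
    def __init__(self, name, capacity_tb, capabilities):
        self.name = name
        self.capacity_tb = capacity_tb
        self.capabilities = capabilities
        self.vvols = []


class VasaProvider:
    """Out-of-band control path: surfaces capabilities and handles VVol operations."""
    def __init__(self, container):
        self.container = container

    def surface_capabilities(self):
        # e.g. {"flash": True, "snapshots": "array"} -- what SPBM gets to see
        return self.container.capabilities

    def create_vvol(self, name, size_gb):
        vvol = {"name": name, "size_gb": size_gb}
        self.container.vvols.append(vvol)
        return vvol


class ProtocolEndpoint:
    """In-band data path: I/O from the ESXi host to individual VVols flows through here."""
    def __init__(self, identifier):
        self.identifier = identifier

    def io(self, vvol, operation):
        return f"{operation} on {vvol['name']} via PE {self.identifier}"


container = StorageContainer("gold-container", 200, {"flash": True, "snapshots": "array"})
vp = VasaProvider(container)
pe = ProtocolEndpoint("naa.600a0b80")

print(vp.surface_capabilities())                  # capabilities surfaced to vSphere/SPBM
config_vvol = vp.create_vvol("web01-config", 4)   # control path (VASA): create a VVol
print(pe.io(config_vvol, "write"))                # data path: I/O goes through the PE
```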
Now let’s take a look at the VSAN building blocks. Note there are a lot of similarities, but fewer components when compared to VVols (a quick capacity sketch follows the list).
- VASA Provider (VP). Responsible for surfacing up capabilities of underlying storage to vSphere.
- Storage Policy Based Management (SPBM). Responsible for defining VM storage requirements via a policy, based on capabilities surfaced up by VP.
- VSAN Datastore. Aggregated/abstracted pool of storage using local storage on the ESXi hosts in the VSAN cluster.
- VSAN Objects. A new way of representing the virtual machine, which traditionally has been represented as a set of files.
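To give a feel for what a policy means for VSAN objects, here is a quick back-of-the-envelope sketch. It assumes the simple RAID-1 (mirroring) case, where a “number of failures to tolerate” (FTT) of n means n+1 full replicas of the object, and it deliberately ignores witness components and other overheads.

```python
# Back-of-the-envelope sketch: raw capacity consumed by a VSAN object
# under a RAID-1 (mirroring) policy. Witnesses and metadata overhead ignored.

def raw_capacity_gb(vmdk_size_gb, failures_to_tolerate):
    replicas = failures_to_tolerate + 1   # FTT=1 -> 2 full copies of the data
    return vmdk_size_gb * replicas

for ftt in (0, 1, 2):
    print(f"100 GB VMDK, FTT={ftt}: ~{raw_capacity_gb(100, ftt)} GB raw capacity")
```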
So why SDS? What does Software Defined Storage give you? In the following section, I try to highlight this. Some benefits are geared towards VSAN, others are geared towards VVols, and some benefits are common to both.
1. End-to-End Performance Visibility
Well, for me, the first benefit of SDS is end-to-end visibility into the I/O path. For the first time we can see I/O at the VM level (IOPS, latency and throughput) and trace it all the way down to the underlying physical storage. Historically we always had the I/O blender effect: multiple VMs, running different applications and different workloads with varying block sizes, all landing on the same LUN or volume. It was almost impossible to figure out who was introducing high latency or being a noisy neighbour. Having come from a support background, I know this was always a tough nut to crack. Typically the end-user calls the vSphere admin to complain about poor performance; the vSphere admin may observe some higher-than-average latency via the vSphere client and other tooling. But when the vSphere admin has the conversation with the storage admin, in many cases the storage admin will respond that everything looks normal. The I/O blender effect! Now, with the VMDK being a first class citizen on both VSAN and VVols, we can drill down and see what is going on with each individual VM. Storage is no longer a black box with SDS. With VSAN, we have new performance views in 6.2 that give this visibility. With VVols, it will of course be up to the storage vendors, but with the DELL EQL implementation, I’ve seen the actual VVols being tagged with the name of the VM when they are displayed in the EQL storage UI – very neat indeed.
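To show why this per-VM granularity matters, here is a tiny illustrative sketch (the I/O samples are obviously made up): once each sample is tagged with the VM it belongs to, the noisy neighbour is obvious, whereas the per-LUN view blends everything into one unremarkable average – the I/O blender in action.

```python
# Illustrative only: made-up I/O latency samples, each tagged with the VM that issued it.
from collections import defaultdict

samples = [
    {"vm": "web01", "latency_ms": 1.2}, {"vm": "web01", "latency_ms": 1.4},
    {"vm": "db01",  "latency_ms": 9.8}, {"vm": "db01",  "latency_ms": 11.3},
    {"vm": "app01", "latency_ms": 1.1}, {"vm": "app01", "latency_ms": 1.3},
]

# Per-LUN view: one blended number, nobody looks guilty.
blended = sum(s["latency_ms"] for s in samples) / len(samples)
print(f"LUN average latency: {blended:.1f} ms")

# Per-VM view: the noisy neighbour stands out immediately.
per_vm = defaultdict(list)
for s in samples:
    per_vm[s["vm"]].append(s["latency_ms"])
for vm, lat in per_vm.items():
    print(f"{vm}: avg latency {sum(lat)/len(lat):.1f} ms")
```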
2. Freeing the storage admin from mundane tasks
The other nice aspect is that storage administrators no longer need to be engaged for menial tasks like provisioning new storage or growing existing storage when the vSphere admin needs to provision new VMs. It’s not simply creating the new storage that ties them down, but figuring out which capabilities they should put on a LUN/volume for the workload of that VM. For example, the storage admin needs to ask the vSphere admin “are you replicating the VM?”, “what are the performance requirements, and should it be on flash or tiered storage?”, “should it be thin provisioned?”, “what about snapshots?” And on and on. With VVols, they can now focus their precious time on tasks like designing and sizing storage containers, and determining the correct set of capabilities that should be surfaced up to vSphere so that a vSphere admin can consume them appropriately (via SPBM) for VM storage. SDS now presents both the large pool of abstracted storage and the underlying storage capabilities to the vSphere admin, without the storage admin having to get involved in every new provisioning operation. This is true for both VSAN and VVols, though technically a vSphere administrator should be able to configure a VSAN environment without any special input from a storage administrator.
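Here is a hedged sketch of the SPBM idea, with invented capability and datastore names (it is not the real SPBM API): the storage admin decides what each container or VSAN datastore can offer, and the vSphere admin simply asks which datastores satisfy the VM’s policy, rather than opening a ticket for every new VM.

```python
# Conceptual SPBM-style compatibility check. Capability names and datastores
# are invented for illustration; this is not the real SPBM API.

datastores = {
    "vvol-gold":   {"flash": True,  "replication": True,  "array_snapshots": True},
    "vvol-silver": {"flash": False, "replication": True,  "array_snapshots": True},
    "vsan-ds":     {"flash": True,  "replication": False, "array_snapshots": False},
}

def compatible(policy, capabilities):
    """A datastore is compatible if it offers every capability the policy requires."""
    return all(capabilities.get(cap) == value for cap, value in policy.items())

vm_policy = {"flash": True, "replication": True}
matches = [name for name, caps in datastores.items() if compatible(vm_policy, caps)]
print("Compatible datastores:", matches)   # -> ['vvol-gold']
```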
3. More Granular VM Operations
Now we have the situation where the VM is a first class citizen in the storage world, both with VSAN and VVols. This means that we can do operations at the VMDK level rather than LUN or traditional Volume level, such as snapshots, replication and QoS. In the case of VVols, snapshots of a VM are always offloaded to the array as described in this post. We also don’t have to figure out the initial placement of a VM to make sure it gets the right capabilities. Now we simply make sure the VM has the correct policy containing the appropriate capabilities.
And yes, it is still true to say that there is a bit to do before we can orchestrate replication of VVols via VASA/SPBM, but replication can still be done via the array if it supports it (and we’re working hard to surface it up as a capability with VASA and consume it with SPBM). With VSAN, one can continue to use vSphere Replication orchestrated with Site Recovery Manager (SRM) should it be a requirement, and in VSAN 6.2, an RPO of 5 minutes (rather than the default of 15 minutes) can be offered. From a QoS perspective, the ability to place limits and thresholds on a per-VM basis rather than on a whole LUN is a characteristic of VVols that partners like SolidFire/NetApp continue to tout. We have only just started to add features like this to VSAN, with the introduction of the IOPS limit per object feature.
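To illustrate what a per-object limit means compared to a per-LUN one, here is a simple sketch. It is not how VSAN or any array implements QoS internally – just the shape of the idea: each VMDK object carries its own limit from its policy, so one VM can be throttled without touching its neighbours.

```python
# Illustrative per-object IOPS cap, not an actual VSAN/array QoS implementation.

class VmdkObject:
    def __init__(self, name, iops_limit):
        self.name = name
        self.iops_limit = iops_limit     # comes from the VM's storage policy
        self.iops_this_second = 0

    def admit(self, requested_iops):
        """Admit I/O up to this object's own limit; other objects are unaffected."""
        allowed = max(min(requested_iops, self.iops_limit - self.iops_this_second), 0)
        self.iops_this_second += allowed
        return allowed

noisy = VmdkObject("db01-data.vmdk",  iops_limit=500)
quiet = VmdkObject("web01-data.vmdk", iops_limit=500)

print(noisy.admit(2000))   # 500 -> throttled to its own per-object limit
print(quiet.admit(100))    # 100 -> completely unaffected by db01
```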
4. Simplicity
I already covered how VVols reduces complexity in my Value of VVols post here. To recap, you’re reducing the storage presentation aspect from tens or hundreds of LUNs down to a handful of protocol endpoints (PEs). This means that the LUN mapping/masking operations are greatly reduced, as are the multipathing configuration tasks (MRU, Fixed, RR) on the vSphere side. On top of that, there is no filesystem formatting needed, e.g. creating VMFS. The other interesting comparison is upgrades. I spoke to some customers who were hampered in their vSphere upgrade process because of all of the moving parts in traditional storage environments, e.g. storage array controller firmware version, FC switch firmware version, HBA driver and firmware versions. All of this needed updating before vSphere could be migrated to a later version. This would continue to be a consideration with VVols, to be honest. With VSAN, with all of the smarts embedded in vSphere, it is really just a matter of upgrading vSphere. Yes, there are some considerations around the version of storage controller driver and firmware, but not nearly as many as there are for traditional storage deployments.
5. Scaling Up and Out
This is another advantage I would offer with VSAN and hyper-converged infrastructure. There is no need for a very large investment up front. Instead, customers can start with a small deployment for a POC or a small number of VMs, and then grow the size of the environment gradually. With VSAN, you can scale up by simply adding more disks or replacing existing disks with higher-capacity ones, or scale out by simply adding one node at a time. I know this isn’t always the case with other HCI vendors, but it is certainly an option with VSAN. With VVols, this may not always be possible as we are dealing with storage arrays, so one suspects that there needs to be an initial (considerable) investment up front. However, from a scaling perspective, VVols does have an advantage over traditional storage, which still limits you to 256 devices presented to a host. With a few PEs, thousands upon thousands of VVols can now be created on the storage container.
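To put rough numbers on that last point, here is a quick back-of-the-envelope calculation. The per-VM VVol count is approximate (a config VVol, a swap VVol while powered on, and a data VVol per virtual disk, plus any snapshots), but the contrast is the point: the host sees only a handful of PEs, not one device per datastore.

```python
# Rough arithmetic only; the per-VM VVol count is approximate.

traditional_device_limit = 256          # per-host device/LUN limit noted above

vms = 1000
vvols_per_vm = 2 + 2                    # config + swap + (say) two data VVols per VM
protocol_endpoints = 4                  # a handful of PEs presented to the host

print("VVols on the container:", vms * vvols_per_vm)          # 4000 VVols
print("Devices the host actually sees:", protocol_endpoints)  # just the PEs
print("vs. traditional per-host device limit:", traditional_device_limit)
```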
Conclusion
There is no right or wrong approach when it comes to SDS. HCI might be the best approach for some customers, and VVols + storage arrays might be the best approach for other customers. However for customers who are planning new green-field deployments, or have had some pain-points with virtualization/storage in the past, SDS should be a consideration over the traditional storage approach for some of the reasons highlighted above.
I’d be interested in hearing feedback from customers who have switched to SDS, either HCI or VVol-capable arrays. What works well for you? What still needs improvement?
Did you discuss “runaway” processes that could fill up a VVol container?
In today’s world of traditional setups, for example, a 200 TB storage group is set up along with 100 two-TB LUNs provisioned to a set of hosts. This is what we do today.
If we migrate to VVols, we partition a 200 TB container and then all machines use this single container. Was there any discussion of monitoring runaway processes that could potentially fill up the entire 200 TB vs. a 2 TB LUN? We’ve had 2 TB LUNs occasionally fill up for crazy, unknown reasons. That only impacts the VMs on that LUN; in a container, all VMs would be impacted, correct?
Just wondering how people are designing their containers to take advantage of the ease of use vs. protecting themselves from killing a bunch of VMs if a container runs out of space.
Are you talking about a situation where a VMDK is thin provisioned, Steve, and an application in the VM writes to the whole VMDK? Or maybe you are talking about an RDM. I can’t think of another situation where a runaway process can write to the whole LUN unless it has been given access to it in some way.
VVols are similar. You’re not going to be able to have a VM write to the whole of the storage container unless the VMDK/VVol in question is made so large as to consume all of the container.
Otherwise I cannot see this as a concern.
How about a snapshot taken by a backup product running a LUN out of space where all of the VMDKs are thick provisioned?
On VVols, when a snapshot is taken, is that in the storage container or outside of the storage container? Does it vary by storage vendor?
Well, a snapshot should not grow any bigger than the original VMDK. Even if every block of the original VMDK got changed, the size of the snapshot would still only equal the VMDK size. So on a LUN, if there was not enough space, then yes, a snapshot could fill it.
And yes, in theory, if you had a very small storage container, or a very full storage container, a runaway snapshot could fill it. Typically though, the storage container will be much much larger than your original LUN, so the issue is mitigated somewhat.
But I agree. Many of the same operational aspects will still need to be considered with VVols.
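To put some rough numbers on your scenario (borrowing the 2 TB LUN and 200 TB container figures): worst case, a snapshot grows to the size of its base VMDK, so the real question is how much headroom surrounds it. A quick illustrative calculation, with made-up free-space figures:

```python
# Rough worst-case arithmetic for the scenario above; the numbers are illustrative.

def headroom_after_snapshot(free_space_tb, vmdk_size_tb):
    worst_case_snapshot_tb = vmdk_size_tb          # a snapshot can't exceed its base VMDK
    return free_space_tb - worst_case_snapshot_tb

# A nearly full 2 TB LUN: a 0.5 TB VMDK's snapshot can run it out of space.
print(headroom_after_snapshot(free_space_tb=0.3, vmdk_size_tb=0.5))    # -0.2 -> LUN full

# A 200 TB container with, say, 40 TB free absorbs the same worst case easily.
print(headroom_after_snapshot(free_space_tb=40.0, vmdk_size_tb=0.5))   # 39.5 TB left
```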
We are going through this type of thing and it’ll be a lot easier when “needing more space”. The initial thinking is to use a workload-type design when it comes to VVols: SQL, general, etc. We don’t want to get too detailed and over-architect. It’s interesting how fast stuff is changing and trying to adapt while keeping good practices in place. Thanks for the response and good post.
Cheers Steve.