This is something I only learnt about very recently. It seems that we have made a major improvement to the way we do snapshot consolidation in vSphere 6.0. Many of you will be aware that when the VM is very busy, snapshot consolidation may need to go through multiple iterations before we can successfully complete the consolidation/roll-up operation. In fact, there are situations where the snapshot consolidation operation could even fail if there is too much I/O.
Previously, we used a helper snapshot and redirected all new I/Os to it while we consolidated the original chain. Once the original chain was consolidated, we calculated how long it would take to consolidate the helper snapshot, which could have grown considerably during the consolidate operation. If the time to consolidate the helper was within a certain time-frame (12 seconds), we stunned the VM and consolidated the helper snapshot into the base disk. If it was outside the acceptable time-frame, we repeated the process (a new helper snapshot while we consolidated the original helper snapshot) until the helper could be committed to the base disk within the acceptable time-frame.
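The pre-6.0 loop described above can be sketched as a small simulation. This is only an illustrative model, not VMware's implementation: the function name, the constant write and commit rates, and the `max_passes` limit are all assumptions made for the example. Only the 12 second threshold comes from the text.

```python
from dataclasses import dataclass

# Acceptable time-frame for committing the helper snapshot, as quoted in the text.
MAX_STUN_SECONDS = 12.0


@dataclass
class ConsolidationResult:
    succeeded: bool
    passes: int
    final_helper_gb: float


def simulate_consolidation(chain_gb: float,
                           write_rate_gb_per_s: float,
                           commit_rate_gb_per_s: float,
                           max_passes: int = 10) -> ConsolidationResult:
    """Model the iterative helper-snapshot consolidation (hypothetical sketch).

    chain_gb: size of the snapshot chain to consolidate.
    write_rate_gb_per_s: assumed constant rate at which the VM writes new data.
    commit_rate_gb_per_s: assumed constant rate at which data is consolidated.
    max_passes: assumed retry limit before the operation is treated as failed.
    """
    pending_gb = chain_gb
    for passes in range(1, max_passes + 1):
        # New I/O is redirected to a helper snapshot while the pending data
        # is consolidated, so the helper grows for the duration of the pass.
        elapsed_s = pending_gb / commit_rate_gb_per_s
        helper_gb = write_rate_gb_per_s * elapsed_s

        # Estimate how long committing the helper itself would take.
        estimate_s = helper_gb / commit_rate_gb_per_s
        if estimate_s <= MAX_STUN_SECONDS:
            # Small enough: stun the VM and commit the helper to the base disk.
            return ConsolidationResult(True, passes, helper_gb)

        # Too large: the helper becomes the next thing to consolidate,
        # with a fresh helper snapshot taking the new writes.
        pending_gb = helper_gb

    return ConsolidationResult(False, max_passes, pending_gb)


if __name__ == "__main__":
    print(simulate_consolidation(100.0, 0.5, 1.0))  # moderately busy VM
    print(simulate_consolidation(100.0, 2.0, 1.0))  # writes outpace the commit
```

In this model, a VM that writes faster than the chain can be committed never converges, which mirrors the failure case for very busy VMs mentioned earlier.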
In Virtual SAN 6.0, a new snapshot format called vsanSparse was introduced. This improves snapshot functionality by leveraging the new VirstoFS on-disk format used with VSAN 6.0. I had a question recently about what would happen if I migrated a VM with a traditional vmfsSparse/redo log type snapshot to VSAN. The question was whether or not it would be converted to the new vsanSparse format. Similarly, what if a VM with a vsanSparse snapshot was migrated from VSAN to a traditional VMFS/NFS datastore? Would it also be converted between formats? I decided that the only way to find out was to try it.
A short post today, but it highlights what I feel is an important enhancement to vSphere licensing. I’ve had lots of questions recently about why VAAI (vSphere Storage APIs for Array Integration) is not available in the standard edition of vSphere. This is especially true since I began posting about Virtual Volumes earlier this year, and it was clear that Virtual Volumes is available in the standard edition. One reason why this was confusing is that if a migration of a VVol could not be handled by the array using the VASA APIs, the migration would fall back to using VAAI offload primitives. But if you only had standard licensing for VVols, would you still be supported?
This is a question that seems to come up regularly, but I don’t think it is covered in any great detail in external-facing documentation. The question is: when do we stun (or in other words, quiesce) virtual machines, why do we do it, and more importantly, how long can a stun operation take? One of our staff engineers, Jesse Pool, put together some really good explanations of the VM stun operation, which I am leveraging for this post. I took a particular interest in this as I wrote a bunch of snapshot posts recently around Virtual Volumes (VVols), so I think this fits in quite nicely. A “stun” operation means we pause the execution of the VM at an instruction boundary and allow in-flight disk I/Os to complete. The stun operation itself is not normally expensive (typically a few hundred milliseconds, but it could be longer if there is any sort of delay elsewhere in the I/O stack).
Recently I published an article on Virtual Volumes (VVols) where I touched on a comparison between how migrations typically worked with VAAI and how they now work with VVols. In the meantime, I managed to have some really interesting discussions with some of our VVol leads, and I thought it worth sharing here as I haven’t seen this level of detail anywhere else. This is rather a long discussion, as there are a lot of different permutations of migrations that can take place. There are also different states that the virtual machine could be in. We’re solely focused on VVols here, so although different scenarios are offered up, I highlight which scenario we are actually considering.
We made a number of enhancements to Storage DRS in vSphere 6.0, and this article discusses them. There is a white paper which discusses many of the previous limitations of Storage DRS interoperability, and I’d recommend reviewing it. Although a number of years old, it highlights many of the Storage DRS interoperability concerns. As you will see, a great many of these have now been addressed, along with some pretty interesting feature enhancements.
I watched a very cool demonstration this morning from the All Flash Array vendor, SolidFire. I spoke with SolidFire at the end of last year, and did a blog post about them here. One of the most interesting parts of our conversation last year was how SolidFire’s QoS feature and VMware’s Storage I/O Control (SIOC) feature could inter-operate. In a nutshell, QoS works at the datastore/volume layer whereas SIOC deals with the VM/VMDK layer. Last week, Aaron Delp and Adam Carter of SolidFire did an introduction to QoS, both on vSphere and on the SolidFire system. They also did one of the coolest demos that I’d seen in some time, namely how they have managed to get SIOC and QoS to work in tandem.