Virtual Volumes – A new way of doing snapshots
I learnt something interesting about Virtual Volumes (VVols) last week. It relates to the way in which snapshots have been implemented in VVols. Historically, VM snapshots have left a lot to be desired. So much so that the GSS best practices for VM snapshots, as per KB article 1025279, recommend having only 2-3 snapshots in a chain (even though the maximum is 32) and using no single snapshot for more than 24-72 hours. VVols mitigate these restrictions significantly, not just because snapshots can be offloaded to the array, but also because of the way consolidate and revert operations are implemented.
Let’s start with how things work at the moment with the redo log format snapshots. When we snapshot a base disk, a child delta disk is created. The parent is then considered a point-in-time (PIT) copy. The running point of the virtual machine is now the delta. New writes by the VM go to the delta; reads of blocks that have not been rewritten since the snapshot are still satisfied by the base disk. It would look something like this:
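As a rough illustration of the read and write paths just described, here is a minimal Python sketch. The class and method names are hypothetical and the "disk" is just a dictionary of block number to data; it is a toy model of the idea, not how ESXi implements it.

```python
class RedoLogDisk:
    """Toy model of a redo-log snapshot chain (illustrative names only)."""

    def __init__(self, base_blocks):
        # layers[0] is the base disk; each later entry is a child delta
        self.layers = [dict(base_blocks)]

    def snapshot(self):
        # Freeze the current running point and start a new, empty delta
        self.layers.append({})

    def write(self, block, data):
        # New writes always land in the running point (the newest layer)
        self.layers[-1][block] = data

    def read(self, block):
        # Check the newest delta first, then walk back toward the base
        for layer in reversed(self.layers):
            if block in layer:
                return layer[block]
        return None


disk = RedoLogDisk({0: "a", 1: "b"})
disk.snapshot()
disk.write(0, "A")
print(disk.read(0), disk.read(1))
```

In this model the base disk is never modified after the snapshot, and a read may have to visit every layer in the chain before it finds the block.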
Now when a consolidate operation is needed, in other words when we wish to roll up all of the changes from delta-2 into delta-1 and so on down the chain, we need to redo the changes held in the deltas, update the base disk, and change the running point of the VM back to the base VMDK. You could do this by consolidating each snapshot, one at a time. There is of course another way to do it, which is to consolidate the whole chain (delete-all). With a very long chain, it takes considerable effort to redo all of the changes from each snapshot against the base disk, since each delta’s set of changes needs to be committed in turn. Not only that, but when a whole chain is consolidated and the base disk is thinly provisioned, it may require additional space as snapshot changes are merged into the base.
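To make the cost of a delete-all consolidation concrete, here is a small Python sketch under the same toy assumptions (disks as dictionaries, hypothetical function name). It only counts block copies; real consolidation also involves stun times and space allocation that are not modelled here.

```python
def consolidate_all(base, deltas):
    """Merge every delta, oldest first, into the base disk (delete-all)."""
    blocks_copied = 0
    for delta in deltas:  # oldest delta first, so newer writes win
        for block, data in delta.items():
            base[block] = data
            blocks_copied += 1
    deltas.clear()  # the running point returns to the base disk
    return blocks_copied


base = {0: "a", 1: "b"}
deltas = [{0: "A"}, {0: "AA", 2: "c"}]
copied = consolidate_all(base, deltas)
print(copied, base)
```

The work grows with the total number of changed blocks across all deltas, and a new block such as block 2 above makes a thinly provisioned base disk grow.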
One final item to highlight with the redo log format is the revert mechanism. This is quite straightforward with redo logs, as we can simply discard the chain of deltas and return to a particular delta or base disk. In this example, we reverted to the base disk by simply discarding the snapshot deltas holding the changes made since we took the snapshots:
Now that we have a grasp of the basic concepts of the snapshot redo log format, let’s turn our attention to the new VVol format. The first thing to remember is that with a VVol snapshot, you are always running off of the base disk. The VM no longer has its running point on a snapshot delta. The delta is responsible for maintaining a point-in-time (PIT) copy of the data, which means that as the VM does I/O to the base disk, the delta is responsible for tracking the original data block. It would look something similar to the following:
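The behaviour described above can be sketched in Python as follows. This is a toy model with hypothetical names, and it assumes that only the newest snapshot records the original block on the first overwrite; an array is free to implement this differently.

```python
class VVolDisk:
    """Toy model of the undo-style snapshot described above (illustrative names)."""

    def __init__(self, base_blocks):
        self.base = dict(base_blocks)  # the VM always runs off the base disk
        self.snapshots = []            # each entry holds preserved original blocks

    def snapshot(self):
        # A new point-in-time copy starts empty; it fills as blocks are overwritten
        self.snapshots.append({})

    def write(self, block, data):
        if self.snapshots:
            newest = self.snapshots[-1]
            # Preserve the original only on the first overwrite since the snapshot
            # (None marks a block that did not exist at snapshot time)
            if block not in newest:
                newest[block] = self.base.get(block)
        self.base[block] = data

    def read(self, block):
        # Reads never walk a chain; the base disk is always current
        return self.base.get(block)


disk = VVolDisk({0: "a", 1: "b"})
disk.snapshot()
disk.write(0, "A")
disk.write(0, "AA")
print(disk.read(0), disk.snapshots)
```

Note that the second write to block 0 does not touch the snapshot again, because the original value has already been preserved.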
Now things become a whole lot more interesting when it comes to consolidate and revert operations. A consolidate operation no longer means that every child snapshot in the chain has to be read and merged into its parent. A consolidate operation on VVol snapshots simply means discarding the snapshot chain, as we have the latest and greatest information in the base disk.
Finally, let’s look at a revert operation on a VVol snapshot. This entails going back to a particular point in time in the snapshot chain. In this case, we can consider this an undo operation as opposed to a redo operation: we must undo the changes in the base disk with the original blocks stored in a delta/point-in-time copy. This may look similar to the following:
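A minimal Python sketch of the two operations, under toy assumptions: the base disk is a dictionary, each snapshot is a dictionary of preserved original blocks (with None marking a block that did not exist at snapshot time), and the chain is linear. The function names are hypothetical, and dropping the later snapshots on revert is a simplification of this sketch rather than documented vSphere behaviour.

```python
def consolidate(snapshots):
    """Deleting snapshots just discards the preserved blocks; the base is untouched."""
    snapshots.clear()


def revert(base, snapshots, index):
    """Undo every change made to the base since snapshots[index] was taken."""
    # Walk from the newest snapshot back to the chosen one, so that the
    # oldest preserved value for each block is the one that ends up in the base
    for snap in reversed(snapshots[index:]):
        for block, original in snap.items():
            if original is None:
                base.pop(block, None)  # block did not exist at snapshot time
            else:
                base[block] = original
    # Simplification: later snapshots are dropped, the chosen one is kept empty
    del snapshots[index + 1:]
    snapshots[index].clear()


base = {0: "AA", 1: "b", 2: "c"}
snapshots = [{0: "a"}, {0: "A", 2: None}]
revert(base, snapshots, 0)
print(base, snapshots)
```

Consolidation does no data movement at all in this model, whereas revert has to copy back one block for every block changed since the chosen point in time.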
We think that this behaviour is going to lead to major improvements in virtual machine snapshot performance. Since these VVol snapshots will also be offloaded to the array, there should be no restrictions and customers can utilize the full 32 snapshots-in-a-chain limit from vSphere. We refer to these as “managed snapshots” (of course, the array itself can support many more than this). This enhancement will also mean that consolidate operations on managed snapshots (which are the most common use case; think backups, etc.) will be pretty much instantaneous. Admittedly, the revert operation may be slower than a revert operation on a redo log based snapshot, but this is most likely not a very common operation when compared to consolidate.
Note: I haven’t covered snapshots with memory, nor the effect of using VSS – Microsoft’s Volume Shadow Copy Service – on applications running in the guest OS. I also didn’t cover how vSphere can leverage “unmanaged snapshots” on the array. These will be topics for future posts.
27 Replies to “Virtual Volumes – A new way of doing snapshots”
This appears to be a typical Copy on Write based Snapshot. Will the mechanism be the same regardless of the underlying array or will different arrays implement it differently?
Array vendors can do whatever they want since all we are sending are API calls. But one hopes they’ll follow the guidelines that we laid out.
Nice article, Cormac. One point of clarification: Reads still have to check the most recent delta first, before propagating to any previous deltas or the base disk. This covers the case where you’re reading something that has only been written to the most recent delta.
Thanks Andy – I miss you not being here, and not being able to pick your brain about these things from time to time. Hope the new gig is going well!
If you have any questions, you still know how to get in touch with me. Feel free, any time.
Thanks for the info, Cormac. I guess the downside to this approach is on everyday writes, because the more snaps you have, the more deltas that need to be updated on every write, correct? Any change you make to the running VMDK will mean a change for every PIT. With offloading to deduped arrays, it’s probably not a big deal, because in theory you’re only updating metadata for each PIT.
Good point John – thanks for leaving that comment. Lots to think about.
nice post, lot of information
Thanks for sharing, Cormac. Can we now say that snapshots are no longer a problem for tier 1 applications, because the main VM disk image is always up to date and there are no changes to redo since there is no merge? It looks like the drawback would come when a customer needs to revert to a snapshot, because that requires an undo. Can we say that undo carries the same weight as redo did before?
I guess the main point here, Vahric, is that the operations are offloaded. Arrays could always do snapshots much better than we could at the vSphere level, so if your array provides enterprise snapshots, then your VM can now leverage those through VVols. So I would say this “undo” method is superior to the “redo” method since the array is doing most of the work.
Offloading is nice, but if the API handoff is to offload a lot of extra work, then no I don’t consider that a net positive. What I’m unclear on at this point is if arrays can use a RoW (pre-VVol snapshot) approach which is much less IOPS intensive at time of original write (usually when sensitivity to performance is higher)? If an array can use RoW, then great.
Otherwise, I see CoW for snapshots benefitting the short-lived backup window scenario at the cost of almost all other use cases (like forensics/troubleshooting where the original VMDK is left undisturbed) or Dev/QA environments where snapshot trees are left in place for long periods of time. [Ok, CoW also keeps data together for streaming performance (e.g. a database) when you aren’t using a wide-striped array]. It appears to me this assumes an array with lots of IOPS overhead (but how often is that the case today? It should be better as we transition to SSDs, but most customers aren’t there yet).
Cormac – I appreciate the article, and your contributions to the community.
In this case, if I’m understanding correctly that VVols will switch snapshots from RoW to CoW, then it seems like a technical step backwards (and therefore I’m disappointed with VMware). Maybe a future enhancement would be to default to using CoW for backup API generated snapshots, but allow RoW for other scenarios??
So I’ve had this conversation with some folks already Lawrence, and you have some valid points for sure.
What I’d like to highlight however, are the issues we’ve had with our snapshots using RoW in the past. Once we get beyond a few snapshots in the chain, we start to suffer some severe read amplification. This (along with other overheads) is why we have placed some severe restrictions on the use of our vmfsSparse format snapshots in production.
However, to answer your first question, yes, arrays can continue to use the RoW format for their native snapshots. VVols does not mandate that an array uses CoW snapshots. This is simply how we are managing VVol snapshots thru vSphere. But vSphere can also manage RoW snapshot implementations.
I raised your point about CoW benefits with our engineering team. The response was that your comments seem to be a more generic concern about the overhead of writing to a disk using a COW snapshot implementation versus a ROW snapshot implementation. This should not be too much of a concern these days since any modern storage system uses “metadata” to manage the mapping of logical blocks to physical blocks, not some hierarchy scheme, and likely not involving superfluous copies. As such “taking a snapshot” is just copying metadata, and creating “unshared blocks” (i.e. changing a block that was originally shared with a snapshot) tends to mean write somewhere else and update the metadata in kind, not copy the old data then write the new data in its place.
Hope this helps
So is there any advantage to the way snapshots work with VVols in a dev/test environment where users take multiple snaps per vm and move between them/delete them regularly?
Given that my storage array says we are 60% writes, would I see any benefits / increase in performance if I started using VVols?
It depends on how you are doing your snapshots right now. Are you offloading snapshots to the array? Are you managing your snapshots from vSphere?
What VVols gives you is the ability to manage array-based snapshots from your vSphere client. And yes, in certain cases you should see a marked improvement in snapshot performance when using snapshots on VVols compared to traditional VM snapshots on VMFS and NFS:
1. the snapshot creation is offloaded to the array using VVols
2. you can simply discard a VVol snapshot when deleting it as opposed to merging/consolidating it to the base disk (one of the biggest issues with traditional snapshots).
Do VMware snapshots using VVols create any transient redo logs which are deleted later (like what happens with VAAI as of 6.0)?
Just to clarify with VVOLs you can:
1. Get vSphere to create snaps and these will use CoW
2. You can offload the snap to the array and it will then use CoW or RoW or whatever technique it has available
In my experience it is wrong to say that arrays have high performance snapshots – solutions that use CoW tend to not be used widely as they can be a huge performance overhead on the array. On the other hand solutions that use RoW are used as the overhead is minimal.
For me the fact that vSphere now uses CoW is not a good thing, it may be progress compared to the Redo Logs, but it is still very much 90s technology.
I would have thought it would have used the same technology (Virsto) that has been used in VSAN to enable RoW snapshots.
Clearly there is a lot more to VVOLs than I realised, so do you know when a detailed architectural paper will be published that will give us all the information we need to fully get our heads around it?
The comment regarding high performance snapshots is in comparison to the VMware redo log snapshot format we have had up until now.
I’m not aware of an architectural paper in the works from VMware – certainly I expect array vendors to start publishing some once they are VVol ready.
Hey Cormac, thanks for the post. You and Yellow Bricks have been a great resource for my VSAN project in my school district. Me and my partner here in our DC are in opposing camps. He wants to replace the V-Block setup of Cisco UCS, Nexus, and EMC Celerra Clariion with Cisco UCS, Nexus, and a new Nimble Box.
I’ve spec’d out an expansion to my current 4 node VSAN to add another 4 nodes and I’m quite confident it’s not only insanely cheap, but far more practical. The one thing I’ve been missing is a backup solution. I’ve been looking at Snapshots for backups with something like Veeam. Is this an adequate solution, or is this more of a nice feature?
I know that VDP/VDPA from VMware can be used to back up VMs on VSAN. I’m sure there are other partners out there with backup products that can do the same, but I don’t know of any offhand. Probably best to reach out to the likes of Veeam, Symantec, Commvault, et al, for a definitive answer.
Groovy! Thanks Cormac, I appreciate the response.
No worries – by the way, I saw this from Veeam today:
Thanks – very informative.
One thing that always puzzles me about the current (non-VVol) way of doing snapshots is why there is a read / write performance impact when a snapshot is present.
I have not been able to get a consistent story on how it actually impacts the performance.
I understand the performance implication when the deltas are consolidated, where it would result in high IO and possibly long stun times, but how about the performance impact when it’s not consolidated?
Say I have a base vmdk which is frozen, and a delta which is growing with the number of writes.
The VM decides to read a block of data, ESXi then reads the sparse delta file – if that block is not in the delta, it would just read the base? (Granted, this is still 2 IO operations, but I’m assuming the sparse file metadata is small enough that it is cached by the storage controller).
Then in the case of an almost fully populated delta (where there have been writes to most blocks of the base disk) – If the block is in the delta, then wouldn’t ESXi just read the delta without touching the base, resulting in no increase in read IO compared to when there’s no snapshot?
How about writes? From others’ experience, there seems to be a write penalty associated when snapshots are present. When a VM makes a new write since the snapshot took place, wouldn’t the write go straight to the delta, or would it need to touch the base file?
What if the VM makes a repeat write to a block, would it just overwrite the delta or does it need to still refer to the base vmdk?
If someone could enlighten me, that would be really appreciated.