VM Snapshots with VSS – Traditional versus VVols

In some previous posts, I highlighted how VVols introduces the concept of “undo” format snapshots where the VM is always running on the base disk. I also mentioned that this has a direct impact on the way that we do snapshots on VMs that support VSS, the Microsoft Volume Shadow Copy Service. But before getting into the detail of how VVols is different, it’s worth spending some time understanding what’s going on when VSS is called to quiesce applications during a traditional snapshot. If you try to research this yourself, you’ll find that there is very little information describing what is going on. The best description of this behaviour that I found is in the Designing Backup Solutions for VMware vSphere vStorage APIs for Data Protection 1.2:

Windows 2008 application level quiescing is performed using a hardware snapshot provider. After quiescing the virtual machine, the hardware snapshot provider creates two redo logs per disk: one for the live virtual machine writes and another for the VSS and writers in the guest to modify the disks after the snapshot operation as part of the quiescing operations. The snapshot configuration information reports this second redo log as part of the snapshot. This redo log represents the quiesced state of all the applications in the guest. This redo log must be opened for backup with VDDK 1.2.

Even after reading the above, I wasn’t 100% clear on what we were doing or how we were doing it. It took a bit of reading and testing, as well as numerous discussions, to figure it out. To begin with, it’s important to understand that we are dealing with two snapshot-related technologies: the VM snapshot and the shadow copy service from Microsoft. Throughout this post I’ll use the terms “VM snapshot” and “shadow copy” to make clear which one I am talking about.

To describe VSS in a nutshell, vSphere makes a VSS request to each of the application’s VSS writers to “freeze” the application. This occurs when the quiesce option is chosen when taking a snapshot, and the guest OS supports VSS. Typically this is used to make a consistent backup of the application. At the same time, the VSS provider in the guest OS creates a shadow copy/snapshot of the application. Once the shadow copy is created, holding a consistent state of the application, writes can once again resume against the application. I found this explanation on VSS very informative.

However, because we are also taking a VM snapshot, we have no way of knowing whether the shadow copy has completed. It is very conceivable that the shadow copy of the application has not completed when we take the VM snapshot. In other words, it is not possible to create a VM snapshot at the exact point when an application is considered consistent. When a snapshot is taken of a VM running on traditional NFS/VMFS storage, the base disk is placed in read-only mode and the running point of the VM becomes the snapshot. We cannot leave the base disk writable and have a writable point-in-time snapshot in its chain. This is fundamental to the way snapshots work, as any writes to the base disk will corrupt any descendant snapshots in the chain. In a redo log hierarchy such as this, a snapshot must be considered “immutable” because its descendants in the chain depend on it to be “frozen in time”. This is because the descendants inherit a set of unchanged blocks from their parent or parents. Writing to any link other than the bottom-most link in a redo-log hierarchy will lead to data corruption. So how can we complete the creation of the shadow copy for the application, and guarantee an application consistent snapshot?
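The immutability requirement can be illustrated with a small toy model. This is only a sketch, not how ESXi implements redo logs: the `Link` class, the block values and the link names are all made up for illustration. Each link holds only the blocks written to it, and reads fall through to the parent for everything else.

```python
from typing import Dict, Optional


class Link:
    """Hypothetical model of one link in a redo-log chain (sparse block map)."""

    def __init__(self, name: str, parent: Optional["Link"] = None) -> None:
        self.name = name
        self.parent = parent
        self.blocks: Dict[int, str] = {}

    def write(self, lba: int, data: str) -> None:
        self.blocks[lba] = data

    def read(self, lba: int) -> Optional[str]:
        # Walk up the chain until some link holds the requested block.
        link: Optional[Link] = self
        while link is not None:
            if lba in link.blocks:
                return link.blocks[lba]
            link = link.parent
        return None


base = Link("base")
base.write(0, "A0")
base.write(1, "B0")

# Taking a snapshot: the base is frozen and the VM runs on a new delta.
delta1 = Link("000001", parent=base)
delta1.write(1, "B1")  # the guest overwrites block 1 after the snapshot

view_before = [delta1.read(0), delta1.read(1)]

# Breaking the rule: writing to a link that is not the bottom of the chain.
base.write(0, "A-corrupt")
view_after = [delta1.read(0), delta1.read(1)]

print(view_before)  # block 0 is inherited from the base
print(view_after)   # block 0 changed underneath the delta
```

The delta never wrote block 0, yet its view of block 0 changes once the base is modified. That is the corruption the paragraph above describes, and it is why only the bottom-most link may accept writes.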

The traditional approach

[Figure: vss snapshot 1]

This is where the second redo log comes in. It allows the creation of the shadow copy to complete, after which the application can be “unfrozen/unquiesced”. This additional snapshot must therefore be writable and accept changes.

The way this works is that first we will create a writable snapshot against the running point of the VM. This results in an otherwise normal snapshot consistent with the running point. However, we now require the application to commit all outstanding I/Os to the snapshot to put it in an application consistent state. To do this, the writable snapshot is reattached as a separate virtual disk to the VM.

Remember that this writable snapshot does not capture a point-in-time (PIT) delta of the guest OS. It is simply an artifact created on behalf of the VSS shadow copy service, and allows in-flight I/Os belonging to the application to be caught so that an application can be placed into a consistent state, typically for the purposes of backup. In traditional configurations (VMFS, NFS), the creation of two separate redo logs can be observed:

DISKLIB-LIB_CREATE : CREATE CHILD: 
"/vmfs/volumes/54217270-9baa46c8-20b3-e4115baa8e42/swizzle-yeah/ \
swizzle-yeah-000001.vmdk" -- vmfsSparse cowGran=1 allocType=0 policy=''
.
.
DISKLIB-LIB_CREATE : CREATE CHILD: 
"/vmfs/volumes/54217270-9baa46c8-20b3-e4115baa8e42/swizzle-yeah/ \
swizzle-yeah-000002.vmdk" -- vmfsSparse cowGran=1 allocType=0 policy=''

The attaching of the writable snapshot as a disk to the VM can be observed in the VM logs:

ToolsBackup: hot adding disk swizzle-yeah-000002.vmdk to node scsi0:1
.
.
HotAdd: Adding disk with mode 'persistent' to scsi0:1
.
.
ToolsBackup: successfully mounted writable snapshot in guest.

What you effectively end up with is a configuration similar to the following when you take a snapshot of a VM and ask for the applications to be quiesced within the guest OS:

[Figure: vss snapshot 2]

Once the writable snapshot is attached to the VM, all I/Os that are needed to put the snapshot in a consistent state are issued to the writable snapshot. Once this process completes, the writable snapshot is removed from the VM and the process is considered complete.

ToolsBackup: hot removing disk swizzle-yeah-000002.vmdk from node scsi0:1.
.
.
Closing disk scsi0:1
.
.
ToolsBackup: Post-processing writable snapshot disk

Now we have an application consistent snapshot of the VM.
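If you want to trace this sequence in your own VM logs, a small script can pull out the relevant events. The sketch below is an assumption-laden example: it matches only the message fragments quoted in this post, the function name `writable_snapshot_events` is made up, and real log lines will carry timestamps and other prefixes that may differ between releases.

```python
import re
from typing import Iterable, List, Tuple

# Patterns are based solely on the log fragments quoted in this post.
HOT_ADD = re.compile(r"ToolsBackup: hot adding disk (\S+) to node (\S+)")
HOT_REMOVE = re.compile(r"ToolsBackup: hot removing disk (\S+) from node (\S+?)\.?$")
MOUNTED = "successfully mounted writable snapshot in guest"

Event = Tuple[str, str, str]  # (event, disk, node)


def writable_snapshot_events(lines: Iterable[str]) -> List[Event]:
    """Return attach/mount/detach events for the VSS writable snapshot."""
    events: List[Event] = []
    for raw in lines:
        line = raw.strip()
        match = HOT_ADD.search(line)
        if match:
            events.append(("attached", match.group(1), match.group(2)))
            continue
        match = HOT_REMOVE.search(line)
        if match:
            events.append(("detached", match.group(1), match.group(2)))
            continue
        if MOUNTED in line:
            events.append(("mounted", "", ""))
    return events


sample = [
    "ToolsBackup: hot adding disk swizzle-yeah-000002.vmdk to node scsi0:1",
    "HotAdd: Adding disk with mode 'persistent' to scsi0:1",
    "ToolsBackup: successfully mounted writable snapshot in guest.",
    "ToolsBackup: hot removing disk swizzle-yeah-000002.vmdk from node scsi0:1.",
    "Closing disk scsi0:1",
]

events = writable_snapshot_events(sample)
for event in events:
    print(event)
```

Run against the fragments above, this reports the disk being attached, mounted in the guest, and then detached, which is the order you should expect for a quiesced snapshot on VMFS or NFS.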

There were some issues with this approach, namely the inability to offload the snapshot operation to a VAAI-NAS array, as per this blog article about VSS and application level quiescing in Windows 2008. In the next section we will see how VVols addresses some of these issues.

The VVol approach

VVol snapshots change this behaviour once again. If we cast our minds back to an earlier post I did on VVol snapshots, you may recall that VVol snapshots allow the VM to continue running on the base disk. There are no snapshot chains so to speak, as every snapshot is a point in time (PIT) copy based on the state of the base disk. Conceptually, you could think of the relationship between VVol snapshots and the VM as looking something like the following. Note that there is no chain of dependencies between the deltas.

[Figure: VVol VSS 1]

Now consider a snapshot request which includes a quiesce request against the applications. With VVols using this snapshot functionality, there is no reason to take an additional snapshot just for VSS and its writers to commit outstanding I/O for application consistency. The reason for this is that we can present a delta, point in time (PIT) snapshot back to the VM as a writable snapshot, without impacting any of the other PITs (as there is no chain per se). Taking the above example, let’s assume that a third VVol snapshot is requested, and the request includes a requirement for the applications to be consistent. This invokes VSS. The PIT snapshot is taken, and the “swizzling” process takes place to re-parent the snapshot. As discussed, with VVols there are no chains, but every snapshot points back to the parent:

[Figure: VVOL VSS Step 1]

But then as part of the process, this PIT snapshot is presented back to the VM so that the VSS writers can complete the shadow copy/snapshot:

[Figure: VVOL VSS Step 2]

Once the VSS writers have done their thing, as before, the snapshot can be hot removed from the VM and the process is once again considered complete. The third snapshot will become just another PIT VVol snapshot, but it will once again be application consistent.
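To see why no extra redo log is needed here, consider a toy model of independent PIT snapshots. This is a deliberately simplified sketch: the `VVolDisk` class and snapshot names are hypothetical, and each snapshot is modelled as a full copy of the base purely for illustration. How a storage array actually implements VVol snapshots is up to the array and is not described in this post.

```python
from typing import Dict

Blocks = Dict[int, str]


class VVolDisk:
    """Hypothetical model: the VM always runs on the base; snapshots are independent."""

    def __init__(self, blocks: Blocks) -> None:
        self.base: Blocks = dict(blocks)          # running point of the VM
        self.snapshots: Dict[str, Blocks] = {}    # no parent/child links between these

    def write(self, lba: int, data: str) -> None:
        self.base[lba] = data

    def take_snapshot(self, name: str) -> Blocks:
        # Assumption for illustration only: a full point-in-time copy of the base.
        self.snapshots[name] = dict(self.base)
        return self.snapshots[name]


disk = VVolDisk({0: "A0", 1: "B0"})
disk.take_snapshot("snap-001")
disk.write(0, "A1")
disk.take_snapshot("snap-002")
disk.write(1, "B1")

# Third snapshot with quiescing: it is presented back to the VM as writable
# so that the VSS writers can commit their outstanding I/O into it.
snap3 = disk.take_snapshot("snap-003")
snap3[1] = "B1-flushed"

print(disk.snapshots["snap-001"])
print(disk.snapshots["snap-002"])
print(disk.snapshots["snap-003"])
print(disk.base)
```

Because nothing depends on the third snapshot, writing to it leaves the earlier snapshots and the running point untouched. In the redo-log model the same write would have to go to a separate, second delta.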

To reiterate, the reason we could not do this with traditional VM snapshots on NFS and VMFS is that the snapshots in the chain must remain “immutable”; i.e. they cannot change. This is because snapshots further down the chain rely on them to remain unchanged.

However, through VVols, we no longer need to create a second redo log just for VSS shadow copy services. This is a major change to how we’ve done snapshots in the past and another example of how VVols is making our snapshot process more efficient.

14 comments
  1. 1. What happens if a write I/O is being replicated to the cache tier on a remote host while there is a temporary network/controller issue (not a permanent failure)? Does this I/O have to wait before being acknowledged to the client? If it waits, will it cause congestion for subsequent I/Os in the queue and longer latency?

    2. During a resync, does VSAN throttle CPU utilization for I/O movement? I read that CPU overhead is <10%, and I am wondering whether the same rule applies when a resync occurs.

    • These appear to be VSAN queries and not VVol queries, correct?

      Regarding 1, no, we do not wait. If there is a failure, the component is marked as absent/degraded, the object is reconfigured and, if it is still valid, I/O resumes to the remaining component(s). This behavior is described in detail in the Troubleshooting Reference Manual in the section describing failures. There is a link to it under the publications tab on this site.

      Regarding 2, to the best of my knowledge there is no CPU throttling. The design is to not consume more than 10% of CPU with VSAN. There may be additional latency incurred for the VM I/O during a resync/rebuild operation, but it is an “it depends” type answer.

  2. Hi Cormac,

    Great article !

    Couple of clarifications:

    “Once this process completes, the writable snapshot is removed from the VM and the process is considered complete.”

    To confirm, the writable snapshot is removed from the VM but not deleted.

    Any subsequent operations (like clone of an online VM or restore to this snapshot): will they use this writable snapshot or will they use the VMDK base?

    In the VVol case (or the VMFS case), will the writable snapshot be made read-only once it is removed from the VM?

    • Yes – we simply remove the snapshot from the VM (for a short time it is added as a disk to the VM). The snapshot is not deleted.

      In the case of VVols, the VM continues to run off of the base disk, so any further snapshots/clones are against the base.

      In the VVols case, the writable snapshot isn’t created as a separate entity since the VM is run off of the base disk, so we simply use this.

      • > Any subsequent operations (like clone of an online VM or restore to this snapshot) – will they use this writable snapshot or will they use the VMDK base.

        Hi Cormac,

        Thanks for the reply. I realized I didn’t word this question properly – specifically “subsequent operations” part.

        What I meant to ask was, if a VM snapshot is created as part of a VM clone operation of an online VM, would the clone be based off of VMDK-002 or VMDK base.

        Referring to the VMDK names in http://cormachogan.com/wp-content/uploads/2015/03/vss-snapshot-2.png

        In other words, is VMDK base the “application consistent snapshot of the VM” or is it VMDK-002 ?

        “In the VVols case, the writable snapshot isn’t created as a separate entity since the VM is run off of the base disk, so we simply use this.”

        Referring to http://cormachogan.com/wp-content/uploads/2015/03/VVol-VSS-2.png

        What I meant to ask was, when does VMDK-003, which is shown as RW in this image, become RO (like VMDK-002 and VMDK-001)? The VASA APIs do not seem to describe this state transition.

        Thanks again !

        • Ah – sorry. I misunderstood Raj.

          How clones behave is the next thing that I need to look at. I haven’t started that yet.

          However, considering that a VVol snapshot always runs off of the base disk even when a snapshot is taken, any subsequent clones would be off of the base disk which is the current running point of the VM.

          With regard to the diagrams, I hope they do help explain what happens. What isn’t shown is the snapshot “swizzling” where the base and the delta switch parenting. This is what allows every snapshot to be a direct child of the base. So the snapshot is taken of the base disk (VMDK-003), but then the “swizzling” takes place so that the VM goes back to running off of the base rather than the snapshot. Once the in-flight I/Os are written, that snapshot should go RO. In fact, I might add that as two separate diagrams after all.
