Windows 2008 application level quiescing is performed using a hardware snapshot provider. After quiescing the virtual machine, the hardware snapshot provider creates two redo logs per disk: one for the live virtual machine writes and another for the VSS and writers in the guest to modify the disks after the snapshot operation as part of the quiescing operations. The snapshot configuration information reports this second redo log as part of the snapshot. This redo log represents the quiesced state of all the applications in the guest. This redo log must be opened for backup with VDDK 1.2.
Even after reading the above, I wasn’t 100% clear on what we were doing or how we were doing it. I did a bit of reading and testing, and had numerous discussions, to figure it out. To begin with, it’s important to understand that we are dealing with two snapshot-related technologies – the VM snapshot and Microsoft’s Volume Shadow Copy Service (VSS). I’ll attempt to clarify which one I am talking about in this post by using these terms.
To describe VSS in a nutshell: when the quiesce option is chosen while taking a snapshot, and the guest OS supports VSS, vSphere (via VMware Tools) issues a request to each application’s VSS writers to “freeze” the application. Typically this is done to make a consistent backup of the application. At the same time, the VSS provider in the guest OS creates a shadow copy/snapshot of the application. Once the shadow copy is created, holding a consistent state of the application, writes can resume against the application. I found this explanation of VSS very informative.
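To make this concrete, here is a minimal sketch of requesting a quiesced snapshot through the vSphere API with pyVmomi. This is my own illustration rather than anything from the original material; the vCenter address, credentials and snapshot name are placeholders, and the VM name is borrowed from the logs later in this post.

# Minimal pyVmomi sketch: take a snapshot with quiesce=True so that
# VMware Tools drives VSS quiescing inside the guest.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim

ctx = ssl._create_unverified_context()   # lab only; verify certificates in production
si = SmartConnect(host="vcenter.example.com",
                  user="administrator@vsphere.local",
                  pwd="password",
                  sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    vm = next(v for v in view.view if v.name == "swizzle-yeah")
    # memory=False: no memory image; quiesce=True: ask the guest's VSS
    # writers to freeze applications before the snapshot is taken.
    WaitForTask(vm.CreateSnapshot_Task(name="app-consistent",
                                       description="VSS-quiesced snapshot",
                                       memory=False,
                                       quiesce=True))
finally:
    Disconnect(si)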
However, because we are also taking a VM snapshot, we have no way of knowing when the shadow copy has completed. It is quite conceivable that the shadow copy of the application has not completed by the time we take the VM snapshot. In other words, it is not possible to create a VM snapshot at the exact point when an application is considered consistent. But when we take a snapshot of a VM running on traditional NFS/VMFS storage, the base disk is placed in a read-only state and the running point of the VM becomes the snapshot. We cannot leave the base disk writable and have a writable point-in-time snapshot in its chain. This is fundamental to the way these snapshots work: any writes to the base disk would corrupt the descendant snapshots in the chain. In a redo-log hierarchy such as this, a snapshot must be considered “immutable” because its descendants in the chain depend on it being “frozen in time”; each descendant inherits the set of unchanged blocks from its parent or parents. Writing to any link other than the bottom-most link in a redo-log hierarchy will lead to data corruption. So how can we complete the creation of the shadow copy for the application and still guarantee an application-consistent snapshot?
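A toy model may help show why the chain behaves this way. The sketch below is purely illustrative and has nothing to do with VMware’s actual on-disk format: each redo log stores only the blocks written at its own level and inherits every other block from its parent, so a write to any parent link silently changes what the descendants read.

# Illustrative-only model of a redo-log chain: a child holds just the
# blocks written since it was created and reads everything else from
# its parent. Not VMware code; for reasoning about the chain only.
class RedoLog:
    def __init__(self, parent=None):
        self.parent = parent   # base disk or an earlier snapshot
        self.blocks = {}       # block number -> data written at this level

    def read(self, n):
        if n in self.blocks:
            return self.blocks[n]
        return self.parent.read(n) if self.parent else b"\x00"

    def write(self, n, data):
        self.blocks[n] = data

base = RedoLog()
base.write(0, b"A")            # data in place before any snapshot
snap = RedoLog(parent=base)    # point-in-time view: expects block 0 == b"A"
running = RedoLog(parent=snap) # the running point of the VM

base.write(0, b"B")            # writing to a parent link in the chain...
assert snap.read(0) == b"B"    # ...changes what the snapshot reads: its
                               # point-in-time view is now corrupt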
The traditional approach
The way this works is that we first create a writable snapshot against the running point of the VM. This results in an otherwise normal snapshot, consistent with the running point. However, we now need the application to commit all of its outstanding I/Os to the snapshot to bring it to an application-consistent state. To do this, the writable snapshot is attached to the VM as a separate virtual disk.
Remember that this writable snapshot does not capture a point-in-time (PIT) delta of the guest OS. It is simply an artifact created on behalf of the VSS shadow copy service, allowing in-flight I/Os belonging to the application to be captured so that the application can be placed into a consistent state, typically for the purposes of backup. In traditional configurations (VMFS, NFS), the creation of two separate redo logs can be observed:
DISKLIB-LIB_CREATE : CREATE CHILD: "/vmfs/volumes/54217270-9baa46c8-20b3-e4115baa8e42/swizzle-yeah/ \
swizzle-yeah-000001.vmdk" -- vmfsSparse cowGran=1 allocType=0 policy=''
.
.
DISKLIB-LIB_CREATE : CREATE CHILD: "/vmfs/volumes/54217270-9baa46c8-20b3-e4115baa8e42/swizzle-yeah/ \
swizzle-yeah-000002.vmdk" -- vmfsSparse cowGran=1 allocType=0 policy=''
Here, swizzle-yeah-000001.vmdk is the redo log taking the live VM’s writes, while swizzle-yeah-000002.vmdk is the writable snapshot created for VSS. The attachment of this writable snapshot as a disk to the VM can be observed in the VM logs:
ToolsBackup: hot adding disk swizzle-yeah-000002.vmdk to node scsi0:1
.
.
HotAdd: Adding disk with mode 'persistent' to scsi0:1
.
.
ToolsBackup: successfully mounted writable snapshot in guest.
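For illustration only, a comparable hot-add of an existing delta VMDK can be expressed through the vSphere API with pyVmomi. To be clear, this is not the actual ToolsBackup code path, and the datastore path, controller lookup and unit number are assumptions made for the example:

# Hedged pyVmomi sketch: hot-add an existing VMDK to a running VM,
# roughly analogous to the ToolsBackup hot-add in the log above.
from pyVim.task import WaitForTask
from pyVmomi import vim

def hot_add_existing_vmdk(vm, vmdk_path, unit_number=1):
    # Assume the first SCSI controller is the target (scsi0 in the log).
    controller = next(d for d in vm.config.hardware.device
                      if isinstance(d, vim.vm.device.VirtualSCSIController))

    disk = vim.vm.device.VirtualDisk()
    disk.backing = vim.vm.device.VirtualDisk.FlatVer2BackingInfo()
    disk.backing.fileName = vmdk_path        # e.g. "[datastore1] swizzle-yeah/swizzle-yeah-000002.vmdk"
    disk.backing.diskMode = "persistent"     # matches mode 'persistent' in the log
    disk.controllerKey = controller.key
    disk.unitNumber = unit_number            # scsi0:1 -> unit 1

    spec = vim.vm.device.VirtualDeviceSpec()
    spec.operation = vim.vm.device.VirtualDeviceSpec.Operation.add
    spec.device = disk
    WaitForTask(vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(deviceChange=[spec])))

Hot-removing the disk afterwards is the same reconfigure call with Operation.remove against the same device, which is what we see next in the logs.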
Once the VSS writers have committed their outstanding I/Os to this disk, the writable snapshot is hot-removed from the VM:

ToolsBackup: hot removing disk swizzle-yeah-000002.vmdk from node scsi0:1.
.
.
Closing disk scsi0:1
.
.
ToolsBackup: Post-processing writable snapshot disk
Now we have an application-consistent snapshot of the VM.
There were some issues with this approach, notably the inability to offload the snapshot operation to a VAAI-NAS array, as discussed in this blog article about VSS and application-level quiescing in Windows 2008. In the next section we will see how VVols addresses some of these issues.
The VVol approach
VVol snapshots change this behaviour once again. If we cast our minds back to an earlier post I did on VVol snapshots, you may recall that VVol snapshots allow the VM to continue running on the base disk. There is no snapshot chain so to speak, as every snapshot is a point-in-time (PIT) copy based on the state of the base disk. Conceptually, the relationship of VVol snapshots to the VM looks something like the following. Note that there is no chain of dependencies between the deltas.
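As another purely illustrative toy model (again, not how an array actually implements VVol snapshots), contrast this with the redo-log sketch earlier: each snapshot here is an independent point-in-time copy, so the base object stays writable and no snapshot depends on another.

# Illustrative-only contrast with the redo-log chain: a VVol-style
# snapshot is an independent point-in-time copy of the base object.
import copy

class VVolDisk:
    def __init__(self):
        self.blocks = {0: b"A"}
        self.snapshots = {}            # name -> independent PIT copy

    def snapshot(self, name):
        # The array clones the current state; the VM keeps running
        # (and writing) against self.blocks.
        self.snapshots[name] = copy.deepcopy(self.blocks)

disk = VVolDisk()
disk.snapshot("pit-1")
disk.blocks[0] = b"B"                       # writing the base corrupts nothing
assert disk.snapshots["pit-1"][0] == b"A"   # the PIT copy is unaffected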
With VVols, the same mechanism applies: the snapshot is hot-added to the VM so that the VSS writers can commit their outstanding I/Os to it. Once the VSS writers have done their thing, as before, the snapshot can be hot-removed from the VM and the process is once again considered complete. This snapshot then becomes just another PIT VVol snapshot, but one that is application consistent.
To reiterate, the reason we could not do this with traditional VM snapshots on NFS and VMFS is that the snapshots in the chain must remain “immutable”, i.e. they cannot change, because snapshots further down the chain rely on them remaining unchanged.
With VVols, however, we no longer need to create a second redo log just for the VSS shadow copy service. This is a major change to how we’ve done snapshots in the past, and another example of how VVols make our snapshot process more efficient.