I’ve been hit up this week by a number of folks asking about “ATS Miscompare detected between test and set HB images” messages after upgrading to vSphere 5.5U2 and 6.0. The purpose of this post is to give you some background on why this might have started to happen.
In vSphere 5.5U2, we started using ATS for maintaining the heartbeat. Prior to this release, we only used ATS when the heartbeat state changed. For example, referring to the older blog, we would use ATS in the following cases:
Acquire a heartbeat
Clear a heartbeat
Replay a heartbeat
Reclaim a heartbeat
We did not use ATS for maintaining the ‘liveness’ of a heartbeat. This is the change that was introduced in 5.5U2 and which appears to have led to issues for certain storage arrays.
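To make the "miscompare" wording a little more concrete, here is a minimal Python sketch of test-and-set semantics. It is a toy model under stated assumptions, not how VMFS or any array implements heartbeats: the `HeartbeatSlot` class, its method name and the byte strings are all hypothetical. The point is only that the write succeeds when the "test" image matches what is currently stored, and is rejected when it does not.

```python
class HeartbeatSlot:
    """Toy stand-in for an on-disk heartbeat slot (hypothetical layout)."""

    def __init__(self, image: bytes):
        self.image = image

    def atomic_test_and_set(self, test: bytes, new: bytes) -> bool:
        """Store `new` only if the current image equals `test`.

        Returns False on a miscompare. A real array would perform the
        compare and the write as one atomic operation; this single-threaded
        sketch only models the outcome.
        """
        if self.image != test:
            return False  # miscompare: stored image differs from the test image
        self.image = new
        return True


slot = HeartbeatSlot(b"hb-gen-1")

# The host's view matches what is stored, so the heartbeat is refreshed.
refreshed = slot.atomic_test_and_set(b"hb-gen-1", b"hb-gen-2")

# The host's view is now out of date, so the same test image miscompares.
stale = slot.atomic_test_and_set(b"hb-gen-1", b"hb-gen-3")
```

In this model, any situation where the stored image and the host's test image disagree shows up as a failed test-and-set, which is the kind of condition the message above is reporting.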
There is a new snapshot format introduced in VSAN 6.0 called vsanSparse. It replaces the traditional vmfsSparse format (redo logs). The vmfsSparse format was used when snapshots of VMs were taken in VSAN 5.5, and is also the format used when a snapshot is taken of a VM residing on traditional VMFS and NFS. The older vmfsSparse format left a lot to be desired when it came to performance and scalability. This KB article from our support team, indicating that no snapshot should be used for more than 72 hours, and that snapshot chains should contain no more than 2-3 snapshots, speaks for itself.
One of the most common issues I got questions about in VSAN 5.5 was “why is VSAN deploying thick disks, when all of the documentation stated that VSAN deploys thin disks?”
The answer was quite straightforward: the VMs were being deployed without a VM Storage Policy. This meant that the deployment went through the standard VM deployment wizard, which offered administrators the option of thin, lazy-zeroed thick (LZT) and eager-zeroed thick (EZT). The default option is LZT, so if you just do click-click-click (just like I do) when deploying a VM, you end up deploying an LZT format VM, even on the VSAN datastore. I described this issue in this older blog post. It’s only when you select an actual VM Storage Policy when deploying a VM that VSAN uses the Object Space Reservation capability, which by default is 0%, meaning that the VM is effectively thinly provisioned. We realized that this was causing some issues for customers, so we improved this whole deployment mechanism in 6.0 with the introduction of Datastore Default policies.
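As a rough illustration of why an Object Space Reservation of 0% behaves like thin provisioning, the arithmetic can be sketched as follows. This is a simplified model under stated assumptions: the function name `reserved_capacity_gb` is hypothetical, and the sketch ignores replica copies and any other per-object overhead.

```python
def reserved_capacity_gb(vmdk_size_gb: float, osr_percent: float) -> float:
    """Capacity reserved up front for one copy of a virtual disk.

    Simplified model: the reservation is a straight percentage of the
    disk's logical size. Replicas and metadata overhead are ignored.
    """
    if not 0 <= osr_percent <= 100:
        raise ValueError("osr_percent must be between 0 and 100")
    return vmdk_size_gb * osr_percent / 100


# Default policy value of 0%: nothing is reserved, so the disk is effectively thin.
thin = reserved_capacity_gb(100, 0)

# A 100% reservation sets aside the full logical size of the disk.
full = reserved_capacity_gb(100, 100)
```

With the default of 0%, space is only consumed as the guest writes data, which matches the thin behaviour the documentation describes.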
I learnt something interesting about Virtual Volumes (VVols) last week. It relates to the way in which snapshots have been implemented in VVols. Historically, VM snapshots have left a lot to be desired. So much so, that the GSS best practices for VM snapshots as per KB article 1025279 recommend having only 2-3 snapshots in a chain (even though the maximum is 32) and using no single snapshot for more than 24-72 hours. VVols mitigate these restrictions significantly, not just because snapshots can be offloaded to the array, but also in the way consolidate and revert operations are implemented.
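For readers who want to sanity-check their traditional snapshots against that guidance, here is a small Python sketch. It is only an illustration: the function name and its inputs are hypothetical, the thresholds are the upper ends of the ranges quoted above (3 snapshots, 72 hours), and gathering the real values from your environment is left out.

```python
from typing import List

MAX_CHAIN_LENGTH = 3   # upper end of the "2-3 snapshots" guidance
MAX_AGE_HOURS = 72     # upper end of the "24-72 hours" guidance


def snapshot_guidance_warnings(chain_length: int, oldest_age_hours: float) -> List[str]:
    """Return a warning for each guideline the snapshot chain exceeds."""
    warnings = []
    if chain_length > MAX_CHAIN_LENGTH:
        warnings.append(
            f"chain has {chain_length} snapshots (guidance: at most {MAX_CHAIN_LENGTH})"
        )
    if oldest_age_hours > MAX_AGE_HOURS:
        warnings.append(
            f"oldest snapshot is {oldest_age_hours} hours old (guidance: at most {MAX_AGE_HOURS})"
        )
    return warnings


ok = snapshot_guidance_warnings(chain_length=2, oldest_age_hours=24)
too_much = snapshot_guidance_warnings(chain_length=5, oldest_age_hours=100)
```

A chain of 2 snapshots that is a day old produces no warnings, while a chain of 5 snapshots with a 100-hour-old member trips both checks.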
Recently I published an article on Virtual Volumes (VVols) where I touched on a comparison between how migrations typically worked with VAAI and how they now work with VVols. In the meantime, I managed to have some really interesting discussions with some of our VVol leads, and I thought it worth sharing here as I haven’t seen this level of detail anywhere else. This is rather a long discussion, as there are a lot of different permutations of migrations that can take place. There are also different states that the virtual machine could be in. We’re solely focused on VVols here, so although different scenarios are offered up, I highlight what scenario we are actually considering.
The embargo on what’s new in vSphere 6.0 has now been lifted, so we can start to discuss the new features and functionality publicly. For the last number of months, I’ve been heavily involved in preparing for the Virtual SAN launch. What follows is a brief description of what I find to be the most interesting and exciting of the upcoming features in Virtual SAN 6.0. Later on, I will be following up with more in-depth blog posts on the new features and functionality.
I have been doing a bunch of stuff around disaster recovery (DR) recently, and my storage of choice at both the production site and the recovery site has been VSAN, VMware Virtual SAN. I have already done a number of tests with products like vCenter Server, vCenter Operations Manager and NSX, our network virtualization product. Next up was vCO, our vCenter Orchestrator product. I set up vSphere Replication for my vCO servers (I deployed them in an HA configuration) and their associated SQL DB VM on Friday, but when I got in Monday morning, I could not log onto my vCenter. The problem was that my vCenter was running on VSAN (a bit of a chicken and egg type situation), so how do I troubleshoot this situation without my vCenter? And what was the actual problem? Was it a VSAN issue? This is what had to be done to resolve it.