When and why do we “stun” a virtual machine?

This is a question that comes up regularly, but I don’t think it is covered in any great detail in external-facing documentation. The question is: when do we stun (in other words, quiesce) virtual machines, why do we do it, and, more importantly, how long can a stun operation take? One of our staff engineers, Jesse Pool, put together some really good explanations around the VM stun operation, which I am leveraging for this post. I took a particular interest in this as I wrote a bunch of snapshot posts recently around Virtual Volumes (VVols) so I…
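If you want to measure how long your own VMs are being stunned, the stun/unstun events are recorded in the VM’s vmware.log. Here is a minimal sketch in Python, assuming the “vm stopped for N us” wording that ESXi writes on unstun; the exact log format can vary between releases, so treat the pattern as an assumption to verify against your own logs:

```python
import re

# Scan a VM's vmware.log for stun durations. ESXi typically logs a line
# like "Checkpoint_Unstun: vm stopped for 348029 us" after each
# stun/unstun cycle; the pattern below assumes that wording and may need
# adjusting for your release.
PATTERN = re.compile(r"vm stopped for (\d+) us")

with open("vmware.log") as log:
    for line in log:
        match = PATTERN.search(line)
        if match:
            micros = int(match.group(1))
            print(f"stun lasted {micros / 1000.0:.1f} ms: {line.strip()}")
```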

Virtual Volumes (VVols), vSphere HA and Heartbeat Datastores

I had a few queries recently on how Virtual Volumes (VVols) work with vSphere HA. In particular, I had a number of questions around whether or not a VVol datastore could be used as a heartbeat datastore by vSphere HA. The answer is yes, a VVol datastore can be used for vSphere HA datastore heartbeating. If you want to see how, please read on. I think these queries may have arisen because we do not use datastore heartbeating with Virtual SAN (VSAN). Just by way of a reminder, the master host in a vSphere HA cluster uses a…
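If you want to verify which datastores HA has actually chosen for heartbeating, you can ask the vSphere API. Below is a rough pyVmomi sketch; the RetrieveDasAdvancedRuntimeInfo() call and its heartbeatDatastoreInfo property are part of the cluster API, but the hostname and credentials are placeholders, and depending on your pyVmomi version you may need to pass an SSL context to SmartConnect:

```python
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Placeholder connection details; replace with your own vCenter.
si = SmartConnect(host="vcenter.example.com",
                  user="administrator@vsphere.local",
                  pwd="password")
try:
    content = si.RetrieveContent()
    # Walk the inventory for cluster objects.
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    for cluster in view.view:
        info = cluster.RetrieveDasAdvancedRuntimeInfo()
        # heartbeatDatastoreInfo lists the datastores HA selected for
        # heartbeating -- a VVol datastore can appear here too.
        for hb in (info.heartbeatDatastoreInfo or []):
            print(cluster.name, "->", hb.datastore.name)
    view.Destroy()
finally:
    Disconnect(si)
```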

VSAN 6.0 Part 9 – Proactive Re-balance

This is another nice new feature of Virtual SAN 6.0. It is essentially a directive to VSAN to start re-balancing components belonging to virtual machine objects across all the hosts and all the disks in the cluster. Why might you want to do this? Well, it’s very simple. As VMs are deployed on the VSAN datastore, there are algorithms in place to place those components across the cluster in a balanced fashion. But what if a host was placed into maintenance mode, and you requested that the data on the host be evacuated prior to entering maintenance mode, and now…
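For anyone who wants to try it, the re-balance is kicked off from RVC rather than the web client. A hedged sketch using RVC’s batch mode follows; the vsan.proactive_rebalance and vsan.proactive_rebalance_info command names come from the VSAN 6.0 RVC namespace, while the inventory path and credentials below are placeholders:

```python
import subprocess

# Kick off a proactive re-balance and then check on it through RVC's
# batch mode (-c runs a command after login). The inventory path and
# the vCenter login are placeholders for your own environment.
CLUSTER = "/vcenter.example.com/DC/computers/VSAN-Cluster"
LOGIN = "administrator@vsphere.local@vcenter.example.com"

subprocess.run(["rvc",
                "-c", f"vsan.proactive_rebalance --start {CLUSTER}",
                "-c", "exit",
                LOGIN], check=True)

subprocess.run(["rvc",
                "-c", f"vsan.proactive_rebalance_info {CLUSTER}",
                "-c", "exit",
                LOGIN], check=True)
```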

VSAN 6.0 Part 8 – Fault Domains

One of the really nice new features of VSAN 6.0 is fault domains. Previously, there was very little control over where VSAN placed virtual machine components. In order to protect against something like a rack failure, you may have had to use a very high NumberOfFailuresToTolerate value, resulting in multiple copies of the VM data dispersed around the cluster. With VSAN 6.0, this is no longer a concern, as hosts participating in the VSAN cluster can be placed in different fault domains. This means that component placement will take place across fault domains and not just across hosts. Let’s look…
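Hosts are tagged with a fault domain either from the web client or per host from the CLI. As a rough sketch (run on each ESXi host, which conveniently ships with Python), the esxcli vsan faultdomain namespace from 6.0 can be driven like this; the --fdname option name is an assumption worth verifying against esxcli vsan faultdomain set --help:

```python
import subprocess

# Run on each ESXi host: tag the host with a fault domain name, then
# read the assignment back. The esxcli vsan faultdomain namespace
# appeared with VSAN 6.0; the --fdname option name is an assumption to
# verify against `esxcli vsan faultdomain set --help` on your build.
subprocess.run(["esxcli", "vsan", "faultdomain", "set",
                "--fdname=Rack-A"], check=True)
subprocess.run(["esxcli", "vsan", "faultdomain", "get"], check=True)
```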

Heads Up! ATS Miscompare detected between test and set HB images

I’ve been hit up this week by a number of folks asking about “ATS Miscompare detected between test and set HB images” messages after upgrading to vSphere 5.5U2 and 6.0. The purpose of this post is to give you some background on why this might have started to happen. First off, ATS is the Atomic Test and Set primitive, one of the VAAI primitives. You can read all about the VAAI primitives in the white paper. HB is short for heartbeat. This is how ownership of a file (e.g. a VMDK) is maintained on VMFS, i.e. the lock. You can read…
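For completeness, if the miscompares are causing real problems, there is an advanced host option to revert VMFS5 heartbeating from ATS back to plain SCSI reservations. A sketch of how you might toggle it on an ESXi host follows; the /VMFS3/UseATSForHBOnVMFS5 option is the documented knob, though whether you should change it depends on your array, so treat this as illustrative:

```python
import subprocess

# Revert VMFS5 heartbeating from ATS back to SCSI reservations on this
# host by setting /VMFS3/UseATSForHBOnVMFS5 to 0 (setting it back to 1
# re-enables ATS heartbeats).
subprocess.run(["esxcli", "system", "settings", "advanced", "set",
                "-i", "0", "-o", "/VMFS3/UseATSForHBOnVMFS5"], check=True)

# Confirm the current value of the option.
subprocess.run(["esxcli", "system", "settings", "advanced", "list",
                "-o", "/VMFS3/UseATSForHBOnVMFS5"], check=True)
```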

Virtual Volumes (VVols) and Replication/DR

There have been a number of queries around Virtual Volumes (VVols) and replication, especially since the release of KB article 2112039, which details all the interoperability aspects of VVols. In Q1 of the KB, the question is asked “Which VMware Products are interoperable with Virtual Volumes (VVols)?” The response includes “VMware vSphere Replication 6.0.x”. In Q2 of the KB, the question is asked “Which VMware Products are currently NOT interoperable with Virtual Volumes (VVols)?” The response includes “VMware Site Recovery Manager (SRM) 5.x to 6.0.x”. In Q4 of the KB, the question is asked “Which VMware vSphere 6.0.x features are…

VSAN 6.0 Part 7 – Blinking those blinking disk LEDs

Before I begin, this isn’t really a feature of VSAN per se. In vSphere 6.0, you can also blink LEDs on disk drives without VSAN deployed. However, because of the scale-up and scale-out features in VSAN 6.0, where you can have a great many disk drives and ESXi hosts, being able to identify a drive for replacement becomes very important. So this is obviously a useful feature. And of course I wanted to test it out, see how it works, etc. In my 4-node cluster, I started to test this feature on some disks in…
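For reference, the same thing can be done from the command line on the host that owns the drive. A hedged sketch follows; I believe the 6.0 esxcli namespace accepts --led-state and --led-duration on storage core device set, but those option names are assumptions to double-check with --help, and the device ID is a placeholder:

```python
import subprocess

# Blink the locator LED on one drive for 60 seconds, run on the ESXi
# host that owns it. The device ID is a placeholder, and the
# --led-state/--led-duration option names are assumptions to verify
# against `esxcli storage core device set --help` on your build.
DEVICE = "naa.5000c5006bc47fa3"  # placeholder device identifier

subprocess.run(["esxcli", "storage", "core", "device", "set",
                "--device", DEVICE,
                "--led-state=locator",
                "--led-duration=60"], check=True)
```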