I had a query recently about changes to vSphere 6.0, especially when it comes to vSphere HA and Component Protection (VMCP) with vMSC, vSphere Metro Storage Cluster. The question is very straight forward – do all the same advanced setting recommendations for PDL and APD apply to vMSC on vSphere 6.0 as they did for vSphere 5.5? Or do we have some new recommendations now around PDL and APD for vMSC with the introduction of VMCP in vSphere 6.0?
With the release of VSAN 6.0, and the new all-flash configuration (AF-VSAN), I have received a number of queries around our 10% cache recommendation. The main query is, since AF-VSAN no longer requires a read cache, can we get away with a smaller write cache/buffer size?
Before getting into the cache sizing, it is probably worth beginning this post with an explanation about the caching algorithm changes between version 5.5 and 6.0. In VSAN 5.5, which came as a hybrid configuration only with a mixture of flash and spinning disk, cache behaved as both a write buffer (30%) and read cache (70%). If a read request was not satisfied by the cache, in other words there was a read cache miss, then the data block was retrieved from the capacity layer. This was an expensive operation, especially in terms of latency, so the guideline was to keep your working set in cache as much as possible. Since the majority of virtualized applications have a working set somewhere in the region of 10%, this was where the cache size recommendation of 10% came from. With hybrid, there is regular destaging of data blocks from write cache to spinning disk. This is a proximal algorithm, which looks to destage data blocks that are contiguous (adjacent to one another). This speeds up the destaging operations.
A few weeks, my good pal Cody Hosterman over at Pure Storage was experimenting with VAAI and discovered that he could successfully UNMAP blocks (reclaim) directly from a Guest OS in vSphere 6.0. VAAI are the vSphere APIs for Array Integration. Cody wrote about his findings here. Effectively, if you have deleted files within a Guest OS, and your VM is thinly provisioned, you can tell the array through this VAAI primitive that you are no longer using these blocks. This allows the array to reclaim them for other uses. I know a lot of you have been waiting for this functionality for some time. However Cody had a bunch of questions and reached out to me to see if I could provide some answers. After conversing with a number of engineers and product managers here at VMware, here are some of the answers to the questions that Cody asked.
Today VMware announces the Virtual SAN 6.0 Health Check Plugin, a feature that will check your Virtual SAN configuration, both proactively and re-actively, and highlight any abnormal conditions found in the cluster. This is available to all our VSAN customers right now. Not only does it check the health of the cluster, but it also checks the state of the network, host connectivity, physical disk status, and underlying virtual machine object state. This is a great tool for ensuring that an initial deployment of VSAN or proof-of-concept has been rolled out successful, giving you confidence in your VSAN deployment. It is also useful for ongoing monitoring and maintenance of your Virtual SAN cluster.
The more astute of you who have already moved to vSphere 6.0, and like looking at CLI outputs, may have observed some new columns/fields in the PSA claimrules when you run the following command:
# esxcli storage core claimrule list --claimrule-class=VAAI
The new fields are as follows (slide right to view full output):
XCOPY Use Array Reported Values XCOPY Use Multiple Segments XCOPY Max Transfer Size ------------------------------- --------------------------- ----------------------- false false 0 false false 0 false false 0 false false 0 false false 0 false false 0 false false 0 false false 0 false false 0 false false 0 false false 0 false false 0 false false 0 false false 0
I had a few queries recently on how Virtual Volumes (VVols) worked with vSphere HA. In particular, I had a number of questions around whether or not VVol datastores could be used as a heartbeat datastore by vSphere HA. The answer is yes, the VVol datastore can be used for vSphere HA datastore heartbeating. If you want to see how, please read on.
I think these queries may have arisen due to the fact that we do not use datastore heartbeating with Virtual SAN (VSAN). Just by way of reminder, the master host in a vSphere HA cluster uses a heartbeat datastore when it can no longer communicate with a slave host over the management network. This allows the master to determine things like whether a slave host has failed and is down, or if it is in a network partition, or if it is network isolated. This then allows the master to make decisions about what to do with the VMs that reside on the slave host. If the slave host has stopped datastore heartbeating, it is considered to have failed and its virtual machines are restarted elsewhere in the cluster.
This isn’t possible in VSAN due to the fact that storage is local to a host, thus if a host is partitioned and it updates heartbeats on its local storage, there is no way for the other others in the VSAN cluster to see it. VVol datastore is shared storage, which is why datastore heartbeating works. Other hosts can see the updates on storage even if the host is off the network.
This is another nice new feature of Virtual SAN 6.0. It basically is a directive to VSAN to start re-balancing components belonging to virtual machine objects around all the hosts and all the disks in the cluster. Why might you want to do this? Well, it’s very simple. As VMs are deployed on the VSAN datastore, there are algorithms in place to place those components across the cluster in a balanced fashion. But what if a hosts was placed into maintenance mode, and you requested that the data on the host be evacuated prior to entering maintenance mode, and now you are bringing this node back into the cluster after maintenance? What about adding new disks or disk groups to an existing node in the cluster (scaling up)? What if you are introducing a new node to the cluster (scaling out)? The idea behind proactive re-balance is to allow VSAN to start consuming these newly introduced resources sooner rather than later.