This is a new feature in vSphere 6.0 that I only recently became aware of. Prior to vSphere 6.0, all the I/Os from a given virtual machine to a particular device would share a single I/O queue. This would result in all the I/Os from the VM (boot VMDK, data VMDK, snapshot delta) queued into a single per-VM, per-device queue. This caused I/Os from different VMDKs interfere with each other and could actually hurt fairness.
For example, if a VMDK was used by a database, and this database issued a lot of I/O, this could compete with I/Os from the boot-disk. This in turn could make it appear that the VM (Guest OS) is running slowly.
This week I had the opportunity to roll-out the HCIbench tool on one of my all-flash VSAN clusters (much kudos to my friends over at Micron for the loan of a bunch of flash devices for our lab). The HCIbench is a tool developed internally at VMware to make the deployment of a benchmark tool for hyper-converged infrastructure (HCI) systems quite simple. In particular, we wanted something that customers could use on Virtual SAN (VSAN). It’s an excellent tool for those of you looking to do a performance test on hyper-converged infrastructures, thus the name HCIbench.
Please note that this blog post is not about discussing the results, as these will vary from environment to environment due to the open nature of VSAN’s HCL. This blog is more of a primer to assist the reader in getting started with HCIbench.
The more observant of you may have observed the following entry in the VSAN 6.1 Release Notes: Virtual SAN monitors solid state drive and magnetic disk drive health and proactively isolates unhealthy devices by unmounting them. It detects gradual failure of a Virtual SAN disk and isolates the device before congestion builds up within the affected host and the entire Virtual SAN cluster. An alarm is generated from each host whenever an unhealthy device is detected and an event is generated if an unhealthy device is automatically unmounted. The purpose of this post is to provide you with a little bit more information around this cool new feature.
Today sees the release of the vRealize Operations Management Pack for Storage Devices (MPSD) version 6.0.2. This is exciting for me as it means that vROps now has management and monitoring features for Virtual SAN 6.0. The management pack comes with a set of default dashboards for Virtual SAN clusters, as well as the ability to monitor and create proactive alerts/notifications based on VSAN events.
I took the vROps Management Pack for a spin a little while back, and used it on my own lab cluster. Included below are a few of the features that it has.
I took the opportunity last week (while I was over in the Boston area) to catch up with Scott Davis. I’ve known Scott a long time, as he had various roles at VMware over a number of years. Scott is currently CTO at Infinio, a company that has developed an I/O acceleration product for virtual machines. The new version of Infinio Accelerator 2.0 released only a few weeks back, so I decided to reach out to Scott and find out about the enhancements that went into this new version.
Virtual SAN already has a number of features and extensions for performance monitoring and real-time diagnostics and troubleshooting. In particular, there is VSAN Observer, which is included as part of the Ruby vSphere Console (RVC). Another new feature is the Health Check Plugin, which was recently launched for VSAN 6.0. However, a lot of our VSAN customers are already using vRealize Operations Manager, and they have asked if this could be extended to VSAN, allowing them us to use a “single pane of glass” for their infrastructure monitoring. That’s just what we have done, and the beta for the vROps Management Pack for Virtual SAN is now open. You can sign up by clicking here.
With the release of VSAN 6.0, and the new all-flash configuration (AF-VSAN), I have received a number of queries around our 10% cache recommendation. The main query is, since AF-VSAN no longer requires a read cache, can we get away with a smaller write cache/buffer size?
Before getting into the cache sizing, it is probably worth beginning this post with an explanation about the caching algorithm changes between version 5.5 and 6.0. In VSAN 5.5, which came as a hybrid configuration only with a mixture of flash and spinning disk, cache behaved as both a write buffer (30%) and read cache (70%). If a read request was not satisfied by the cache, in other words there was a read cache miss, then the data block was retrieved from the capacity layer. This was an expensive operation, especially in terms of latency, so the guideline was to keep your working set in cache as much as possible. Since the majority of virtualized applications have a working set somewhere in the region of 10%, this was where the cache size recommendation of 10% came from. With hybrid, there is regular destaging of data blocks from write cache to spinning disk. This is a proximal algorithm, which looks to destage data blocks that are contiguous (adjacent to one another). This speeds up the destaging operations.