Hello from VMworld EMEA in Barcelona. Well, we can finally talk about vSphere 6.5 today. In this post, I want to highlight a number of new and enhanced features that you will find in vSphere 6.5 related to core storage. I am not going to discuss Virtual SAN (VSAN), Virtual Volumes (VVols) or I/O Filter enhancements (VAIO) specifically in this post, although you will no doubt see some new features tie directly into the latter. Instead, I want to talk about those features that are specific to core storage.
I got a bit of a surprise a few weeks back when I noticed an article in The Register by Chris Mellor stating that PrimaryIO (previously CacheBox) had announced a new cache acceleration I/O filter for vSphere. We first announced plans for VAIO (vSphere APIs for I/O Filters) back at VMworld 2014. VAIO allows VMware partners to plug their products/features directly into the VM I/O path, which in turn gives our customers access to 3rd party storage services/features like deduplication, compression, replication or encryption that may not be available on their storage array. Or, in this case, a cache acceleration feature. I wasn’t aware of any announcement internally at VMware, so reading it in The Register came as a bit of a surprise. I know that other partners such as SanDisk and Infinio are also working on cache acceleration products. However, this was the first time I had heard of PrimaryIO developing a cache acceleration filter.
Many seasoned VSAN administrators will know how heavily we rely on VSAN Observer to get an understanding of the underlying performance of VSAN. While VSAN Observer is a very powerful tool, it does have some drawbacks. For one, it does not provide historic performance data; it simply gives a real-time view of the current state of the system, not what it was like previously. VSAN Observer is also a separate tool that is not integrated with the vSphere web client, so you do not have a “single pane of glass” view of the system. The tool is also complex, providing a lot of engineering-level metrics that are not really customer consumable. It also has an impact on vCenter Server, as the tool is launched via RVC, the Ruby vSphere Console, and RVC typically resides on the vCenter Server. With these limitations in mind, VSAN 6.2 introduces a new service to give administrators a detailed understanding of VSAN performance without the limitations outlined here.
This is something I only learnt about very recently. It seems that we have made a major improvement to the way we do snapshot consolidation in vSphere 6.0. Many of you will be aware that when the VM is very busy, snapshot consolidation may need to go through multiple iterations before we can successfully complete the consolidation/roll-up operation. In fact, there are situations where the snapshot consolidation operation could even fail if there is too much I/O.
What we did previously was use a helper snapshot, and redirect all the new I/Os to this helper snapshot while we consolidated the original chain. Once the original chain was consolidated, we then did a calculation to see how long it would take to consolidate the helper snapshot. It could be that this helper snapshot had grown considerably during the consolidate operation. If the time to consolidate the helper was within a certain time-frame (12 seconds), we stunned the VM and consolidated the helper snapshot into the base disk. If it was outside the acceptable time-frame, then we repeated the process (a new helper snapshot while we consolidated the original helper snapshot) until the helper could be committed to the base disk within the acceptable time-frame.
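To make the iteration described above a little more concrete, here is a minimal Python sketch of the helper snapshot loop. It is a toy model, not VMware's implementation. The 12 second window comes from the description above, but the function name, the constant write and consolidation rates, and the `max_passes` cut-off are all assumptions made purely for illustration.

```python
def simulate_consolidation(chain_mb, write_mb_per_s, consolidate_mb_per_s,
                           max_stun_s=12.0, max_passes=10):
    """Toy model of the helper-snapshot consolidation loop.

    Assumptions (not taken from any VMware code): the guest writes at a
    constant rate, consolidation runs at a constant rate, and we give up
    after max_passes attempts.

    Returns (converged, passes, helper_commit_s).
    """
    to_consolidate_mb = chain_mb
    helper_commit_s = 0.0
    for passes in range(1, max_passes + 1):
        # Consolidate the current target; new writes land in a helper snapshot.
        elapsed_s = to_consolidate_mb / consolidate_mb_per_s
        helper_mb = write_mb_per_s * elapsed_s

        # Estimate how long the helper itself would take to commit.
        helper_commit_s = helper_mb / consolidate_mb_per_s
        if helper_commit_s <= max_stun_s:
            # Small enough: stun the VM and commit the helper to the base disk.
            return True, passes, helper_commit_s

        # Too big: the helper becomes the next target and a new helper is taken.
        to_consolidate_mb = helper_mb

    # The helper never became small enough within the allowed passes.
    return False, max_passes, helper_commit_s


# A moderately busy VM: writes at half the consolidation rate.
moderate = simulate_consolidation(10_000, write_mb_per_s=50,
                                  consolidate_mb_per_s=100)

# A very busy VM: writes faster than we can consolidate.
busy = simulate_consolidation(10_000, write_mb_per_s=120,
                              consolidate_mb_per_s=100)
```

In this model the helper shrinks on every pass only when the VM writes more slowly than we can consolidate. When the write rate exceeds the consolidation rate, each helper is larger than the last and the loop never converges, which is the failure case mentioned earlier.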
This is a new feature in vSphere 6.0 that I only recently became aware of. Prior to vSphere 6.0, all the I/Os from a given virtual machine to a particular device would share a single I/O queue. This would result in all the I/Os from the VM (boot VMDK, data VMDK, snapshot delta) being queued into a single per-VM, per-device queue. This caused I/Os from different VMDKs to interfere with each other and could actually hurt fairness.
For example, if a VMDK was used by a database, and this database issued a lot of I/O, those I/Os could compete with I/Os from the boot disk. This in turn could make it appear that the VM (Guest OS) is running slowly.
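The interference described above can be illustrated with a small Python sketch. The first function models the single per-VM, per-device queue as a plain FIFO. The second is a purely hypothetical alternative with one queue per VMDK, serviced round-robin. It is included only to show why separating the queues helps fairness, and it should not be read as a description of the actual vSphere 6.0 scheduler. All names here are invented for the example.

```python
from collections import deque


def shared_queue_order(ios):
    """Single per-VM, per-device FIFO: I/Os are dispatched in arrival order."""
    return list(ios)


def per_vmdk_round_robin(ios):
    """Hypothetical alternative: one queue per VMDK, serviced round-robin."""
    queues = {}
    for vmdk, io_id in ios:
        queues.setdefault(vmdk, deque()).append((vmdk, io_id))

    order = []
    while queues:
        # Take one I/O from each VMDK that still has work outstanding.
        for vmdk in list(queues):
            order.append(queues[vmdk].popleft())
            if not queues[vmdk]:
                del queues[vmdk]
    return order


# A database VMDK issues a burst of eight I/Os just before the boot disk
# issues a single I/O.
ios = [("data", i) for i in range(8)] + [("boot", 0)]

boot_position_shared = shared_queue_order(ios).index(("boot", 0))
boot_position_split = per_vmdk_round_robin(ios).index(("boot", 0))
```

With the shared queue, the boot disk I/O sits behind the entire database burst. With separate queues in this sketch, it is dispatched after just one database I/O, so the Guest OS would not appear to stall behind the database.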
This week I had the opportunity to roll out the HCIbench tool on one of my all-flash VSAN clusters (much kudos to my friends over at Micron for the loan of a bunch of flash devices for our lab). HCIbench is a tool developed internally at VMware to simplify the deployment of a benchmark tool for hyper-converged infrastructure (HCI) systems. In particular, we wanted something that customers could use on Virtual SAN (VSAN). It’s an excellent tool for those of you looking to do a performance test on hyper-converged infrastructures, hence the name HCIbench.
Please note that this blog post is not about discussing the results, as these will vary from environment to environment due to the open nature of VSAN’s HCL. This blog is more of a primer to assist the reader in getting started with HCIbench.
The more observant of you may have noticed the following entry in the VSAN 6.1 Release Notes: Virtual SAN monitors solid state drive and magnetic disk drive health and proactively isolates unhealthy devices by unmounting them. It detects gradual failure of a Virtual SAN disk and isolates the device before congestion builds up within the affected host and the entire Virtual SAN cluster. An alarm is generated from each host whenever an unhealthy device is detected and an event is generated if an unhealthy device is automatically unmounted. The purpose of this post is to provide you with a little bit more information around this cool new feature.