This is something I only learnt about very recently. It seems that we have made a major improvement to the way we do snapshot consolidation in vSphere 6.0. Many of you will be aware that when the VM is very busy, snapshot consolidation may need to go through multiple iterations before we can successfully complete the consolidation/roll-up operation. In fact, there are situations where the snapshot consolidation operation could even fail if there is too much I/O.
Previously, we used a helper snapshot and redirected all new I/Os to this helper snapshot while we consolidated the original chain. Once the original chain was consolidated, we calculated how long it would take to consolidate the helper snapshot, since the helper could have grown considerably during the consolidate operation. If the time to consolidate the helper was within a certain time-frame (12 seconds), we stunned the VM and consolidated the helper snapshot into the base disk. If it was outside the acceptable time-frame, we repeated the process (a new helper snapshot while we consolidated the original helper snapshot) until the helper could be committed to the base disk within the acceptable time-frame.
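The pre-6.0 loop described above can be sketched as a small simulation. This is only a toy model under stated assumptions, not the actual ESXi logic: the function name, the constant write and consolidation rates, and the `max_iterations` cut-off are all hypothetical. Only the 12-second threshold comes from the description above.

```python
def consolidate_pre_60(chain_mb, write_rate_mbps, consolidate_rate_mbps,
                       max_stun_s=12.0, max_iterations=10):
    """Toy model of the iterative helper-snapshot consolidation.

    Returns (iterations, estimated_stun_seconds) on success.
    Raises RuntimeError if the helper never becomes small enough
    (the max_iterations limit is an assumption for this sketch).
    """
    pending_mb = chain_mb
    for iteration in range(1, max_iterations + 1):
        # Consolidate the pending data while the VM keeps running.
        elapsed_s = pending_mb / consolidate_rate_mbps
        # New writes issued during that time land in the helper snapshot.
        helper_mb = elapsed_s * write_rate_mbps
        # Estimate how long the helper itself would take to consolidate.
        estimate_s = helper_mb / consolidate_rate_mbps
        if estimate_s <= max_stun_s:
            # Small enough: stun the VM and commit the helper.
            return iteration, estimate_s
        # Too large: the helper becomes the next thing to consolidate.
        pending_mb = helper_mb
    raise RuntimeError("too much I/O: helper snapshot never shrank enough")


if __name__ == "__main__":
    print(consolidate_pre_60(10000, 50, 100))
```

In this model, a VM that writes as fast as (or faster than) the consolidation can drain never converges, which mirrors the failure case mentioned earlier for very busy VMs.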
This is a new feature in vSphere 6.0 that I only recently became aware of. Prior to vSphere 6.0, all the I/Os from a given virtual machine to a particular device would share a single I/O queue. This would result in all the I/Os from the VM (boot VMDK, data VMDK, snapshot delta) being queued into a single per-VM, per-device queue. This caused I/Os from different VMDKs to interfere with each other and could actually hurt fairness.
For example, if a VMDK was used by a database, and this database issued a lot of I/O, this could compete with I/Os from the boot-disk. This in turn could make it appear that the VM (Guest OS) is running slowly.
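To make the interference concrete, here is a minimal sketch of a shared FIFO queue. It is purely illustrative: the `completion_times` helper, the fixed 1 ms service time, and the queue depth of 64 are assumptions, and the model says nothing about how the ESXi scheduler is actually implemented.

```python
def completion_times(queue, service_time_ms=1.0):
    """Return {source: completion time in ms of its last I/O} for a FIFO queue.

    Each entry in `queue` is a label naming the VMDK that issued the I/O.
    Every I/O is assumed to take the same fixed service time.
    """
    clock_ms = 0.0
    finished = {}
    for source in queue:
        clock_ms += service_time_ms
        finished[source] = clock_ms
    return finished


# One boot-disk I/O stuck behind 64 database I/Os in a single shared queue.
shared_queue = ["data"] * 64 + ["boot"]
shared = completion_times(shared_queue)

# For comparison, the same boot-disk I/O with nothing queued ahead of it.
alone = completion_times(["boot"])

if __name__ == "__main__":
    print("boot I/O behind database I/Os:", shared["boot"], "ms")
    print("boot I/O on its own:", alone["boot"], "ms")
```

Even in this simple model, the single boot-disk I/O waits for every database I/O ahead of it, which is why the Guest OS can appear sluggish.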
This week I had the opportunity to roll out the HCIbench tool on one of my all-flash VSAN clusters (much kudos to my friends over at Micron for the loan of a bunch of flash devices for our lab). HCIbench is a tool developed internally at VMware to simplify the deployment of a benchmark tool for hyper-converged infrastructure (HCI) systems. In particular, we wanted something that customers could use on Virtual SAN (VSAN). It’s an excellent tool for those of you looking to do a performance test on hyper-converged infrastructures, hence the name HCIbench.
Please note that this blog post is not about discussing the results, as these will vary from environment to environment due to the open nature of VSAN’s HCL. This blog is more of a primer to assist the reader in getting started with HCIbench.
The more observant of you may have noticed the following entry in the VSAN 6.1 Release Notes: Virtual SAN monitors solid state drive and magnetic disk drive health and proactively isolates unhealthy devices by unmounting them. It detects gradual failure of a Virtual SAN disk and isolates the device before congestion builds up within the affected host and the entire Virtual SAN cluster. An alarm is generated from each host whenever an unhealthy device is detected and an event is generated if an unhealthy device is automatically unmounted. The purpose of this post is to provide you with a little bit more information around this cool new feature.
Today sees the release of the vRealize Operations Management Pack for Storage Devices (MPSD) version 6.0.2. This is exciting for me as it means that vROps now has management and monitoring features for Virtual SAN 6.0. The management pack comes with a set of default dashboards for Virtual SAN clusters, as well as the ability to monitor and create proactive alerts/notifications based on VSAN events.
I took the vROps Management Pack for a spin a little while back, and used it on my own lab cluster. Included below are a few of the features that it has.
I took the opportunity last week (while I was over in the Boston area) to catch up with Scott Davis. I’ve known Scott a long time, as he had various roles at VMware over a number of years. Scott is currently CTO at Infinio, a company that has developed an I/O acceleration product for virtual machines. The new version, Infinio Accelerator 2.0, was released only a few weeks back, so I decided to reach out to Scott and find out about the enhancements that went into it.
Virtual SAN already has a number of features and extensions for performance monitoring, real-time diagnostics and troubleshooting. In particular, there is VSAN Observer, which is included as part of the Ruby vSphere Console (RVC). Another new feature is the Health Check Plugin, which was recently launched for VSAN 6.0. However, a lot of our VSAN customers are already using vRealize Operations Manager, and they have asked if this could be extended to VSAN, allowing them to use a “single pane of glass” for their infrastructure monitoring. That’s just what we have done, and the beta for the vROps Management Pack for Virtual SAN is now open. You can sign up by clicking here.