Primary Data were one of the storage vendors that I wanted to catch up with at VMworld 2015. I was fortunate enough to meet with Graham Smith, their Director of Virtualization Product Management. Graham gave me a demonstration of the Primary Data product in the Solutions Exchange at VMworld. During a recent trip to the Bay Area, I also had an opportunity to visit their offices in Los Altos and catch up once again with Graham and with Kaycee Lai, SVP of Product Management & Sales at Primary Data. Before we get into the product and solution details, I want to go over a brief history of the company and the problem that they are trying to solve with their DataSphere Platform.
Many regular readers will know that we do not do read locality in Virtual SAN. For VSAN, it has always been a trade-off between network latency and storage latency. Let me give you an example. When we deploy a virtual machine with multiple objects (e.g. a VMDK), and this VMDK is mirrored across two disks on two different hosts, we read from both copies in a round-robin fashion based on the block offset. Similarly, as the number of failures to tolerate is increased, resulting in additional mirror copies, we continue to read from each copy in a round-robin fashion, again based on block offset. In fact, the VM's compute does not even need to reside on the same host as a copy of the data. In other words, the compute could be on host 1, the first copy of the data on host 2 and the second copy of the data on host 3. Yes, I/O will have to make a single network hop, but compared to the latency in the I/O stack itself, this is negligible. The cache associated with each copy of the data is also warmed as reads are requested. The added benefit of this approach is that vMotion operations between any of the hosts in the VSAN cluster do not impact the performance of the VM: we can migrate the VM to our heart's content and still get the same performance.
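To make the offset-based round-robin idea more concrete, here is a minimal Python sketch. It is not VSAN code: the function name select_replica, the host names and the 1 MiB chunk size are all hypothetical assumptions chosen for illustration. It simply shows how mapping each block offset to a replica spreads reads across every mirror copy, regardless of which host runs the VM's compute.

```python
# Illustrative sketch only: NOT the VSAN implementation.
# CHUNK_SIZE is a hypothetical value chosen to partition the address space.
CHUNK_SIZE = 1024 * 1024  # 1 MiB (assumption)


def select_replica(block_offset: int, replicas: list, chunk_size: int = CHUNK_SIZE) -> str:
    """Return the replica (host) that serves a read at block_offset.

    The address space is divided into fixed-size chunks, and consecutive
    chunks are assigned to replicas in round-robin order. A given offset
    therefore always maps to the same copy.
    """
    if not replicas:
        raise ValueError("at least one replica is required")
    if block_offset < 0:
        raise ValueError("block_offset must be non-negative")
    chunk_index = block_offset // chunk_size
    return replicas[chunk_index % len(replicas)]


if __name__ == "__main__":
    # Compute on host1 (hypothetical); the data is mirrored on host2 and host3.
    mirrors = ["host2", "host3"]
    for offset in (0, 512 * 1024, 1024 * 1024, 3 * 1024 * 1024):
        print(offset, select_replica(offset, mirrors))

    # Raising failures to tolerate adds a third copy; reads spread over all three.
    print(select_replica(2 * 1024 * 1024, ["host2", "host3", "host4"]))
```

Because the mapping depends only on the offset, each copy's cache would tend to warm for its own share of the address space. Note also that the host running the VM never appears in the calculation, which is why a vMotion would not change which copy serves a given read.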
So that’s how things were up until the VSAN 6.1 release. There is now a new network latency element that changes the equation when we talk about VSAN stretched clusters. The reasons for this change will become obvious shortly.
This week I had the opportunity to roll out the HCIbench tool on one of my all-flash VSAN clusters (much kudos to my friends over at Micron for the loan of a bunch of flash devices for our lab). HCIbench is a tool developed internally at VMware that makes deploying a benchmark tool on hyper-converged infrastructure (HCI) systems quite simple. In particular, we wanted something that customers could use on Virtual SAN (VSAN). It’s an excellent tool for those of you looking to run performance tests on hyper-converged infrastructure, hence the name HCIbench.
Please note that this blog post does not discuss the results, as these will vary from environment to environment due to the open nature of VSAN’s HCL. This post is more of a primer to help the reader get started with HCIbench.
Another of the Virtual SAN (VSAN) break-out sessions that I presented at VMworld 2015 in San Francisco has been recorded and is now available on the VMworld site. I co-presented “STO6228 Monitoring and Troubleshooting Virtual SAN, Current and Future” with Christian Dickmann of VMware, who delivered the latter part of the session. I give the initial introduction, talking briefly about VSAN and then about the various tools that we now have for monitoring and troubleshooting. Christian then takes the stage to talk about how things have progressed over the past year, certain use cases, and some of our future plans for monitoring and troubleshooting.
The more observant of you may have noticed the following entry in the VSAN 6.1 Release Notes: “Virtual SAN monitors solid state drive and magnetic disk drive health and proactively isolates unhealthy devices by unmounting them. It detects gradual failure of a Virtual SAN disk and isolates the device before congestion builds up within the affected host and the entire Virtual SAN cluster. An alarm is generated from each host whenever an unhealthy device is detected and an event is generated if an unhealthy device is automatically unmounted.” The purpose of this post is to provide a little more information about this cool new feature.