This week I was in Berlin for our annual Tech Summit in EMEA. This is an event for our field folks in EMEA. I presented a number of VSAN sessions, including a design and sizing session. As part of that session, the topic of VSAN memory consumption was raised. In the past, we’ve only ever really talked about the host memory requirements for disk group configuration as highlighted in this post here. For example, as per the post, to a run a fully configured Virtual SAN system, with 5 fully populated disk groups per host, and 7 disks in each disk group, a minimum of 32GB of host memory is needed. This is not memory consumed by VSAN by the way. This memory may also be used to run workloads. Consider it as a configuration limit if you will. As per the post above, if hosts have less than 32GB of memory, then we scale back on the number of disk groups that can be created on the host.
To the best of my knowledge, we never shared information about what contributes to memory consumption on VSAN clusters. That is what I plan to talk about in this post.
Many regular readers will know that we do not do read locality in Virtual SAN. For VSAN, it has always been a trade-off of networking vs. storage latency. Let me give you an example. When we deploy a virtual machine with multiple objects (e.g. VMDK), and this VMDK is mirrored across two disks on two different hosts, we read in a round-robin fashion from both copies based on the block offset. Similarly, as the number of failures to tolerate is increased, resulting in additional mirror copies, we continue to read in a round-robin fashion from each copy, again based on block offset. In fact, we don’t even need to have the VM’s compute reside on the same host as a copy of the data. In other words, the compute could be on host 1, the first copy of the data could be on host 2 and the second copy of the data could be on host 3. Yes, I/O will have to do a single network hop, but when compared to latency in the I/O stack itself, this is negligible. The cache associated with each copy of the data is also warmed, as reads are requested. The added benefit of this approach is that vMotion operations between any of the hosts in the VSAN cluster do not impact the performance of the VM – we can migrate the VM to our hearts content and still get the same performance.
So that’s how things were up until the VSAN 6.1 release. There is now a new network latency element which changes the equation when we talk about VSAN stretched clusters. The reasons for this change will become obvious shortly.
The more observant of you may have observed the following entry in the VSAN 6.1 Release Notes: Virtual SAN monitors solid state drive and magnetic disk drive health and proactively isolates unhealthy devices by unmounting them. It detects gradual failure of a Virtual SAN disk and isolates the device before congestion builds up within the affected host and the entire Virtual SAN cluster. An alarm is generated from each host whenever an unhealthy device is detected and an event is generated if an unhealthy device is automatically unmounted. The purpose of this post is to provide you with a little bit more information around this cool new feature.
Datrium are a new storage company who only recently came out of stealth. They are one of the companies that I really wanted to catch up with at VMworld 2015. They have a lot of well-respected individuals on their team, including Boris Weissman, who was a principal engineer at VMware and Brian Biles of Data Domain fame. They also count of Diane Green, founder of VMware, among their investors. So there is a significant track record in both storage and virtualization at the company.
With the announcements just made at VMworld 2015, the embargo on Virtual SAN 6.1 has now been lifted, so we can now discuss publicly some of the new features and functionality. Virtual SAN is VMware’s software-defined solution for Hyper-Converged Infrastructure (HCI). For the last number of months, I’ve been heavily involved in preparing for the Virtual SAN 6.1 launch. What follows is a brief description of what I find to be the most interesting and exciting of the upcoming features in Virtual SAN 6.1. Later on, I will follow-up with more in-depth blog posts on the new features and functionality.
I’ve been involved in a few conversations recently regarding how VSAN trace files are handled when the ESXi host that is participating in a VSAN cluster boots from a flash device. I already did a post about some of these considerations in the past, but focused mostly on USB/SD. However SATADOM was not included in this discussion, as we did not initially support SATADOM in VSAN 5.5, and only announced SATADOM support for VSAN 6.0. It seems that there are some different behaviors that need to be taken into account between the various flash boot devices, which is why I decided to write this post.
Virtual SAN already has a number of features and extensions for performance monitoring and real-time diagnostics and troubleshooting. In particular, there is VSAN Observer, which is included as part of the Ruby vSphere Console (RVC). Another new feature is the Health Check Plugin, which was recently launched for VSAN 6.0. However, a lot of our VSAN customers are already using vRealize Operations Manager, and they have asked if this could be extended to VSAN, allowing them us to use a “single pane of glass” for their infrastructure monitoring. That’s just what we have done, and the beta for the vROps Management Pack for Virtual SAN is now open. You can sign up by clicking here.