I’ve been involved in a few conversations recently regarding how VSAN trace files are handled when an ESXi host participating in a VSAN cluster boots from a flash device. I already did a post about some of these considerations in the past, but focused mostly on USB/SD. However, SATADOM was not included in that discussion, as we did not initially support SATADOM in VSAN 5.5, and only announced SATADOM support with VSAN 6.0. There are some differences in behavior between the various flash boot devices that need to be taken into account, which is why I decided to write this post.
Virtual SAN already has a number of features and extensions for performance monitoring and real-time diagnostics and troubleshooting. In particular, there is VSAN Observer, which is included as part of the Ruby vSphere Console (RVC). Another new feature is the Health Check Plugin, which was recently launched for VSAN 6.0. However, a lot of our VSAN customers are already using vRealize Operations Manager, and they have asked if this could be extended to VSAN, allowing them to use a “single pane of glass” for their infrastructure monitoring. That’s just what we have done, and the beta for the vROps Management Pack for Virtual SAN is now open. You can sign up by clicking here.
I had a query recently from a partner who was deploying VMware Horizon View 6.1 on top of an all-flash VSAN 6.0. They had done all the due diligence with configuring the AF-VSAN appropriately, marking certain flash devices as capacity devices, and so on. The configuration looked something like this:
They then went ahead and deployed Horizon View 6.1, which they had done many times before on hybrid configurations. They were able to successfully deploy full clone pools on the AF-VSAN, but hit a strange issue when deploying linked clone pools (floating/dedicated). The clone virtual machine operation would fail with an “Insufficient disk space on datastore” error, similar to the following:
With the release of VSAN 6.0, and the new all-flash configuration (AF-VSAN), I have received a number of queries around our 10% cache recommendation. The main query is: since AF-VSAN no longer requires a read cache, can we get away with a smaller write cache/buffer size?
Before getting into the cache sizing, it is probably worth beginning this post with an explanation of the caching algorithm changes between version 5.5 and 6.0. In VSAN 5.5, which came as a hybrid configuration only (a mixture of flash and spinning disk), the cache behaved as both a write buffer (30%) and a read cache (70%). If a read request was not satisfied by the cache, in other words there was a read cache miss, then the data block was retrieved from the capacity layer. This was an expensive operation, especially in terms of latency, so the guideline was to keep as much of your working set in cache as possible. Since the majority of virtualized applications have a working set somewhere in the region of 10%, this is where the cache size recommendation of 10% came from. With hybrid, there is regular destaging of data blocks from write cache to spinning disk. This is a proximal algorithm, which looks to destage data blocks that are contiguous (adjacent to one another), speeding up the destaging operations.
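To put a rough number on that guideline: if you anticipate, say, 20TB of consumed virtual machine storage across the cluster (measured before any FTT replicas are counted), 10% works out at approximately 2TB of flash cache, or roughly 500GB of cache per host in a four-node cluster. This is purely an illustrative calculation of the sizing rule, not a statement about any particular workload’s working set.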
A few weeks ago, my good pal Cody Hosterman over at Pure Storage was experimenting with VAAI and discovered that he could successfully UNMAP (reclaim) blocks directly from a Guest OS in vSphere 6.0. VAAI are the vSphere APIs for Array Integration. Cody wrote about his findings here. Effectively, if you have deleted files within a Guest OS, and your VM is thinly provisioned, you can tell the array through this VAAI primitive that you are no longer using these blocks. This allows the array to reclaim them for other uses. I know a lot of you have been waiting for this functionality for some time. However, Cody had a bunch of questions and reached out to me to see if I could provide some answers. After conversing with a number of engineers and product managers here at VMware, here are some of the answers to the questions that Cody asked.
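As a point of reference while reading the answers, the host-side switch that typically comes up in these in-guest UNMAP discussions is the EnableBlockDelete advanced setting on the ESXi host. A minimal sketch of checking and enabling it looks like this; the VM itself also needs to be thinly provisioned, as noted above, for the reclaim to be meaningful.

# Check the current value of the EnableBlockDelete advanced option on the host
esxcli system settings advanced list -o /VMFS3/EnableBlockDelete

# Set it to 1 so that UNMAPs issued by the Guest OS on thin VMDKs can be passed down
esxcli system settings advanced set -o /VMFS3/EnableBlockDelete -i 1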
In Virtual SAN version 6.0, VMware introduced support for an all-flash VSAN. In other words, both the caching layer and the capacity layer could be made up of flash-based devices such as SSDs. However, the mechanism for marking some flash devices as being designated for the capacity layer, while leaving other flash devices as designated for the caching layer, is not at all intuitive at first glance. For that reason, I’ve included some steps here on how to do it.
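To give a flavour of what the steps involve, the command-line method uses the esxcli vsan storage tag namespace introduced in 6.0. The device identifier below is a placeholder; substitute the NAA ID of the flash device you want claimed for the capacity tier.

# Tag a flash device so that VSAN claims it for the capacity tier (placeholder device ID)
esxcli vsan storage tag add -d naa.XXXXXXXXXXXXXXXX -t capacityFlash

# Remove the tag again if the device should go back to being eligible for the caching tier
esxcli vsan storage tag remove -d naa.XXXXXXXXXXXXXXXX -t capacityFlash

# Query the device to confirm how it is now classified (it should report as capacity flash)
vdq -q -d naa.XXXXXXXXXXXXXXXX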
There are a couple of key concepts to understand with Virtual Volumes (or VVols for short). VVols is one of the key new storage features in vSphere 6.0, and you can get an overview of VVols from this post. The first key concept is VASA – the vSphere APIs for Storage Awareness. I wrote about the initial release of VASA way back at the vSphere 5.0 launch. VASA has changed significantly to support VVols, with the introduction of version 2.0 in vSphere 6.0, but that is a topic for another day. Another key concept is the Protocol Endpoint (PE), a logical I/O proxy presented to a host to communicate with Virtual Volumes. My good pal Duncan writes about some considerations with PEs and queue depths here. This again is a topic for a deeper conversation, but not today. Today, I want to talk about a third major concept, the Storage Container.