Component Metadata Health – Locating Problematic Disk

I’ve noticed a couple of customers experiencing a Component Metadata Health failure on the VSAN health check recently. This is typically what it looks like:

The first thing to note is that the KB associated with this health check states the following:

Note: This health check test can fail intermittently if the destaging process is slow, most likely because VSAN needs to do physical block allocations on the storage devices. To work around this issue, run the health check once more after the period of high activity (multiple virtual machine deployments, etc.) is complete. If the health check continues to fail, the warning is valid. If the health check passes, the warning can be ignored.

Continue reading

Essential Virtual SAN (6.2) available for pre-order

Our friends over at Pearson and VMware Press have informed us that the second edition of the Essential Virtual SAN book (that I wrote with Duncan Epping) is now available for pre-order on Amazon. It looks like it will be available on June 13th, but VMware Press have told us that they will do what they can to pull the date in a little closer. This new edition covers all of the new features added to Virtual SAN, up to the latest (yet to be released) VSAN 6.2. Here’s some blurb on the new edition, which gives a little insight into the new content:

Fully updated for the newest versions of VMware Virtual SAN, this guide shows how to scale VMware’s fully distributed storage architecture to meet any enterprise storage requirement. World-class Virtual SAN experts Cormac Hogan and Duncan Epping thoroughly explain how Virtual SAN integrates into vSphere 6.x and enables the Software-Defined Data Center (SDDC). You’ll learn how to take full advantage of Virtual SAN, and get up-to-the-minute insider guidance for architecture, implementation, and management.

If you want to order it at a local book store, here are the ISBN details:

  • ISBN-13: 978-0134511665
  • ISBN-10: 0134511662

Hope you find it useful. And thanks to my co-author Duncan, a consummate professional. It has been great working with you once again on this new edition of the book.

VSAN 6.2 Part 10 – Problematic Disk Handling

In this post, I want to talk about a feature called Problematic Disk Handling. Some history behind why we have such a feature can be found in this post. In VSAN 6.2/vSphere 6.0 U2, Problematic Disk Handling has been improved so that it will unmount a problematic disk/diskgroup for two reasons:

Continue reading

Datrium goes GA

This week Datrium announced that their DVX system is now generally available. I met these guys at VMworld 2015, and wrote a closer look at Datrium here. If you want a deeper dive into their solution, please read that post. But in a nutshell, their solution uses a combination of host-side flash devices to accelerate read I/O, while at the same time writing to the Datrium hardware storage appliance (called a NetShelf). The NetShelf provides “cheap, durable storage that is easy to manage”. The DVX architecture presents the combined local cache/flash devices and NetShelf as a single shared NFS v3 datastore to your ESXi hosts.

Continue reading

VSAN.ClomMaxComponentSizeGB explained

In the VSAN Troubleshooting Reference Manual, the following description of VSAN.ClomMaxComponentSizeGB is provided:

By default, VSAN.ClomMaxComponentSizeGB is set to 255 GB. When Virtual SAN stores virtual machine objects, it creates components whose default size does not exceed 255 GB. If you use physical disks that are smaller than 255 GB, then you might see errors similar to the following when you try to deploy a virtual machine:

There is no more space for virtual disk XX. You might be able to continue this session by freeing disk space on the relevant volume and clicking retry.
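To make the arithmetic concrete, here is a minimal Python sketch of that behaviour (my own simplified model, not VSAN’s actual CLOM placement logic; it ignores striping, witness components and existing disk usage):

```python
import math

def component_count(vmdk_size_gb, max_component_gb=255):
    # VSAN splits an object into components whose size does not exceed
    # VSAN.ClomMaxComponentSizeGB (255 GB by default).
    return math.ceil(vmdk_size_gb / max_component_gb)

def placement_possible(vmdk_size_gb, capacity_disk_gb, max_component_gb=255):
    # Simplified check: the largest component must fit on a single
    # capacity disk. Ignores existing usage, striping and witnesses.
    largest_component_gb = min(vmdk_size_gb, max_component_gb)
    return capacity_disk_gb >= largest_component_gb

print(component_count(500))                                # 2 components per copy
print(placement_possible(500, 200))                        # False: a 255 GB component > 200 GB disk
print(placement_possible(500, 200, max_component_gb=180))  # True once the setting is lowered
```

In other words, with the default setting a component of up to 255 GB must land on a single capacity disk, which is why smaller physical disks can trigger the error above.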

Continue reading

VSAN Design & Sizing – Memory overhead considerations

This week I was in Berlin for our annual Tech Summit in EMEA, an event for our field folks in the region. I presented a number of VSAN sessions, including a design and sizing session. As part of that session, the topic of VSAN memory consumption was raised. In the past, we’ve only ever really talked about the host memory requirements for disk group configuration, as highlighted in this post here. For example, as per that post, to run a fully configured Virtual SAN system, with 5 fully populated disk groups per host and 7 disks in each disk group, a minimum of 32 GB of host memory is needed. This is not memory consumed by VSAN, by the way; it may also be used to run workloads. Consider it a configuration limit, if you will. As per the post above, if hosts have less than 32 GB of memory, then we scale back on the number of disk groups that can be created on the host.

To the best of my knowledge, we never shared information about what contributes to memory consumption on VSAN clusters. That is what I plan to talk about in this post.
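Broadly, the consumption model takes the shape of a per-host baseline, plus a per-disk-group cost that grows with the size of the cache device, plus a small fixed cost per capacity disk. Here is a minimal Python sketch of that shape; the constants are illustrative placeholders, not official figures, so treat the breakdown in the full post (and VMware’s documentation) as authoritative:

```python
def vsan_host_memory_overhead_mb(num_disk_groups, capacity_disks_per_group,
                                 cache_device_gb,
                                 base_mb=5426,       # per-host baseline (placeholder)
                                 dg_base_mb=636,     # per-disk-group baseline (placeholder)
                                 cache_mb_per_gb=8,  # per GB of cache device (placeholder)
                                 cap_disk_mb=70):    # per capacity disk (placeholder)
    # Shape: fixed per-host baseline + a per-disk-group cost that scales
    # with cache device size + a fixed cost for every capacity disk.
    per_disk_group_mb = dg_base_mb + cache_mb_per_gb * cache_device_gb
    per_capacity_mb = num_disk_groups * capacity_disks_per_group * cap_disk_mb
    return base_mb + num_disk_groups * per_disk_group_mb + per_capacity_mb

# Fully configured host: 5 disk groups, 7 capacity disks each, 400 GB cache devices.
print(vsan_host_memory_overhead_mb(5, 7, 400) / 1024)  # roughly 26 GB with these placeholders
```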

Continue reading

Read locality in VSAN stretched cluster

Many regular readers will know that we do not do read locality in Virtual SAN. For VSAN, it has always been a trade-off of networking vs. storage latency. Let me give you an example. When we deploy a virtual machine with multiple objects (e.g., a VMDK), and this VMDK is mirrored across two disks on two different hosts, we read in a round-robin fashion from both copies based on the block offset. Similarly, as the number of failures to tolerate is increased, resulting in additional mirror copies, we continue to read in a round-robin fashion from each copy, again based on block offset. In fact, we don’t even need the VM’s compute to reside on the same host as a copy of the data. In other words, the compute could be on host 1, the first copy of the data could be on host 2, and the second copy of the data could be on host 3. Yes, I/O will have to do a single network hop, but when compared to latency in the I/O stack itself, this is negligible. The cache associated with each copy of the data is also warmed, as reads are requested. The added benefit of this approach is that vMotion operations between any of the hosts in the VSAN cluster do not impact the performance of the VM – we can migrate the VM to our heart’s content and still get the same performance.
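To illustrate the idea, here is a minimal sketch of offset-based round-robin replica selection (my own simplified model, not VSAN’s internal algorithm; the chunk granularity is an assumption):

```python
def replica_for_read(offset_bytes, num_replicas, chunk_bytes=1 << 20):
    # Deterministic mapping from block offset to mirror copy: the same
    # blocks always hit the same copy, so each replica's read cache stays
    # warm. The 1 MB chunk granularity is assumed for illustration only.
    return (offset_bytes // chunk_bytes) % num_replicas

# With two mirror copies (NumberOfFailuresToTolerate=1), consecutive
# 1 MB chunks of the VMDK alternate between copy 0 and copy 1:
for chunk in range(4):
    print("chunk", chunk, "-> copy", replica_for_read(chunk << 20, 2))
```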

So that’s how things were up until the VSAN 6.1 release. There is now a new network latency element which changes the equation when we talk about VSAN stretched clusters. The reasons for this change will become obvious shortly.

Continue reading