Announcing the Virtual SAN 6.0 Health Check Plugin

Today VMware announces the Virtual SAN 6.0 Health Check Plugin, a feature that checks your Virtual SAN configuration, both proactively and reactively, and highlights any abnormal conditions found in the cluster. It is available to all our VSAN customers right now. Not only does it check the health of the cluster, but it also checks the state of the network, host connectivity, physical disk status, and underlying virtual machine object state. This is a great tool for ensuring that an initial deployment of VSAN or a proof-of-concept has been rolled out successfully, giving you confidence in your VSAN deployment. It is also useful for ongoing monitoring and maintenance of your Virtual SAN cluster.

Simple Installation without any downtime

The feature has two components: a plugin to vCenter and a VIB for the ESXi hosts. It works on both the Windows and appliance versions of vCenter Server. VMware provides an MSI and an RPM for the respective versions of vCenter. Once installed, there is a new health view for Virtual SAN in the web client. Initially the health service is disabled, but it can be enabled with the click of a button in the UI. Once the health service is enabled via the web client, a VIB is pushed out to all of the ESXi hosts in the cluster (using a rolling-upgrade-style mechanism). This means that the installation can be done without any downtime.

When all hosts have the VIB installed (which does involve each host entering maintenance mode and rebooting), the health checks can be run against the cluster. This installation process can also be done via RVC, the Ruby vSphere Console. A new health section is now available under the Monitor tab > Virtual SAN view, along with the 30 or so individual health checks.

Peace of mind on HCL (Hardware Compatibility List) issues

One of the most difficult issues for VSAN customers is ensuring that their hardware configuration is supported. This has entailed some tedious time spent checking the VMware Compatibility Guide. With the new VSAN health check plugin, the underlying hardware of the VSAN hosts, the storage controllers and even the driver versions can be checked against the HCL automatically. If the vCenter server has access to the Internet, an HCL database file can be retrieved automatically from VMware. Alternatively, an HCL database file can be downloaded from VMware.com, transferred to the vCenter server, and uploaded manually. At the click of a button, this gives peace of mind that the underlying configuration is indeed supported for VSAN.
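To make the idea of an automated HCL check concrete, here is a minimal Python sketch. It is not how the plugin is implemented, and this post does not describe the format of the HCL database file; the JSON layout, the controller model and the driver versions below are all made up for illustration.

```python
import json

# Hypothetical HCL database content. The real file format is not
# described in this post; this structure is an assumption.
HCL_JSON = """
{
  "controllers": [
    {"model": "ExampleRAID 1000", "drivers": ["1.2.3", "1.2.4"]}
  ]
}
"""


def check_host(host, hcl):
    """Compare one host's controller and driver against the HCL data.

    Returns a short status string: "ok", "controller not listed",
    or "driver version not listed".
    """
    supported = {c["model"]: set(c["drivers"]) for c in hcl["controllers"]}
    model = host["controller"]
    if model not in supported:
        return "controller not listed"
    if host["driver"] not in supported[model]:
        return "driver version not listed"
    return "ok"


hcl = json.loads(HCL_JSON)

# Hypothetical inventory of three hosts.
hosts = [
    {"name": "esxi-01", "controller": "ExampleRAID 1000", "driver": "1.2.4"},
    {"name": "esxi-02", "controller": "ExampleRAID 1000", "driver": "0.9.0"},
    {"name": "esxi-03", "controller": "OtherHBA 2000", "driver": "1.2.4"},
]

report = {h["name"]: check_host(h, hcl) for h in hosts}
for name, status in report.items():
    print(name, status)
```

The point of the sketch is only that the lookup is mechanical once an up-to-date database file is available, which is why keeping that file current (automatically or by manual upload) matters.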

Easy troubleshooting via Ask VMware

Every individual health check is mapped to a KB article via “Ask VMware”. If any of the tests or checks report a warning or error, the admin can simply click the “Ask VMware” button associated with the test to be taken straight to the VMware knowledge base site, where an article describes the test, why it might have failed, and what can be done to remedy the situation. One example is a check that failed because not all of the advanced settings used by VSAN were in sync across all the hosts in the VSAN cluster. Details about the nature of the failure are provided, as well as the values and hosts that mismatch. By clicking on the “Ask VMware” button, you will be directed to a KB article giving you more information about the failed health check.
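The logic behind a consistency check of this kind can be sketched in a few lines of Python. This is an illustration only, not the plugin's code: the host names, setting names and values are sample data, and the function name is hypothetical.

```python
def find_mismatched_settings(host_settings):
    """Return {setting: {host: value}} for every setting whose value
    is not identical on all hosts. A setting missing on a host is
    reported with the value None for that host."""
    all_keys = set()
    for settings in host_settings.values():
        all_keys.update(settings)

    mismatches = {}
    for key in sorted(all_keys):
        per_host = {host: s.get(key) for host, s in host_settings.items()}
        if len(set(per_host.values())) > 1:
            mismatches[key] = per_host
    return mismatches


# Sample data (illustrative only): one host has drifted on one setting.
hosts = {
    "esxi-01": {"VSAN.ClomRepairDelay": 60, "VSAN.ClomMaxComponentSizeGB": 255},
    "esxi-02": {"VSAN.ClomRepairDelay": 60, "VSAN.ClomMaxComponentSizeGB": 255},
    "esxi-03": {"VSAN.ClomRepairDelay": 90, "VSAN.ClomMaxComponentSizeGB": 255},
}

result = find_mismatched_settings(hosts)
for setting, per_host in result.items():
    print(setting, per_host)
```

With this sample data, only the repair-delay setting is reported, together with the value seen on each host, which mirrors the "values and hosts that mismatch" detail described above.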

Proactive testing – making sure everything is working optimally pre-production

Not only does the health check plugin highlight issues reactively (after they have occurred), but it also contains a set of proactive tests. These should not be run in production, but are a way of checking the integrity of the cluster during pre-production. These tests will selectively check the successful deployment of virtual machines on the VSAN datastore. They can also check the bandwidth of the multicast network between the various hosts in the cluster, verifying that it is acceptable for VSAN traffic. Probably most useful of all is the storage performance test, which offers admins a range of different workloads to work with. The duration of a particular test may also be chosen before it is run, offering a burn-in test of sorts for VSAN’s hardware components. Here is an example of one such test, and its results.

Quicker Support experience via Support Assistant

If you still need to engage our support staff for an issue, the VSAN health check plugin now has an integrated support assistant. Once you have your Service Request (SR) opened, logs can be uploaded through the Support Assistant on the main VSAN UI page and associated directly with your SR. A real time-saver.

Optional Customer Experience Improvement Program (CEIP)

The health check plugin also allows you to participate in VMware’s “Customer Experience Improvement Program”, which provides VMware with information about how you use your environment. This is completely optional, and can be enabled or disabled at any time via the main VSAN UI page. The UI provides details on where further information about CEIP can be found.

Sounds great. Where do I begin?

Start with downloading the VSAN 6.0 Health Services Plugin guide. This has all the details you need about where to get the software, how to install it, and details on all of the individual checks that the health plugin carries out. It also provides details about all the other features that are not covered in this blog post. What are you waiting for? Go get it now!

15 Replies to “Announcing the Virtual SAN 6.0 Health Check Plugin”

  1. Is any of the information that is used in the plugin’s reporting capability available via the MOB for vCenter?

    1. My understanding is that some of the health check is available through the MOB, but not all of it.

      Since all of the checks are available via RVC, you can probably figure out which bits are available and which are not.

  2. Cormac, is there a way to create additional custom storage performance tests?

  3. Awesome tool! Loving it already. However, it does not appear to do controller firmware version checks against the HCL. <-- Feature request?

  4. Installing it seems to break the web client. I can login and then browse my environment, but the middle part of the web client is just blank. I tried IE, Chrome, and Firefox. If I uninstall it, the web client works as expected. Has anyone else seen this? Thank you, Zach.

      1. There is definitely something wrong around the web client when the plugin is installed at least on a Windows based vCenter. Can’t confirm for the appliance version as I didn’t try the plugin over it.

        In any case it is not only the middle part of the screen being blanked out but also most of the right-click menu options on the left are missing. When I click the menu over the vCenter instance I can see only two options available.

        I haven’t seen any installation prerequisite requirements throughout the PDF guide for the plugin deployment process, yet I wonder if there might be something missing anyway… Or perhaps we might be facing some sort of a permission related issue during the integration process.

    1. I have the same issue as Zach. Filed an SR last week but we haven’t found a solution yet.

      1. Kicking off the MSI through a cmd prompt run as administrator did the trick.
        Thanks Cormac

  5. This should be part of vSphere IMHO.
    RVC, vSAN Observer and now a health check that justifiably should just be part of the platform.
    Are the vSAN engineers free to choose whatever tool they prefer?

    The VMware portfolio is fragmented as is, please do not balkanize vSphere too.

    1. Couldn’t agree more. RVC and vSAN Observer just are so odd. Why would they develop those tools that way, let alone release a product with them being primary interaction points? It is so disjointed.

    2. The plugin becomes part of vSphere when you install it? The reason it is released as a plugin is to allow for faster iterations / updates and more flexibility during the first 12 months. It will be included in the next release is my understanding.
