Common VSAN health check issues and resolutions

A number of customers have had trouble getting the Virtual SAN (VSAN) health check to work correctly in their environments. The most common causes have been permissions and certificates. In this post, I want to highlight these issues and any associated KB articles, calling out both the symptom and the resolution.

KB 2117769 – incorrect permissions result in blank panels in vSphere client

When installing the health check on the Windows version of vCenter Server, it is important that the user doing the installation has administrator privileges. The KB describes one resolution. Another option is to start the installation from a command prompt window that was opened with the “Run as administrator” option. If you hit this issue, simply remove the health check and reinstall it as a user with the correct permissions. This is documented in the latest version of the Health Check Guide.

KB 2133384 – health check fails to load with “Unexpected status code 400”

This is most commonly related to certificates and the permissions on the certificate files. This permission issue occurs on the appliance version of vCenter only. Since the VSAN health service runs as a non-privileged user, it may not be able to read the certificate files, and thus will not be able to connect to vCenter. A colleague reported seeing this with the PSC (Platform Services Controller) when he changed its role to a Subordinate Authority Server. This KB takes you through the steps to rectify the situation.

health check fails to load with “Unexpected status code 503”

This was reported in the communities, so it does not have a KB article associated with it. After troubleshooting this issue, the customer noticed that they had a typo in a reverse DNS entry for the VCSA appliance. Once they fixed this and redeployed with the correct DNS, the problem was solved and the health check worked correctly. HTTP status 503 means “Service Unavailable”.
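
Since the root cause here was a mistyped reverse DNS entry, a quick consistency check of forward and reverse lookups can save some time. The following is a minimal Python sketch, not a VMware tool: the function name `check_forward_reverse` and the injectable `forward`/`reverse` parameters are my own, and the hostname in the usage note is a placeholder. It resolves a name to an address, resolves that address back to a name, and reports whether the two agree.

```python
import socket


def check_forward_reverse(hostname,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Return (ok, detail): does the PTR record for hostname's IP point back to it?

    `forward` and `reverse` default to the standard resolver calls; they are
    parameters only so the check can be exercised without a live DNS server.
    """
    try:
        ip = forward(hostname)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return False, f"forward lookup failed: {exc}"

    try:
        ptr_name, aliases, _addresses = reverse(ip)
    except OSError as exc:  # socket.herror is a subclass of OSError
        return False, f"reverse lookup for {ip} failed: {exc}"

    # Compare case-insensitively and ignore a trailing dot on FQDNs.
    known_names = {name.lower().rstrip(".") for name in [ptr_name, *aliases]}
    if hostname.lower().rstrip(".") in known_names:
        return True, f"{hostname} <-> {ip}"
    return False, f"{hostname} -> {ip} -> {ptr_name} (mismatch)"
```

Calling `check_forward_reverse("vcsa.example.com")` with the FQDN of your own appliance returns a tuple whose first element is `False` when the reverse entry is missing or points at a different name.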

11 Replies to “Common VSAN health check issues and resolutions”

  1. How about the HCL DB always being incorrect? Twice in the last 3-4 months I’ve gotten warnings to check that drivers I’m using are on the HCL when the versions are just IDENTICAL. Example if I’m using driver version 1.2.3.a.b.c and VMware has the driver listed as 1.2.3.a.b.c.d. Quite frustrating to see that silly warning every day.

    1. Completely agree with you 100%. We need to get better at this. It’s a known issue internally that we’re trying to resolve asap.

  2. Hi,
    Thanks for a superb infosite!!

    Still having the “Unexpected status code 503” on a VC 6 upgraded to U1. I have tried KB 2117769, and in my opinion the DNS is also correctly configured. The plugin was working perfectly before upgrading to U1, so something went wrong? Hopefully a reinstall of vCenter will not be necessary?

    1. Sorry – I’ve not seen this myself. I’ve only seen it reported in the communities. I would suggest having a chat with our GSS folks to see if they can help.

  3. Another issue I am having is trying to ‘Enable’ the Health Service. The current “Health Service Status” is “Unknown (Issues connecting to EAM. Try restarting it)”. When you try to enable it, the preflight check fails with the error “Cannot Enable the Health Service”. The ESX version compatibility check passes, as does the fully automated DRS check, but the “EAM connectivity Check” fails. I have a ticket open with GSS, but we seem to be going around in circles…

  4. The VMware support is no help. Our case was closed because our mainboard was only certified by Intel for VMware 5.5, but not 6.0. Of course above problem is no hardware issue at all, but it seems it is much easier for VMware to drop all support cases.

    Anyway, we found the solution ourselves: check /var/log/vmware/eam/eam.log for failed user authentications. If you find them, most likely the certificate is broken. Repair it by issuing the following commands:

    mkdir /certificate
    /usr/lib/vmware-vmafd/bin/vecs-cli entry getcert --store vpxd-extension --alias vpxd-extension --output /certificate/vpxd-extension.crt
    /usr/lib/vmware-vmafd/bin/vecs-cli entry getkey --store vpxd-extension --alias vpxd-extension --output /certificate/vpxd-extension.key
    python /usr/lib/vmware-vpx/scripts/updateExtensionCertInVC.py -e com.vmware.vim.eam -c /certificate/vpxd-extension.crt -k /certificate/vpxd-extension.key -s localhost -u administrator@…..
    service-control --stop vmware-eam
    service-control --start vmware-eam

    (just make sure to insert the correct SSO-username in the python command above)

  5. No go for me. Here’s the snippet from the eam.log that keeps repeating every 10 seconds:

    **************************************************************************************

    SERVERNAME:/var/log/vmware/eam # tail -f eam.log
    at com.vmware.eam.vc.VcListener.call(VcListener.java:60)
    at com.vmware.eam.async.impl.AuditedJob.call(AuditedJob.java:35)
    at com.vmware.eam.async.impl.FutureRunnable.run(FutureRunnable.java:52)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

    2015-11-23T10:50:12.332-06:00 | INFO | eam-0 | VcListener.java | 121 | Retrying in 10 sec.
    2015-11-23T10:50:22.332-06:00 | INFO | eam-0 | VcConnection.java | 167 | Connecting to vCenter as com.vmware.vim.eam extension
    2015-11-23T10:50:22.363-06:00 | INFO | eam-0 | VcConnection.java | 603 | Connecting to https://SERVERNAME.DOMAIN.COM:8089/sdk/vimService via vCenter proxy http://localhost:80
    2015-11-23T10:50:22.407-06:00 | INFO | eam-0 | VcConnection.java | 174 | Logged in with logical user session ID 439CB441
    2015-11-23T10:50:22.407-06:00 | INFO | eam-0 | VcConnection.java | 176 | Logged in with physical session cookie F9B682B7
    2015-11-23T10:50:22.407-06:00 | INFO | eam-0 | VcListener.java | 150 | Connected to vCenter server
    2015-11-23T10:50:22.407-06:00 | WARN | eam-0 | VcListener.java | 291 | Adding same observer ClientAuthenticator listening for changes to ManagedObjectReference: type = SessionManager, value = SessionManager, serverGuid = 39D2349F-02D3-4015-B5BC-4B11F9786D77 twice!
    2015-11-23T10:50:22.408-06:00 | WARN | eam-0 | VcListener.java | 291 | Adding same observer EsxAgentManager listening for changes to ManagedObjectReference: type = ExtensionManager, value = ExtensionManager, serverGuid = 39D2349F-02D3-4015-B5BC-4B11F9786D77 twice!
    2015-11-23T10:50:22.418-06:00 | INFO | eam-0 | VcKeyValueStore.java | 68 | Loaded 7 values from VC database.
    2015-11-23T10:50:22.418-06:00 | DEBUG | eam-0 | PartitionedMapStore.java | 139 | Loading key-value: a629b582-9971-4b9a-b5b2-113aa140ab60::EsxAgentManager:EsxAgentManager:agency[0]=a629b582-9971-4b9a-b5b2-113aa140ab60::Agency:agency-0
    2015-11-23T10:50:22.418-06:00 | DEBUG | eam-0 | PartitionedMapStore.java | 139 | Loading key-value: global:serverGuid=a629b582-9971-4b9a-b5b2-113aa140ab60
    2015-11-23T10:50:22.418-06:00 | DEBUG | eam-0 | PartitionedMapStore.java | 139 | Loading key-value: global:name.ref.Guest Introspection=1
    2015-11-23T10:50:22.418-06:00 | DEBUG | eam-0 | PartitionedMapStore.java | 139 | Loading key-value: global:name.ref.VMware Network Fabric=1
    2015-11-23T10:50:22.419-06:00 | DEBUG | eam-0 | PartitionedMapStore.java | 139 | Loading key-value: global:moRefCount=5
    2015-11-23T10:50:22.419-06:00 | DEBUG | eam-0 | PartitionedMapStore.java | 139 | Loading key-value: global:name.count.VMware Network Fabric=2
    2015-11-23T10:50:22.419-06:00 | DEBUG | eam-0 | PartitionedMapStore.java | 139 | Loading key-value: global:name.count.Guest Introspection=2
    2015-11-23T10:50:22.419-06:00 | INFO | eam-0 | AgencyImpl.java | 1772 | Loading database for agency: ManagedObjectReference: type = Agency, value = agency-0, serverGuid = a629b582-9971-4b9a-b5b2-113aa140ab60
    2015-11-23T10:50:22.419-06:00 | DEBUG | eam-0 | AgencyImpl.java | 2367 | Reading an agency created in a previous release.
    2015-11-23T10:50:22.424-06:00 | WARN | eam-0 | HttpConfigurationCompilerBase.java | 95 | Shutting down the connection monitor.
    2015-11-23T10:50:22.424-06:00 | ERROR | eam-0 | VcListener.java | 116 | An unexpected error in the changes polling loop
    java.lang.RuntimeException: Config not saved
    at com.vmware.eam.AgencyImpl.loadConfiguration(AgencyImpl.java:1832)
    at com.vmware.eam.AgencyImpl.loadFromDatabase(AgencyImpl.java:1780)
    at com.vmware.eam.AgencyImpl.<init>(AgencyImpl.java:407)
    at com.vmware.eam.EsxAgentManagerImpl.loadFromDatabase(EsxAgentManagerImpl.java:671)
    at com.vmware.eam.EsxAgentManagerImpl.runPostConfiguration(EsxAgentManagerImpl.java:299)
    at com.vmware.eam.EsxAgentManagerImpl.vCenterConnectionStatusChanged(EsxAgentManagerImpl.java:609)
    at com.vmware.eam.vc.VcListener.main(VcListener.java:135)
    at com.vmware.eam.vc.VcListener.call(VcListener.java:111)
    at com.vmware.eam.vc.VcListener.call(VcListener.java:60)
    at com.vmware.eam.async.impl.AuditedJob.call(AuditedJob.java:35)
    at com.vmware.eam.async.impl.FutureRunnable.run(FutureRunnable.java:52)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
    2015-11-23T10:50:22.425-06:00 | INFO | eam-0 | VcListener.java | 117 | Full stack trace: java.lang.RuntimeException: Config not saved
    at com.vmware.eam.AgencyImpl.loadConfiguration(AgencyImpl.java:1832)
    at com.vmware.eam.AgencyImpl.loadFromDatabase(AgencyImpl.java:1780)
    at com.vmware.eam.AgencyImpl.<init>(AgencyImpl.java:407)
    at com.vmware.eam.EsxAgentManagerImpl.loadFromDatabase(EsxAgentManagerImpl.java:671)
    at com.vmware.eam.EsxAgentManagerImpl.runPostConfiguration(EsxAgentManagerImpl.java:299)
    at com.vmware.eam.EsxAgentManagerImpl.vCenterConnectionStatusChanged(EsxAgentManagerImpl.java:609)
    at com.vmware.eam.vc.VcListener.main(VcListener.java:135)
    at com.vmware.eam.vc.VcListener.call(VcListener.java:111)
    at com.vmware.eam.vc.VcListener.call(VcListener.java:60)
    at com.vmware.eam.async.impl.AuditedJob.call(AuditedJob.java:35)
    at com.vmware.eam.async.impl.FutureRunnable.run(FutureRunnable.java:52)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

    2015-11-23T10:50:22.425-06:00 | INFO | eam-0 | VcListener.java | 121 | Retrying in 10 sec.
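
For anyone triaging a similar eam.log, the entries above follow a pipe-delimited layout (timestamp, level, thread, source file, line number, message), which makes it easy to pull out just the errors. The following is a minimal Python sketch that assumes only the layout visible in this excerpt; the `summarize` function and its return shape are my own invention, not part of any VMware tooling.

```python
import re
from collections import Counter

# Layout inferred from the excerpt above:
# timestamp | LEVEL | thread | source file | line number | message
LINE_RE = re.compile(
    r"^\s*(?P<ts>\d{4}-\d{2}-\d{2}T\S+) \| (?P<level>[A-Z]+) \| (?P<thread>[^|]+) \| "
    r"(?P<source>[^|]+) \| (?P<line>\d+) \| (?P<message>.*)$"
)


def summarize(log_lines):
    """Count entries per level and collect each ERROR with its exception line."""
    levels = Counter()
    errors = []
    current_error = None  # the ERROR entry whose stack trace we are inside, if any

    for raw in log_lines:
        match = LINE_RE.match(raw)
        if match:
            levels[match["level"]] += 1
            current_error = None
            if match["level"] == "ERROR":
                current_error = {
                    "ts": match["ts"],
                    "source": match["source"].strip(),
                    "message": match["message"],
                    "exception": None,
                }
                errors.append(current_error)
            continue

        text = raw.strip()
        # The first non-frame line after an ERROR entry names the exception.
        if (current_error is not None and current_error["exception"] is None
                and text and not text.startswith("at ")):
            current_error["exception"] = text

    return levels, errors
```

Passing the lines of the file to `summarize` (for example via `open(path).readlines()`) gives a per-level count plus a list of errors; on the excerpt above, the error it would surface is the `java.lang.RuntimeException: Config not saved` raised from `VcListener.java`.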
