VOMA – Found X actively heartbeating hosts on device

One of the long-awaited features introduced with vSphere 5.1 was VOMA (vSphere On-disk Metadata Analyzer). This is essentially a filesystem checker for both the VMFS metadata and the LVM (Logical Volume Manager). If you have an outage, either at the host or storage side, you now have a mechanism to verify the integrity of your filesystems once everything comes back up, giving you peace of mind that all is well after the outage. There is a requirement, however, that the VMFS volume be quiesced when running the VOMA utility. This post looks at some possible reasons for VOMA to report that it found hosts actively heartbeating on the datastore even when there are no running VMs.

1. No running VM on datastore

Let’s begin with a newly created datastore – there are no running VMs, nor are there any other vSphere features using this datastore. VOMA runs successfully in this case.

~ # voma -m vmfs -d /vmfs/devices/disks/naa.60060160916128004294fd349e6ce011:1
Checking if device is actively used by other hosts
Running VMFS Checker version 0.9 in default mode
Initializing LVM metadata, Basic Checks will be done
Phase 1: Checking VMFS header and resource files
Detected file system (labeled:'voma-test') with UUID:50eae142-ad6a37a0-9a40-0025b5000016, Version 5:58
Phase 2: Checking VMFS heartbeat region
Phase 3: Checking all file descriptors.
Phase 4: Checking pathname and connectivity.
Phase 5: Checking resource reference counts.

Total Errors Found:           0
~ #

2. Migrate a running VM to the datastore – run VOMA from same host on which VM is running

This next test tries to run VOMA when there is a running VM on the datastore.

~ # voma -m vmfs -d /vmfs/devices/disks/naa.60060160916128004294fd349e6ce011:1
Checking if device is actively used by other hosts
Found 1 actively heartbeating hosts on device '/vmfs/devices/disks/naa.60060160916128004294fd349e6ce011:1'
1): MAC address 00:25:b5:00:00:17
~ #

VOMA does not run in this case since there is activity on the filesystem. The MAC address reported is that of the management interface of the ESXi host which owns the running VM.
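As an aside, if you need to figure out which host owns the MAC address that VOMA reports, you can list the VMkernel interfaces on each host and compare. A sketch (the management interface is typically vmk0, but that can vary in your environment):

```shell
# List the VMkernel interfaces and their MAC addresses on this host.
# Compare the MAC of the management interface (typically vmk0) against
# the address reported by VOMA to identify the heartbeating host.
esxcli network ip interface list
```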

3. Run VOMA from a different host than the one on which the VM is running

Let’s try the same command, but this time running it from a different host than the host which owns the VM. Is VOMA clever enough to know that another host has a lock?

~ # voma -m vmfs -d /vmfs/devices/disks/naa.60060160916128004294fd349e6ce011:1
Checking if device is actively used by other hosts
Found 1 actively heartbeating hosts on device '/vmfs/devices/disks/naa.60060160916128004294fd349e6ce011:1'
1): MAC address 00:25:b5:00:00:17
~ #

Yes it is. There must be no running VMs on the datastore, whether on the local host or on any remote host.
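To confirm there are no running VMs before attempting a VOMA check, you can query each host that has the datastore mounted. A sketch using the standard esxcli namespace:

```shell
# List the VMs currently running on this host. Repeat on every host
# that mounts the datastore; no VM residing on the datastore should
# appear in any host's list before running VOMA.
esxcli vm process list
```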

4. Power off VM – run VOMA

Let’s now power off the VM. VOMA doesn’t care about powered-off VMs, only running ones.
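If you prefer to power the VM off from the ESXi shell rather than the vSphere Client, vim-cmd can do it. A sketch (the numeric VM id will differ in your environment):

```shell
# Look up the VM's numeric id, then power it off before re-running VOMA.
vim-cmd vmsvc/getallvms
vim-cmd vmsvc/power.off <vmid>   # replace <vmid> with the id from getallvms
```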

~ # voma -m vmfs -d /vmfs/devices/disks/naa.60060160916128004294fd349e6ce011:1
Checking if device is actively used by other hosts
Running VMFS Checker version 0.9 in default mode
Initializing LVM metadata, Basic Checks will be done
Phase 1: Checking VMFS header and resource files
Detected file system (labeled:'voma-test') with UUID:50eae142-ad6a37a0-9a40-0025b5000016, Version 5:58
Phase 2: Checking VMFS heartbeat region
Phase 3: Checking all file descriptors.
Phase 4: Checking pathname and connectivity.
Phase 5: Checking resource reference counts.

Total Errors Found:           0
~ #

Even though there is a VM on the datastore, so long as it is powered down, VOMA will run. Let’s look at some other possible causes of locks next.

5. Turn on vSphere HA – datastore used by vSphere HA for heartbeating

vSphere HA will create a .vSphere-HA directory on any datastore which it uses for heartbeating.

~ # ls -latr /vmfs/volumes/voma-test/
-r--------    1 root     root     262733824 Jan  7 14:52 .sbc.sf
-r--------    1 root     root     268435456 Jan  7 14:52 .pbc.sf
-r--------    1 root     root     267026432 Jan  7 14:52 .fdc.sf
-r--------    1 root     root      16187392 Jan  7 14:52 .fbb.sf
-r--------    1 root     root       4194304 Jan  7 14:52 .vh.sf
-r--------    1 root     root       1179648 Jan  7 14:52 .pb2.sf
drwxr-xr-x    1 root     root          1540 Jan  7 14:59 thick-to-thin-demo
drwx------    1 root     root           420 Jan  7 15:01 .vSphere-HA
drwxr-xr-t    1 root     root          1400 Jan  7 15:01 .
drwxr-xr-x    1 root     root           512 Jan  7 15:02 ..

~ # voma -m vmfs -d /vmfs/devices/disks/naa.60060160916128004294fd349e6ce011:1
Checking if device is actively used by other hosts
Found 3 actively heartbeating hosts on device '/vmfs/devices/disks/naa.60060160916128004294fd349e6ce011:1'
1): MAC address 00:25:b5:00:00:16
2): MAC address 00:25:b5:00:00:17
3): MAC address 00:25:b5:00:00:07
~ #

There are 3 nodes in the cluster – the MAC address of the management interface is reported for each.

6. Turn off vSphere HA, turn on Storage I/O Control on datastore

Another feature which may lock the datastore and prevent VOMA from running is Storage I/O Control (SIOC). SIOC creates a hidden directory named after the NAA id of the LUN, and a file called .iormstats.sf.

~ # ls -latr /vmfs/volumes/voma-test/
-r--------    1 root     root     262733824 Jan  7 14:52 .sbc.sf
-r--------    1 root     root     268435456 Jan  7 14:52 .pbc.sf
-r--------    1 root     root     267026432 Jan  7 14:52 .fdc.sf
-r--------    1 root     root      16187392 Jan  7 14:52 .fbb.sf
-r--------    1 root     root       4194304 Jan  7 14:52 .vh.sf
-r--------    1 root     root       1179648 Jan  7 14:52 .pb2.sf
drwxr-xr-x    1 root     root          1540 Jan  7 14:59 thick-to-thin-demo
drwx------    1 root     root           280 Jan  7 15:03 .vSphere-HA
drwxr-xr-x    1 root     root           420 Jan  7 15:06 .naa.60060160916128004294fd349e6ce011
-rwxr-xr-x    1 root     root       1048576 Jan  7 15:06 .iormstats.sf
drwxr-xr-x    1 root     root           512 Jan  7 15:06 ..
drwxr-xr-t    1 root     root          1680 Jan  7 15:06 .

~ # voma -m vmfs -d /vmfs/devices/disks/naa.60060160916128004294fd349e6ce011:1
Checking if device is actively used by other hosts
Found 3 actively heartbeating hosts on device '/vmfs/devices/disks/naa.60060160916128004294fd349e6ce011:1'
1): MAC address 00:25:b5:00:00:16
2): MAC address 00:25:b5:00:00:17
3): MAC address 00:25:b5:00:00:07
~ #

Again, there are 3 hosts sharing the datastore with SIOC enabled, which is why we see 3 actively heartbeating hosts.

So there are a number of different features that could prevent VOMA from running. Some time ago I wrote an article on the vSphere storage blog which explains what could be writing to a VMFS volume when there are no running VMs. It contains additional pointers as to why hosts could be heartbeating on a device when you try to run VOMA.

One recommendation is to unmount the VMFS filesystem before running VOMA. This guarantees that the datastore is completely quiesced, and if something is still heartbeating on the device, the unmount operation should report what it is.
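A sketch of that unmount from the ESXi shell, assuming the datastore label voma-test from the earlier examples:

```shell
# Unmount the datastore by its label; the command fails with a reason
# if the volume is still in use (running VMs, HA heartbeating, etc.).
esxcli storage filesystem unmount -l voma-test

# Confirm the mount state afterwards.
esxcli storage filesystem list
```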

Get notification of these blogs postings and more VMware Storage information by following me on Twitter: @CormacJHogan

2 thoughts on “VOMA – Found X actively heartbeating hosts on device”

  1. I’d like to add – if you can’t easily stop all I/O on a datastore you want to check – you can collect a VMFS metadata dump with 'dd' (e.g. 1200 MB, see: http://kb.vmware.com/kb/1020645 for details) and let VOMA run against the dump by specifying the file name instead of the device (e.g. voma … -d /path/to/dump.dd). You would see a number of stale locks in this case (which are expected and which you can ignore), but a corruption would be identified this way as well. Note that the dump has to be taken from the VMFS partition (not from the beginning of the device).
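The approach described in the comment above might look something like this. The dump destination path is an assumption for illustration; place the dump file on a different datastore with enough free space:

```shell
# Take a metadata dump from the VMFS partition (note the :1 partition
# suffix, not the start of the device), then run VOMA against the dump
# file instead of the live disk.
dd if=/vmfs/devices/disks/naa.60060160916128004294fd349e6ce011:1 \
   of=/vmfs/volumes/scratch-datastore/voma-dump.dd bs=1M count=1200
voma -m vmfs -d /vmfs/volumes/scratch-datastore/voma-dump.dd
```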