VOMA – Found X actively heartbeating hosts on device
One of the long-awaited features introduced with vSphere 5.1 was VOMA (vSphere On-disk Metadata Analyzer). This is essentially a filesystem checker for both the VMFS metadata and the LVM (Logical Volume Manager). Now, if you have an outage at either the host or the storage side, you have a mechanism to verify the integrity of your filesystems once everything comes back up, which gives you peace of mind if you are wondering whether everything is ok after the outage. There is a requirement, however, to have the VMFS volume quiesced when running the VOMA utility. This post will look at some possible reasons for VOMA to report that it found hosts actively heartbeating on the datastore even when there are no running VMs.
1. No running VM on datastore
Let’s begin with a newly created datastore – there are no running VMs, nor are there any other vSphere features using this datastore. VOMA runs successfully in this case.
~ # voma -m vmfs -d /vmfs/devices/disks/naa.60060160916128004294fd349e6ce011:1
Checking if device is actively used by other hosts
Running VMFS Checker version 0.9 in default mode
Initializing LVM metadata, Basic Checks will be done
Phase 1: Checking VMFS header and resource files
Detected file system (labeled:'voma-test') with UUID:50eae142-ad6a37a0-9a40-0025b5000016, Version 5:58
Phase 2: Checking VMFS heartbeat region
Phase 3: Checking all file descriptors.
Phase 4: Checking pathname and connectivity.
Phase 5: Checking resource reference counts.
Total Errors Found: 0
~ #
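As an aside, if you are not sure which device and partition back a particular datastore, something along these lines should help you build the device path that VOMA expects; the extent list includes a Device Name and Partition for each VMFS volume, which map onto the /vmfs/devices/disks/<naa id>:<partition> format used above (the grep below simply confirms the partition device node exists, using the NAA id from this example):
~ # esxcli storage vmfs extent list
~ # ls /vmfs/devices/disks/ | grep naa.60060160916128004294fd349e6ce011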
2. Migrate a running VM to the datastore – run VOMA from the same host on which the VM is running
This next test tries to run VOMA when there is a running VM on the datastore.
~ # voma -m vmfs -d /vmfs/devices/disks/naa.60060160916128004294fd349e6ce011:1
Checking if device is actively used by other hosts
Found 1 actively heartbeating hosts on device '/vmfs/devices/disks/naa.60060160916128004294fd349e6ce011:1'
1): MAC address 00:25:b5:00:00:17
~ #
VOMA does not run in this case since there is activity on the filesystem. The MAC address reported is that of the management interface of the ESXi host which owns the running VM.
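If you need to figure out which host owns a reported MAC address, one quick way is to check the VMkernel interfaces on each host and compare; a minimal sketch, run on each host in turn (this lists the MAC address of every vmk interface, including the management interface):
~ # esxcli network ip interface list | grep -i mac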
3. Run VOMA from a different host than the one on which the VM is running
Let’s try the same command, but this time running it from a different host than the host which owns the VM. Is VOMA clever enough to know that another host has a lock?
~ # voma -m vmfs -d /vmfs/devices/disks/naa.60060160916128004294fd349e6ce011:1
Checking if device is actively used by other hosts
Found 1 actively heartbeating hosts on device '/vmfs/devices/disks/naa.60060160916128004294fd349e6ce011:1'
1): MAC address 00:25:b5:00:00:17
~ #
Yes, it is. There must be no running VMs on the datastore, whether they are running on the local host or on a remote host.
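To track down which host still has a running VM on the datastore, one approach is to list the running VMs on each host and check their configuration file paths; a rough sketch, run on each host (grep for either the datastore label or its UUID, since the Config File path may be shown in either form):
~ # esxcli vm process list
~ # esxcli vm process list | grep -i -E "voma-test|50eae142-ad6a37a0-9a40-0025b5000016"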
4. Power off VM – run VOMA
Let’s now power off the VM. VOMA doesn’t care about powered off VMs, only running VMs.
~ # voma -m vmfs -d /vmfs/devices/disks/naa.60060160916128004294fd349e6ce011:1
Checking if device is actively used by other hosts
Running VMFS Checker version 0.9 in default mode
Initializing LVM metadata, Basic Checks will be done
Phase 1: Checking VMFS header and resource files
Detected file system (labeled:'voma-test') with UUID:50eae142-ad6a37a0-9a40-0025b5000016, Version 5:58
Phase 2: Checking VMFS heartbeat region
Phase 3: Checking all file descriptors.
Phase 4: Checking pathname and connectivity.
Phase 5: Checking resource reference counts.
Total Errors Found: 0
~ #
Even though there is a VM on the datastore, so long as it is powered down, VOMA will run. Let’s look at some other possible causes of locks next.
5. Turn on vSphere HA – datastore used by vSphere HA for heartbeating
vSphere HA will create a .vSphere-HA directory on any datastore which it uses for heartbeating.
~ # ls -latr /vmfs/volumes/voma-test/
-r-------- 1 root root 262733824 Jan 7 14:52 .sbc.sf
-r-------- 1 root root 268435456 Jan 7 14:52 .pbc.sf
-r-------- 1 root root 267026432 Jan 7 14:52 .fdc.sf
-r-------- 1 root root 16187392 Jan 7 14:52 .fbb.sf
-r-------- 1 root root 4194304 Jan 7 14:52 .vh.sf
-r-------- 1 root root 1179648 Jan 7 14:52 .pb2.sf
drwxr-xr-x 1 root root 1540 Jan 7 14:59 thick-to-thin-demo
drwx------ 1 root root 420 Jan 7 15:01 .vSphere-HA
drwxr-xr-t 1 root root 1400 Jan 7 15:01 .
drwxr-xr-x 1 root root 512 Jan 7 15:02 ..
~ # voma -m vmfs -d /vmfs/devices/disks/naa.60060160916128004294fd349e6ce011:1
Checking if device is actively used by other hosts
Found 3 actively heartbeating hosts on device '/vmfs/devices/disks/naa.60060160916128004294fd349e6ce011:1'
1): MAC address 00:25:b5:00:00:16
2): MAC address 00:25:b5:00:00:17
3): MAC address 00:25:b5:00:00:07
~ #
There are 3 nodes in the cluster – the MAC address of the management interface is reported for each.
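If you want to confirm that it is HA heartbeating, rather than a VM, that is keeping the datastore busy, it can be worth peeking inside the .vSphere-HA directory; the per-host heartbeat files kept in there (the exact layout and file names will vary with your cluster) correspond to the hosts using this datastore for heartbeating:
~ # ls -laR /vmfs/volumes/voma-test/.vSphere-HA/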
6. Turn off vSphere HA, turn on Storage I/O Control on datastore
Another feature which may lock the datastore and prevent VOMA from running is Storage I/O Control (SIOC). SIOC creates a directory named after the NAA id of the LUN, along with a file called .iormstats.sf.
~ # ls -latr /vmfs/volumes/voma-test/
-r-------- 1 root root 262733824 Jan 7 14:52 .sbc.sf
-r-------- 1 root root 268435456 Jan 7 14:52 .pbc.sf
-r-------- 1 root root 267026432 Jan 7 14:52 .fdc.sf
-r-------- 1 root root 16187392 Jan 7 14:52 .fbb.sf
-r-------- 1 root root 4194304 Jan 7 14:52 .vh.sf
-r-------- 1 root root 1179648 Jan 7 14:52 .pb2.sf
drwxr-xr-x 1 root root 1540 Jan 7 14:59 thick-to-thin-demo
drwx------ 1 root root 280 Jan 7 15:03 .vSphere-HA
drwxr-xr-x 1 root root 420 Jan 7 15:06 .naa.60060160916128004294fd349e6ce011
-rwxr-xr-x 1 root root 1048576 Jan 7 15:06 .iormstats.sf
drwxr-xr-x 1 root root 512 Jan 7 15:06 ..
drwxr-xr-t 1 root root 1680 Jan 7 15:06 .
~ # voma -m vmfs -d /vmfs/devices/disks/naa.60060160916128004294fd349e6ce011:1
Checking if device is actively used by other hosts
Found 3 actively heartbeating hosts on device '/vmfs/devices/disks/naa.60060160916128004294fd349e6ce011:1'
1): MAC address 00:25:b5:00:00:16
2): MAC address 00:25:b5:00:00:17
3): MAC address 00:25:b5:00:00:07
~ #
Again, there are 3 hosts sharing the datastore with SIOC enabled, which is why we see 3 actively heartbeating hosts.
So there are a number of different features that could prevent VOMA from running. There is an article I wrote some time ago on the vSphere storage blog which explains what could be writing to a VMFS volume when there are no running VMs. It contains additional pointers as to why hosts could be heartbeating on a device when you try to run VOMA.
One recommendation is to unmount the VMFS filesystem from all hosts before running VOMA. This guarantees that the datastore is completely quiesced, and if something is still heartbeating on the device, the unmount operation should report what it is.
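For reference, a rough sketch of that unmount approach from the ESXi shell, using the datastore label from this post; remember that the unmount needs to be done on every host which has the datastore mounted before VOMA is run, and the volume can be mounted again afterwards:
~ # esxcli storage filesystem list
~ # esxcli storage filesystem unmount -l voma-test
~ # voma -m vmfs -d /vmfs/devices/disks/naa.60060160916128004294fd349e6ce011:1
~ # esxcli storage filesystem mount -l voma-test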
Get notification of these blog postings and more VMware Storage information by following me on Twitter: @CormacJHogan
I'd like to add that if you can't easily stop all I/O on a datastore you want to check, you can collect a VMFS metadata dump with 'dd' (e.g. 1200 MB, see http://kb.vmware.com/kb/1020645 for details) and let VOMA run against the dump by specifying the file name instead of the device (e.g. voma … -d /path/to/dump.dd). You would see a number of stale locks in this case (which are expected and which you can ignore), but a corruption would be identified this way as well. Note that the dump has to be taken from the VMFS partition (not from the beginning of the device).
Thanks for the added clarification Pascal. Very useful.
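For anyone who wants to try Pascal's approach, a rough sketch of what it might look like is shown below; the dump size and the output path are placeholders (check the KB article for the exact amount of metadata to capture), and note that the input is partition 1, i.e. the VMFS partition rather than the start of the device:
~ # dd if=/vmfs/devices/disks/naa.60060160916128004294fd349e6ce011:1 of=/vmfs/volumes/some-other-datastore/voma-test-dump.dd bs=1M count=1200
~ # voma -m vmfs -d /vmfs/volumes/some-other-datastore/voma-test-dump.dd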