vSphere 6.0 Storage Features Part 4: VMFS, VOMA and VAAI

There was a time when VMFS was the only datastore that could be used with ESXi. That has changed considerably with the introduction of NFS (v3 and v4.1), Virtual Volumes and, of course, Virtual SAN. However, VMFS continues to be used by a great many VMware customers, and we look to enhance it with each release of vSphere. This post covers the changes and enhancements to VMFS in vSphere 6.0.

1. VMFS-3 volumes are deprecated

Yes, no more VMFS-3. It's VMFS-5 all the way now. In vSphere 6.0, creation of new VMFS-3 volumes is no longer allowed. The option has been removed from both the UI and the vmkfstools command. Existing VMFS-3 volumes will continue to be read/write from ESXi 6.0 hosts, but the UI will log a message when a VMFS-3 volume is detected, stating that these are now deprecated. Anyone still using VMFS-3 should seriously consider moving to VMFS-5. These volumes can be upgraded online, without service interruption, via the UI.
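For the record, the same online upgrade can also be driven from the CLI with vmkfstools (a sketch; the datastore name here is a placeholder for your own volume):

# vmkfstools -T /vmfs/volumes/mydatastore

The -T option upgrades the VMFS-3 file system in place while virtual machines on it keep running.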

2. PB Cache statistics are now accessible

Does anyone remember the issues we had in the past with VMFS heap depletion? This limited the number of open files (VMDKs) we could have on a VMFS volume. In vSphere 5.5, we made considerable progress in this area, taking the main culprit (pointer block cache) out of the VMFS heap and giving it its own space. I wrote about this here. One issue with that change was that it was impossible to figure out how much space the PB cache was consuming, and whether it needed to be resized to accommodate the number of open files. Well, in 6.0 we now have metrics telling us about PB cache usage. OK, so not much is going on below with my host's PB cache, but try it on your own 6.0 system and see what it reports. Now you can track PB cache usage, size it according to the demands of your system, and make an informed decision as to whether it needs to be resized.

[root@cs-ie-h01:~] esxcli storage vmfs pbcache get
   Cache Capacity Miss Ratio: 0 %
   Cache Size: 0 MiB
   Cache Size Max: 132 MiB
   Cache Usage: 0 %
   Cache Working Set: 0 TiB
   Cache Working Set Max: 32 TiB
   Vmfs Heap Overhead: 0 KiB
   Vmfs Heap Size: 22 MiB
   Vmfs Heap Size Max: 256 MiB
[root@cs-ie-h01:~]

3. COW Root Entry Eviction

COW Root Entry cache is similar in many respects to PB cache, but it is used for VMFS sparse files (typically snapshots and linked clones) rather than pointer blocks. This enhancement mainly addresses the issue where customers who use lots of snapshots or linked clones may run out of space in the COW (Copy-On-Write) heap. See KB article 1009086 for more information. With the fix in vSphere 6.0, customers should no longer run into this issue, and those who use lots of snapshots and linked clones should not need to change the COW heap size.
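For those still on earlier releases, the workaround described in KB 1009086 involved raising the COW heap size via an advanced host setting. A quick way to inspect the current value (a sketch, assuming the setting name from that KB, COW.COWMaxHeapSizeMB):

# esxcli system settings advanced list -o /COW/COWMaxHeapSizeMB

Again, with the eviction enhancement in 6.0 you should not need to touch this.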

4. VOMA Enhancements

VOMA, short for vSphere On-disk Metadata Analyzer, is a VMFS file system checker, for want of a better description, and was introduced in vSphere 5.1. However, that version of the tool could only detect on-disk issues with the file system; it could not correct or fix them. Well, in 6.0, VOMA now includes the option to fix on-disk issues as well.

The fix mode of VOMA can fix heartbeat and corrupt lock errors, and it is available to customers for direct use without GSS intervention. Steps on how to use it will appear shortly in a public-facing KB.
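To give a flavour of what this looks like (a sketch; the device name and partition number are placeholders, and the volume should be offline before attempting a fix), a check followed by a fix might be run as:

# voma -m vmfs -f check -d /vmfs/devices/disks/naa.xxx:1
# voma -m vmfs -f fix -d /vmfs/devices/disks/naa.xxx:1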

There is also a partition table checker which handles recovering VMFS partitions. It can only be used if the disk does not have any other partition on it, and it too is available for direct customer use. Here is an example that first displays, then deletes, and finally restores a VMFS partition on a disk, using VOMA to recover the relevant partition information.

# partedUtil getptbl /vmfs/devices/disks/naa.xxx
gpt
1305 255 63 20971520
1 2048 20964824 AA31E02A400F11DB9590000C2911D1B8 vmfs 0

# partedUtil delete /vmfs/devices/disks/naa.xxx 1

# partedUtil getptbl /vmfs/devices/disks/naa.xxx
gpt
1305 255 63 20971520

# voma -m ptbl -d /vmfs/devices/disks/naa.xxx -S -f fix
Running Partition table checker version 0.1 in fix mode
Phase 1: Checking device for valid primary GPT
 Detected valid GPT signatures
 Number Start End Type
No valid partition entry detected
Phase 2: Checking device for a valid backup GPT
 Detected valid GPT signatures
 Number Start End Type
No valid partition entry detected
Phase 3: Checking device for valid MBR table
Phase 4: Searching for valid file system headers
 Detected valid LVM headers at offset 2097152
 Detected VMFS file system (labeled:'testDatastore') with 
 UUID:w-x-y-z, Version 5:60
Newly formatted VMFS5 file system detected
Disk should have a GPT partition table with VMFS partition, 
start sector : 2048, end sector : 20964824

You can then use this information to recreate the partition table using the start and end sectors above:

# partedUtil setptbl /vmfs/devices/disks/naa.xxx gpt "1 2048 20964824 \
AA31E02A400F11DB9590000C2911D1B8 0"

# partedUtil getptbl /vmfs/devices/disks/naa.xxx
gpt
1305 255 63 20971520
1 2048 20964824 AA31E02A400F11DB9590000C2911D1B8 vmfs 0

This functionality is also being ported back to vSphere 5.5. See VMware KB article 2103078 for further information.

5. ATS Enhancements

We have made a number of enhancements around ATS (Atomic Test & Set). ATS is a far superior locking mechanism, introduced a number of releases back, that avoids using SCSI reservations for VMFS locks. The first enhancement to ATS in vSphere 6.0 relates to LVM (Logical Volume Manager) operations. Up to now, we were still using SCSI-2 reservations for LVM operations. Now we use ATS exclusively for LVM operations if the device supports ATS.

The second enhancement relates to VMFS volumes that have been upgraded from VMFS-3 to VMFS-5. When a VMFS-3 volume is upgraded to VMFS-5, we use what is referred to as mixed-mode locking, i.e. we use ATS, but if that fails, we fall back to SCSI-2 reservations. Now customers can use "esxcli storage vmfs lockmode set [--ats|--scsi]" to change the lock mode to ATS-only (supported only for single-extent volumes) or back to ATS+SCSI.
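As a sketch (the volume label is a placeholder), you can check the current lock mode of your volumes and then switch an upgraded volume to ATS-only:

# esxcli storage vmfs lockmode list
# esxcli storage vmfs lockmode set --ats --volume-label=upgradedDS

Note that the change generally only takes effect after the volume has been unmounted and remounted on each host.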

How is this useful you might ask? The reason is that if an ATS operation fails, ATS-only will continue to retry using ATS. With the ATS+SCSI method, the first lock attempt uses ATS and then falls back to using the older SCSI-2 Reserve/Release mechanism. A SCSI-2 Reserve will prevent other ESXi hosts from reading or writing to the disk while this ESXi host has the disk reserved. When the disk is under reservation, no other host may even do an ATS operation.

So when would an ATS operation fail? Well, take some very busy VMFS datastores. An ATS operation may fail due to another ESXi host performing an update on the given sector since this ESXi host last read the sector. With ATS-only, this host simply retries using ATS. With ATS+SCSI, this host would revert back to locking the whole LUN/datastore.

The end result of this enhancement should be better performance on those datastores that are set to ATS-only. From data that we have gathered, we see that ~70% of VMFS-3/upgraded VMFS-5 volumes were on ATS-capable hardware. If you have a storage array that supports ATS, and you have upgraded VMFS-3 volumes to VMFS-5, consider setting those volumes to ATS-only.

7 comments
  1. This feature, COW Root Entry Eviction, should be delivered as a patch for 5.5. I need to create 300 virtual machines from a master, using snapshots, in what is known as a full linked clone, and now I have doubts. In Hyper-V it works flawlessly; I have tested it. Could you pass on this comment to the powers-that-be?

  2. What happens if I’m not licensed for VAAI (no enterprise plus) and I turn ATS only on? (I drop access to LUNs?). Also does this issue not impact VAAI enabled Native 1MB block VMFS volumes? What about new VMFS systems that are on lower non VAAI licensing?

  3. With the deprecation of VMFS-3 volumes, has the ability to convert VM’s to thin via VMFS5 to VMFS5 datastores migration (with the same blocksize) been fixed (because VMFS3 with a different blocksize would use a different data mover)? Or, do we still have to use punchzero (vmkfstools -K) ?

    • I’m guessing you are asking about shrinking the VMDK size. The answer is no – not to my knowledge. But with the introduction of VVols, this shrinking of a VMDK will be possible from within the Guest OS level.
