vSphere 6.0 Storage Features Part 4: VMFS, VOMA and VAAI
There was a time when VMFS was the only datastore that could be used with ESXi. That has changed considerably, with the introduction of NFS (v3 and v4.1), Virtual Volumes and of course Virtual SAN. However VMFS continues to be used by a great many VMware customers and of course we look to enhance it with each release of vSphere. This post will cover changes and enhancements to VMFS in vSphere 6.0.
1. VMFS-3 volumes are deprecated
Yes, no more VMFS-3. It's VMFS-5 all the way now. In vSphere 6.0, creation of new VMFS-3 volumes is no longer allowed. The option has been removed from both the UI and the vmkfstools command. Existing VMFS-3 volumes will continue to be read/write accessible from ESXi 6.0 hosts, but the UI will log a message when a VMFS-3 volume is detected, stating that these volumes are now deprecated. Anyone still using VMFS-3 should really consider moving to VMFS-5. These volumes can be upgraded online, without service interruption, via the UI.
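For those who prefer the command line, the same online upgrade can also be driven with vmkfstools. A minimal sketch, assuming a datastore labeled "mydatastore" (substitute your own datastore name); keep in mind the upgrade is one-way, so there is no going back to VMFS-3 afterwards:

# vmkfstools -T /vmfs/volumes/mydatastore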
2. PB Cache statistics are now accessible
Does anyone remember the issues we had in the past with VMFS heap depletion? This limited the number of open files (VMDKs) we could have on a VMFS volume. In vSphere 5.5, we made considerable progress in this area, taking the main culprit (pointer block cache) out of the VMFS heap and giving it its own space. I wrote about this here. One issue with this change was that it was impossible to figure out how much space was being consumed by the PB cache, and whether it needed to be resized due to the number of open files. Well, in 6.0 we now have metrics telling us about PB cache usage. OK, so there is not much going on below with my host's PB cache, but try it on your own 6.0 system and see what it reports. Now you can track PB cache usage against the demands of your system, and make a decision as to whether it needs to be resized.
[root@cs-ie-h01:~] esxcli storage vmfs pbcache get
   Cache Capacity Miss Ratio: 0 %
   Cache Size: 0 MiB
   Cache Size Max: 132 MiB
   Cache Usage: 0 %
   Cache Working Set: 0 TiB
   Cache Working Set Max: 32 TiB
   Vmfs Heap Overhead: 0 KiB
   Vmfs Heap Size: 22 MiB
   Vmfs Heap Size Max: 256 MiB
[root@cs-ie-h01:~]
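If the statistics show the cache working set approaching its maximum, the PB cache ceiling is governed by the same advanced setting introduced with the vSphere 5.5 heap rework, /VMFS3/MaxAddressableSpaceTB. A hedged sketch of inspecting and raising it (the value of 64 is purely illustrative, not a sizing recommendation):

# esxcli system settings advanced list -o /VMFS3/MaxAddressableSpaceTB
# esxcli system settings advanced set -o /VMFS3/MaxAddressableSpaceTB -i 64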
3. COW Root Entry Eviction
The COW Root Entry cache is similar in many respects to the PB cache, but it is used for VMFS sparse files (typically snapshots and linked clones) rather than pointer blocks. This enhancement mainly addresses the issue where customers who use lots of snapshots or linked clones can run out of space in the COW (Copy-On-Write) heap. See KB article 1009086 for more information. With the fix in vSphere 6.0, customers should no longer run into this issue, and those who use lots of snapshots and linked clones should not need to change the COW heap size.
4. VOMA Enhancements
VOMA, short for vSphere On-disk Metadata Analyzer, is a VMFS file system checker, for want of a better description, and was introduced in vSphere 5.1. However, that tool could only detect on-disk issues with the file system; it could not correct them. Well, in 6.0, VOMA now includes the option to fix on-disk issues as well.
The fix mode of VOMA can fix heartbeat and corrupt lock errors and is available to customers so it can be used directly without GSS intervention. Steps on how to use this will appear shortly in a public facing KB.
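Until that KB is published, expect the invocation to follow the existing VOMA check syntax, pointed at the first partition of the device backing the datastore. This is a hedged sketch only, since the exact fix-mode arguments may differ from what finally ships (naa.xxx is a placeholder, and the volume should not be in active use while VOMA runs):

# voma -m vmfs -f check -d /vmfs/devices/disks/naa.xxx:1
# voma -m vmfs -f fix -d /vmfs/devices/disks/naa.xxx:1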
There is also a partition table checker which handles recovering VMFS partitions. It can only be used if the disk does not have any other partition on it, and it is also available for direct customer use. Here is an example of first displaying, then deleting, and finally restoring a VMFS partition on a disk, using VOMA to recover the relevant information.
# partedUtil getptbl /vmfs/devices/disks/naa.xxx
gpt
1305 255 63 20971520
1 2048 20964824 AA31E02A400F11DB9590000C2911D1B8 vmfs 0

# partedUtil delete /vmfs/devices/disks/naa.xxx 1

# partedUtil getptbl /vmfs/devices/disks/naa.xxx
gpt
1305 255 63 20971520

# voma -m ptbl -d /vmfs/devices/disks/naa.xxx -S -f fix
Running Partition table checker version 0.1 in fix mode
Phase 1: Checking device for valid primary GPT
   Detected valid GPT signatures
   Number  Start  End  Type
   No valid partition entry detected
Phase 2: Checking device for a valid backup GPT
   Detected valid GPT signatures
   Number  Start  End  Type
   No valid partition entry detected
Phase 3: Checking device for valid MBR table
Phase 4: Searching for valid file system headers
   Detected valid LVM headers at offset 2097152
   Detected VMFS file system (labeled:'testDatastore') with UUID:w-x-y-z, Version 5:60
   Newly formatted VMFS5 file system detected
Disk should have a GPT partition table with VMFS partition, start sector : 2048, end sector : 20964824
You can then use this information to recreate the partition table using the start and end sectors above:
# partedUtil setptbl /vmfs/devices/disks/naa.xxx gpt "1 2048 20964824 \
  AA31E02A400F11DB9590000C2911D1B8 0"

# partedUtil getptbl /vmfs/devices/disks/naa.xxx
gpt
1305 255 63 20971520
1 2048 20964824 AA31E02A400F11DB9590000C2911D1B8 vmfs 0
This functionality is also being ported back to vSphere 5.5. See VMware KB article 2103078 for further information.
5. ATS Enhancements
We have made a number of enhancements around ATS (Atomic Test & Set). ATS is a far superior locking mechanism, introduced a number of releases back to avoid using SCSI reservations for VMFS locks. The first enhancement to ATS in vSphere 6.0 relates to LVM (Logical Volume Manager) operations. Up to now, we were still using SCSI-2 reservations for LVM operations. Now we use ATS exclusively for LVM operations if the device supports ATS.
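If you want to verify up front whether a device actually supports ATS (and the other VAAI primitives), the per-device VAAI status can be queried from the host; the output includes an "ATS Status" field. A small sketch, with naa.xxx as a placeholder device identifier:

# esxcli storage core device vaai status get -d naa.xxx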
The second enhancement relates to VMFS volumes that have been upgraded from VMFS-3 to VMFS-5. When a VMFS-3 volume is upgraded to VMFS-5, we use what is referred to as mixed mode locking, i.e. we use ATS but, if that fails, we fall back to SCSI-2 reservations. Now customers can use "esxcli storage vmfs lockmode set [--ats|--scsi]" to change the lock mode to ATS-only (supported only for single-extent volumes) or back to ATS+SCSI.
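Here is a hedged sketch of checking the current lock mode on all mounted volumes and then switching an upgraded datastore to ATS-only; the label "upgraded-ds" is a placeholder, selected via -l/--volume-label. The documented procedure has additional prerequisites, so check the official KB before changing lock modes on production datastores.

# esxcli storage vmfs lockmode list
# esxcli storage vmfs lockmode set --ats -l upgraded-ds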
How is this useful you might ask? The reason is that if an ATS operation fails, ATS-only will continue to retry using ATS. With the ATS+SCSI method, the first lock attempt uses ATS and then falls back to using the older SCSI-2 Reserve/Release mechanism. A SCSI-2 Reserve will prevent other ESXi hosts from reading or writing to the disk while this ESXi host has the disk reserved. When the disk is under reservation, no other host may even do an ATS operation.
So when would an ATS operation fail? Well, take a very busy VMFS datastore. An ATS operation may fail because another ESXi host has updated the given sector since this host last read it. With ATS-only, this host simply retries using ATS. With ATS+SCSI, this host would revert to locking the whole LUN/datastore.
The end result of this enhancement should be better performance on those datastores that are set to ATS-only. From data that we have gathered, we see that ~70% of VMFS-3/upgraded VMFS-5 volumes were on ATS-capable hardware. If you have a storage array that supports ATS and you upgraded from VMFS-3 to VMFS-5, consider setting the volume to ATS-only.
This feature, COW Root Entry Eviction, should be delivered as a patch for 5.5. I need to create 300 virtual machines from a master, using snapshots, in what is known as a full linked clone, and now I have doubts. In Hyper-V it works flawlessly; I have tested it. Could you pass on this comment to the powers-that-be?
What happens if I'm not licensed for VAAI (no Enterprise Plus) and I turn ATS-only on? Do I drop access to LUNs? Also, does this issue not impact VAAI-enabled, native 1MB block VMFS volumes? What about new VMFS volumes that are on lower, non-VAAI licensing?
Um, ATS was still a VAAI primitive last time I looked 🙂
With the deprecation of VMFS-3 volumes, has the ability to convert VMs to thin via VMFS-5 to VMFS-5 datastore migration (with the same block size) been fixed (because VMFS-3 with a different block size would use a different data mover)? Or do we still have to use punchzero (vmkfstools -K)?
I'm guessing you are asking about shrinking the VMDK size. The answer is no, not to my knowledge. But with the introduction of VVols, shrinking a VMDK will be possible from within the Guest OS.