I’ve blogged about the VMFS heap situation numerous times already. However, a question that I frequently get asked is what actually happens when the heap runs out. I thought I’d put together a short article explaining the symptoms one would see when there is no VMFS heap left on an ESXi host. Thanks once again to my good friend and colleague, Paudie O’Riordan, for sharing his support experiences with me on this matter. “Together we win”, right Paud?
An actual heap depletion message in the vmkernel.log looks similar to this:
WARNING: Heap: 2525: Heap vmfs3 already at its maximum size. Cannot expand.
WARNING: Heap: 2900: Heap_Align(vmfs3, 524288/524288 bytes, 8 align) failed. caller: 0x418028a95e74
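If you want to check a copy of vmkernel.log for these two messages, a minimal sketch along the following lines might help. The regular expressions are derived only from the two sample lines above, so treat them as assumptions: the exact wording may differ between ESXi builds, and the function name `find_heap_depletion` is purely illustrative.

```python
import re

# Patterns modelled on the two sample messages above (an assumption:
# the wording may vary between ESXi builds).
HEAP_MAXED = re.compile(r"Heap (\S+) already at its maximum size\. Cannot expand\.")
HEAP_ALLOC_FAILED = re.compile(r"Heap_Align\((\w+), (\d+)/(\d+) bytes, (\d+) align\) failed")


def find_heap_depletion(lines):
    """Return (line_number, heap_name, kind) for each matching log line."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = HEAP_MAXED.search(line)
        if match:
            hits.append((number, match.group(1), "at-maximum"))
            continue
        match = HEAP_ALLOC_FAILED.search(line)
        if match:
            hits.append((number, match.group(1), "allocation-failed"))
    return hits


# The two sample lines from this article, used as test input.
sample = [
    "WARNING: Heap: 2525: Heap vmfs3 already at its maximum size. Cannot expand.",
    "WARNING: Heap: 2900: Heap_Align(vmfs3, 524288/524288 bytes, 8 align) failed. caller: 0x418028a95e74",
]

for hit in find_heap_depletion(sample):
    print(hit)
```

To run it against a real log, pass an open file object instead of the sample list, since iterating over a file yields its lines.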
But what are the user visible symptoms?
- In the first instance, when trying to do virtual machine operations such as power-off, power-on or vMotion on a host that has zero VMFS heap free, you will typically see the requested operation fail. For example, a vMotion operation of a VM to a destination ESXi host can return different messages, something similar to the following:
“The VM failed to resume on the destination during early power on. Reason: 0 (Cannot allocate memory).
Cannot open the disk ‘/vmfs/volumes/5106a125-91aedda0-17fb-0025b551a04f/IP-FS-507/IP-FS-507_1.vmdk’ or one of the snapshot disks it depends on”.
Other errors observed include “A general system error occurred: The virtual machine could not start”.
- When browsing the VMFS datastores, you won’t see any folders or data while the system is in this condition.
- The Guest OS may also become unreachable and/or unmanageable. By that I mean that you may no longer be able to access the Guest OS via the console session. An attempt to open or use a console session may show the following error: “Unable to connect to the MKS: Virtual Machine Config File Does Not Exist”
- One of our customers also observed vCenter displaying Provisioned Storage, Not-Shared Storage and Used Storage as 0GB.
- One additional consequence appears on VMFS-5 ATS-only volumes. ATS (Atomic Test & Set) is a vSphere Storage APIs for Array Integration (VAAI) primitive that is used for locking (more information on ATS can be found here). As a result of heap depletion, there may be loss of access to the volume in a worst case scenario. Errors similar to the following may be visible in the vmkernel logs: WARNING: HBX: 1889: Failed to initialize VMFS3 distributed locking on volume 4f757c26-20c9b6e4-dfe9-00151763b054: Out of memory. This doesn’t appear to be an issue with non-ATS VMFS-5 volumes, which remain accessible in the event of a heap depletion, but obviously VM operations are still impacted.
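To find out which volumes have logged the distributed locking error quoted in the last bullet, something like the following sketch could be used. The pattern is an assumption based on that single sample message, and `volumes_with_lock_failures` is a hypothetical helper name, not part of any VMware tooling.

```python
import re

# Pattern modelled on the single HBX sample message in this article
# (an assumption: other builds may word it differently).
LOCK_INIT_FAILED = re.compile(
    r"Failed to initialize VMFS3 distributed locking on volume "
    r"([0-9a-f]+(?:-[0-9a-f]+){3}): Out of memory"
)


def volumes_with_lock_failures(lines):
    """Return the distinct volume UUIDs that logged the error, in order seen."""
    seen = []
    for line in lines:
        match = LOCK_INIT_FAILED.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


# The sample line from this article, used as test input.
log_lines = [
    "WARNING: HBX: 1889: Failed to initialize VMFS3 distributed locking on "
    "volume 4f757c26-20c9b6e4-dfe9-00151763b054: Out of memory",
]

print(volumes_with_lock_failures(log_lines))
```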
As you can see, heap depletion can have serious side-effects in your environment. Therefore, if you are deploying a large number of very large VMDKs on your ESXi host(s), consider upgrading to ESXi 5.0p5 (released in March 2013) or 5.1U1 (released in April 2013). Both the default and the maximum size of the heap have been increased to 640MB. This means that the full 64TB of a VMFS volume may be addressed without the risk of heap depletion.
And to conclude, we are doing a lot of work in this area in our next release, and I will share many more details on how this issue will be a thing of the past as soon as I can.