What happens when VMFS heap depletes completely?

I’ve blogged about the VMFS heap situation numerous times already. However, a question I frequently get asked is: what actually happens when the heap runs out? I thought I’d put together a short article explaining the symptoms one would see when there is no VMFS heap left on an ESXi host. Thanks once again to my good friend and colleague, Paudie O’Riordan, for sharing his support experiences with me on this matter – “together we win”, right Paud?

An actual heap depletion message in the vmkernel.log looks similar to this:

WARNING: Heap: 2525: Heap vmfs3 already at its maximum size. Cannot expand.
WARNING: Heap: 2900: Heap_Align(vmfs3, 524288/524288 bytes, 8 align) failed. caller: 0x418028a95e74
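
If you want to check a saved copy of vmkernel.log for these warnings, something along the following lines could work. This is only a minimal sketch: the regular expressions are modelled on the two sample messages above, and the function name `find_heap_depletion` is my own, so treat both as assumptions. Other ESXi builds may word the messages differently.

```python
import re

# Assumption: these patterns are derived only from the two sample messages
# quoted in this article; adjust them if your logs are worded differently.
MAX_SIZE_RE = re.compile(r"Heap (?P<heap>\S+) already at its maximum size")
ALLOC_FAIL_RE = re.compile(
    r"Heap_Align\((?P<heap>[^,]+), (?P<requested>\d+)/(?P<total>\d+) bytes, "
    r"(?P<align>\d+) align\) failed"
)


def find_heap_depletion(lines, heap_name="vmfs3"):
    """Return (line_number, kind, text) tuples for warnings about heap_name."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = MAX_SIZE_RE.search(line)
        if match and match.group("heap") == heap_name:
            hits.append((number, "at-max-size", line.strip()))
            continue
        match = ALLOC_FAIL_RE.search(line)
        if match and match.group("heap") == heap_name:
            hits.append((number, "alloc-failed", line.strip()))
    return hits


# Example run against the two sample lines from this article.
sample = [
    "WARNING: Heap: 2525: Heap vmfs3 already at its maximum size. Cannot expand.",
    "WARNING: Heap: 2900: Heap_Align(vmfs3, 524288/524288 bytes, 8 align) failed. "
    "caller: 0x418028a95e74",
]
for number, kind, text in find_heap_depletion(sample):
    print(number, kind, text)
```

To scan a real log, pass an open file object in place of `sample`, since the function only needs an iterable of lines.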

But what are the user-visible symptoms?

  • In the first instance, virtual machine operations such as power-off, power-on or vMotion will typically fail on a host that has zero VMFS heap free. For example, a vMotion of a VM to a destination ESXi host in this state can return a number of different messages, similar to the following:

The VM failed to resume on the destination during early power on. Reason: 0 (Cannot allocate memory).

Cannot open the disk ‘/vmfs/volumes/5106a125-91aedda0-17fb-0025b551a04f/IP-FS-507/IP-FS-507_1.vmdk’ or one of the snapshot disks it depends on.

Other errors observed include “A general system error occurred: The virtual machine could not start”.

  • When browsing the VMFS datastores, you won’t see any folders or data while the system is in this condition.
  • The Guest OS may also become unreachable and/or unmanageable. By that I mean that you may no longer be able to access the Guest OS via the console session. An attempt to open or use a console session may show the following error: “Unable to connect to the MKS: Virtual Machine Config File Does Not Exist”
  • One of our customers also observed vCenter displaying Provisioned Storage, Not-Shared Storage and Used Storage as 0GB.

  • One additional consequence appears on VMFS-5 ATS-only volumes. ATS (Atomic Test & Set) is a vSphere Storage APIs for Array Integration (VAAI) primitive that is used for locking. In a worst-case scenario, heap depletion may lead to loss of access to the volume. Errors similar to the following may be visible in the vmkernel logs: WARNING: HBX: 1889: Failed to initialize VMFS3 distributed locking on volume 4f757c26-20c9b6e4-dfe9-00151763b054: Out of memory. This doesn’t appear to be an issue with non-ATS VMFS-5 volumes, which remain accessible in the event of a heap depletion, but obviously VM operations are still impacted.
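
To find out which volumes reported the distributed locking failure, a small script can pull the volume UUIDs out of a saved vmkernel log. Again, this is only a sketch: the pattern is based on the single HBX message quoted above, and the function name `volumes_with_lock_failures` is hypothetical.

```python
import re

# Assumption: modelled on the one sample HBX message in this article.
# The UUID layout (8-8-4-12 hex digits) is taken from that sample.
LOCK_INIT_RE = re.compile(
    r"Failed to initialize VMFS3 distributed locking on volume "
    r"(?P<volume>[0-9a-f]{8}-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{12}): "
    r"(?P<reason>.+)$"
)


def volumes_with_lock_failures(lines):
    """Map each affected volume UUID to the set of failure reasons logged."""
    affected = {}
    for line in lines:
        match = LOCK_INIT_RE.search(line.rstrip())
        if match:
            reason = match.group("reason").strip()
            affected.setdefault(match.group("volume"), set()).add(reason)
    return affected


# Example run against the sample line from this article.
hbx_sample = [
    "WARNING: HBX: 1889: Failed to initialize VMFS3 distributed locking on "
    "volume 4f757c26-20c9b6e4-dfe9-00151763b054: Out of memory",
]
for volume, reasons in volumes_with_lock_failures(hbx_sample).items():
    print(volume, sorted(reasons))
```

Grouping by volume rather than listing every line keeps the output short when the same warning repeats many times for one datastore.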

As you can see, heap depletion can have serious side-effects in your environment. Therefore, if you are deploying a large number of very large VMDKs on your ESXi host(s), consider upgrading to ESXi 5.0p5 (released in March 2013) or 5.1U1 (released in April 2013). In these releases, both the default and the maximum size of the heap have been increased to 640MB. This means that the full 64TB of a VMFS volume may be addressed without the risk of heap depletion.

And to conclude, we are doing a lot of work in this area in our next release, and I will share many more details on how this issue will become a thing of the past as soon as I can.

8 comments
  1. Hi Cormac,

    The latest release is really a great relief compared to the previous ones. I faced a lot of issues with my client. What I have observed is that if the VMDKs are thin, then we can accommodate a lot of big VMs, but if we are using eager-zeroed thick disks, then they are more prone to this issue. Is this correct? What do the test results show, and how does the effect differ between thin and thick disks? I really faced this situation, but there is no clear info about this in the VMware KB.

    Also, is there any command to check the VMFS heap status?

    Thanks
    Gopi

    • That’s a good observation. The heap depletion is closely tied to the use of pointer blocks for the larger VMDKs. However, you can also hit the issue with thin disks as they grow over time. I’ll see if I can dig out a way to allow you to examine the heap usage.

  2. I believe I just ran into this issue yesterday! Not a fun one. Time to update those hosts to Update 1…

  3. I am glad this is finally being addressed. I identified this issue with VMFS3 approximately 5 years ago. It drove me to NFS, and I can say that I have not been happier and haven’t gone back to SAN-based storage.

  4. Dear Cormac,

    I ran into a similar issue with a couple of infrastructures.

    Can you please explain in more detail why the heap gets exhausted?
    Who uses this heap?
    I was looking for a deep dive on the VMFS heap size.
