Thanks to our friends over at EMC (shout out to Itzik), we’ve recently been made aware of a limitation in our UNMAP mechanism in ESXi 5.0 & 5.1. It would appear that if you attempt to reclaim more than 2TB of dead space in a single operation, the UNMAP primitive does not handle it very well. The current thought is that this is because we have a 2TB (minus 512 bytes) file size limit on VMFS-5. When the space to reclaim is above this size, we cannot create the very large temporary balloon file (part of the UNMAP process), and it spews the following errors:
Heads Up! Device Queue Depth on QLogic HBAs
Just thought I’d bring to your attention something that has been doing the rounds here at VMware recently, and will be applicable to those of you using QLogic HBAs with ESXi 5.x. The following are the device queue depths you will find when using QLogic HBAs for SAN connectivity:
- ESXi 4.1 U2 – 32
- ESXi 5.0 GA – 64
- ESXi 5.0 U1 – 64
- ESXi 5.1 GA – 64
The higher depth of 64 has been this way since 24 Aug 2011 (the 5.0 GA release). The issue is that this has not been documented anywhere. For the majority of users, this is not an area of concern and is probably a benefit. But there are some concerns.
Heads Up! New Patches for VMFS heap
Many of you in the storage field will be aware of a limitation with the maximum number of open files on a VMFS volume. It has been discussed extensively, with blog articles on the vSphere blog by myself, but also articles by such luminaries as Jason Boche and Michael Webster.
In a nutshell, ESXi has a limited amount of VMFS heap space by default. While you can increase it from the default to the maximum, there are still some gaps. When you create very many VMDKs on a very large VMFS volume, the double indirect pointer mechanism used to address the blocks way out in the address space consumes heap. The result is that although we support very large VMFS volumes (up to 64TB), the reality up to now is that a single host (since heap is defined on a per host basis) could only address in the region of 30TB of open files. This isn’t always an issue, since typically VMFS is a clustered file system and is shared by many hosts. Therefore one would typically have the open VMDKs spread across many hosts in a cluster. However it is an issue for stand-alone hosts with lots of virtual machines with lots of VMDKs, and is also an issue for hosts which want to have a virtual machine with a lot of VMDKs attached, for the purposes of a file share for example.
Anyway, to cut to the chase, a recent patch release for ESXi 5.0 increases the default heap size to 256MB and the maximum heap size to 640MB per ESXi host. Previously the default was 80MB and the maximum was 256MB, so we have increased this significantly. This should allow a single ESXi host to access in the region of 60TB of open VMDKs, which is pretty much the maximum size of a VMFS volume anyway. The patch is ESXi500-201303401-BG.
Although the patch for ESXi 5.1 is not yet out, it should be available very shortly, and will have a similar fix.
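To get a feel for whether a given host is anywhere near these limits, here is a minimal Python sketch. It uses only the rough figures quoted in this post (in the region of 30TB of open files at the old 256MB maximum heap, and 60TB at the new 640MB maximum). The host names, the VMDK sizes and the `hosts_over_limit` helper are all hypothetical, made up purely for illustration; this is not a VMware tool, and real heap consumption depends on more than raw capacity.

```python
# Approximate open-VMDK capacity per host at maximum VMFS heap.
# These are the ballpark figures from this post, not exact limits.
APPROX_OPEN_TB_AT_MAX_HEAP = {
    "pre-patch (256MB max heap)": 30,
    "post-patch (640MB max heap)": 60,
}


def hosts_over_limit(open_vmdk_tb_by_host, limit_tb):
    """Return the hosts whose total open VMDK capacity exceeds limit_tb."""
    return sorted(
        host
        for host, vmdk_sizes_tb in open_vmdk_tb_by_host.items()
        if sum(vmdk_sizes_tb) > limit_tb
    )


# Hypothetical inventory: sizes (in TB) of the VMDKs each host has open.
inventory = {
    "esx01": [2] * 5,   # 10TB of open VMDKs
    "esx02": [2] * 20,  # 40TB of open VMDKs
    "esx03": [2] * 32,  # 64TB of open VMDKs
}

for label, limit_tb in APPROX_OPEN_TB_AT_MAX_HEAP.items():
    print(label, "->", hosts_over_limit(inventory, limit_tb))
```

In this made-up inventory, a host with 40TB of open VMDKs is over the old ballpark figure but under the new one, which is exactly the kind of host that benefits from the patch.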
For those of you using very large VMFS volumes with lots of virtual machine disk files, consider scheduling a maintenance slot very soon to apply these patches. This is not an issue for NFS, fyi.
Get notification of these blog postings and more VMware Storage information by following me on Twitter: @VMwareStorage
Microsoft Clustering on vSphere – Incompatible Device Errors
When setting up a Microsoft Cluster with nodes running in vSphere Virtual Machines across ESXi hosts, I have come across folks who have hit the error “Incompatible device backing specified for device ‘0’”. This is typically a result of the RDM (Raw Device Mapping) setup not being quite right. There can be a couple of reasons for this, as highlighted here.
Different SCSI Controller
On one occasion, the RDM was mapped to the same SCSI controller as the Guest OS boot disk. Once the RDM was moved to its own unique SCSI controller, it resolved the issue. Basically, if the OS disk is configured to use SCSI 0:0, then you cannot put the RDM on SCSI 0:1, or SCSI 0:2. You must put the RDM on SCSI 1:x or SCSI 2:x.
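The rule above is easy to express as a quick sanity check. The following is a minimal Python sketch, assuming SCSI addresses are written as `controller:unit` strings such as `0:0`. The function names are hypothetical and are not part of any vSphere API.

```python
def controller_of(scsi_address):
    """Return the controller number from a 'controller:unit' address such as '1:0'."""
    controller, _unit = scsi_address.split(":")
    return int(controller)


def rdm_shares_boot_controller(boot_disk_address, rdm_address):
    """True if the RDM sits on the same SCSI controller as the Guest OS boot disk."""
    return controller_of(boot_disk_address) == controller_of(rdm_address)


# Boot disk on SCSI 0:0: an RDM on 0:1 breaks the rule, one on 1:0 does not.
print(rdm_shares_boot_controller("0:0", "0:1"))
print(rdm_shares_boot_controller("0:0", "1:0"))
```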
Matching LUN ID
Another reason for the above error is when the RDM is presented to the different ESXi hosts using a different LUN ID. The RDM must be presented to all ESXi hosts (and thus all MSCS nodes) using the same LUN ID.
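The same kind of check works for the LUN ID requirement. This is a minimal Python sketch, assuming you have already collected, by whatever means, the LUN ID under which each ESXi host sees the RDM. The host names and the `lun_ids_match` helper are hypothetical.

```python
def lun_ids_match(lun_id_by_host):
    """True if every host sees the RDM under the same LUN ID."""
    return len(set(lun_id_by_host.values())) <= 1


# Hypothetical presentations of the same RDM to two ESXi hosts.
consistent = {"esx01": 12, "esx02": 12}
inconsistent = {"esx01": 12, "esx02": 13}

print(lun_ids_match(consistent))
print(lun_ids_match(inconsistent))
```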
Heads Up! NetApp NFS Disconnects
I just received notification about KB article 2016122 which VMware has just published. It deals with a topic that I’ve seen discussed recently on the community forums. The symptom is that during periods of high I/O, NFS datastores from NetApp arrays become unavailable for a short period of time, before becoming available once again. This seems to be primarily observed when the NFS datastores are presented to ESXi 5.x hosts.
The KB article describes a workaround for the issue: tune the queue depth on the ESXi hosts, which reduces I/O congestion to the datastore. By default, the value of NFS.MaxQueueDepth is 4294967295 (which basically means unlimited). The workaround is to change this value to 64. This has been shown to prevent the disconnects. A permanent solution is still being investigated.
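As a rough illustration, the Python sketch below confirms that the default value is simply the largest unsigned 32-bit integer, and builds the command line for the workaround. The `esxcli system settings advanced set` syntax shown is my understanding of how advanced settings are changed on ESXi 5.x, so treat it as an assumption and follow the exact steps in the KB article. The helper name is hypothetical.

```python
# The default from this post: the largest value an unsigned 32-bit integer can hold.
DEFAULT_NFS_MAX_QUEUE_DEPTH = 4294967295
assert DEFAULT_NFS_MAX_QUEUE_DEPTH == 2**32 - 1


def nfs_queue_depth_command(depth=64):
    """Build the argument list for setting NFS.MaxQueueDepth on an ESXi host.

    Assumption: the esxcli syntax here follows the usual advanced-settings
    form; check it against the KB article before running it on a host.
    """
    return [
        "esxcli", "system", "settings", "advanced", "set",
        "--option=/NFS/MaxQueueDepth",
        f"--int-value={depth}",
    ]


# Print the command rather than executing it.
print(" ".join(nfs_queue_depth_command()))
```

The sketch only prints the command, so it is safe to run anywhere.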
I recommend all NetApp customers read this KB article, whether you have been impacted or not.