This post looks at two different technologies available in vSphere to manage the queue depth on your ESXi host(s). The queue depth determines how many outstanding I/Os can be sent to a disk at once. In vSphere environments, where many hosts can be doing I/O to the same shared disk device, it can be helpful to throttle the LUN queue depth when congestion arises. In this post, we will compare and contrast Adaptive Queueing with Storage I/O Control (SIOC).
How do the technologies differ?
Storage I/O Control uses the concept of a congestion threshold, which is based on latency. Essentially, if the latency of a particular datastore rises above the threshold, SIOC will kick in. The threshold can be set by the administrator (as a millisecond value, 30ms by default) or it can be determined automatically by the I/O injector portion of SIOC. The automatic detection mechanism is only available as of vSphere 5.1, however. Adaptive queueing instead waits for a queue full condition to occur, i.e. the storage array indicates that there is congestion on the I/O path by returning a SCSI status of BUSY or QUEUE FULL.
What sort of throttling takes place?
Both technologies work by throttling the queue depth. With SIOC, each virtual machine sharing the datastore is assigned a shares value. The VMs may reside on different hosts. When the congestion threshold is reached, access to the shared disk is managed by throttling the queue depth on a per-host basis, taking into account the shares values of the VMs on each host. Let’s take the example of two hosts sharing a VMFS datastore, with each host running two VMs. If the VMs on the first host have a shares value of 1000 each, and the VMs on the second host have a shares value of 2000 each, then the second host will be allowed to queue twice as many I/Os as the first host when congestion occurs. Once latency drops back below the threshold value, all hosts/VMs can have their full queue depth and SIOC will stop throttling.
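To make the two-host example concrete, here is a minimal Python sketch of a proportional split. It is only an illustration of the shares arithmetic, not SIOC's actual algorithm: the function name host_queue_depths and the total_slots budget of 48 are assumptions made up for this example.

```python
def host_queue_depths(host_vm_shares, total_slots):
    """Split a slot budget across hosts in proportion to their VMs' shares.

    host_vm_shares: dict of host name -> list of per-VM shares values.
    total_slots:    hypothetical number of queue slots to divide up
                    while the datastore is congested (an assumption,
                    not a real SIOC setting).
    """
    # Sum the shares of all VMs on each host.
    host_totals = {host: sum(shares) for host, shares in host_vm_shares.items()}
    all_shares = sum(host_totals.values())
    # Each host gets a slice proportional to its share total (at least 1 slot).
    return {
        host: max(1, total_slots * total // all_shares)
        for host, total in host_totals.items()
    }


# Two hosts, two VMs each, as in the example above.
depths = host_queue_depths(
    {"host1": [1000, 1000], "host2": [2000, 2000]},
    total_slots=48,
)
print(depths)  # {'host1': 16, 'host2': 32}
```

The second host ends up with twice the queue depth of the first, which matches the 2:1 ratio of the shares values.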
Adaptive queueing is not quite so granular. With adaptive queueing, on receipt of the queue full status, the LUN queue depth of the host to that device is halved. For this reason, it is important that all hosts sharing a datastore enable adaptive queueing. If you enable it on some hosts sharing a datastore but not on others, performance can suffer, since the hosts without it keep queueing at full depth while the others back off. Once the queue full state is cleared, the queue depth begins to increment once again, one slot at a time.
When were the technologies introduced?
SIOC was introduced in vSphere 4.1 for block devices and vSphere 5.0 for NAS devices. Adaptive queueing first appeared in ESX 3.5U4.
How do you enable the technologies?
For SIOC, navigate to a datastore, click on the Properties link and you will see a section for Storage I/O Control. It is disabled by default. Note that SIOC is an Enterprise+ feature so if you do not have that edition, you will not be able to use it.
For adaptive queueing, there are some advanced parameters which must be set on a per-host basis. From the Configuration tab, select Advanced Settings in the Software section. Navigate to Disk and the two parameters you need are Disk.QFullSampleSize and Disk.QFullThreshold. By default, QFullSampleSize is set to 0, meaning that adaptive queueing is disabled. When this value is set, adaptive queueing will kick in and halve the queue depth when this number of queue full conditions is reported by the array. QFullThreshold is the number of good statuses that must be received before the queue depth is incremented once again.
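The way the two parameters interact can be sketched as a small Python model. This is a simplified illustration based only on the description above, not the actual ESXi implementation: the class name AdaptiveQueue, the counter reset behaviour and the sample values (2 and 4) are all assumptions chosen to keep the trace short.

```python
class AdaptiveQueue:
    """Toy model of adaptive queueing for one host/LUN pair (hypothetical)."""

    def __init__(self, max_depth, qfull_sample_size, qfull_threshold):
        self.max_depth = max_depth            # configured LUN queue depth
        self.depth = max_depth                # current (possibly throttled) depth
        self.sample_size = qfull_sample_size  # models Disk.QFullSampleSize
        self.threshold = qfull_threshold      # models Disk.QFullThreshold
        self.qfull_seen = 0                   # queue full statuses since last halving
        self.good_seen = 0                    # good statuses since last increment

    def on_status(self, queue_full):
        """Record one I/O completion status and return the resulting depth."""
        if self.sample_size == 0:
            return self.depth  # 0 means adaptive queueing is disabled
        if queue_full:
            self.good_seen = 0  # assumption: congestion restarts the good count
            self.qfull_seen += 1
            if self.qfull_seen >= self.sample_size:
                self.depth = max(1, self.depth // 2)  # halve the queue depth
                self.qfull_seen = 0
        else:
            self.good_seen += 1
            if self.good_seen >= self.threshold:
                # Recover one slot at a time, never beyond the configured depth.
                self.depth = min(self.max_depth, self.depth + 1)
                self.good_seen = 0
        return self.depth


q = AdaptiveQueue(max_depth=32, qfull_sample_size=2, qfull_threshold=4)
for _ in range(2):
    q.on_status(queue_full=True)
print(q.depth)  # 16: halved after two queue full statuses
for _ in range(8):
    q.on_status(queue_full=False)
print(q.depth)  # 18: one slot regained per four good statuses
```

With QFullSampleSize left at 0, the model never changes the depth, mirroring the disabled-by-default behaviour.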
Both features can serve a purpose in your environment. As you can see, SIOC is a more feature-rich technology, allowing prioritization of VMs and a more granular approach to queue depth management. Adaptive queueing is not quite so subtle, and simply halves the queue depth when congestion arises. However, adaptive queueing is available on all editions as far as I know, whereas SIOC is Enterprise+ only. You do have to take precautions with adaptive queueing, though, ensuring that all hosts sharing a datastore have the feature enabled. Caution is also needed if sharing array ports with other operating systems. KB article 1008113 has further details.