Adaptive Queueing vs. Storage I/O Control
This post looks at two different technologies available in vSphere to manage the queue depth on your ESXi host(s). A queue determines how many outstanding I/Os can be sent to a disk. In vSphere environments, where many hosts may be doing I/O to the same shared disk device, it can be helpful to throttle the LUN queue depth from time to time when congestion arises. Here we will compare and contrast Adaptive Queueing with Storage I/O Control (SIOC).
How do the technologies differ?
Storage I/O Control uses the concept of a congestion threshold, which is based on latency. Essentially, if the latency of a particular datastore rises above a particular threshold, SIOC will kick in. This threshold value can be set by the administrator (a millisecond value; 30ms by default) or it can be determined automatically by the I/O injector portion of SIOC; the automatic detection mechanism is only available in vSphere 5.1, however. With adaptive queueing, we are waiting on a queue full condition to occur, i.e. the storage array indicates that there is congestion on the I/O path by returning a SCSI status of BUSY or QUEUE FULL.
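To make the difference in triggers concrete, here is a minimal sketch in Python. It is illustrative only: the 30ms default comes from the description above, but the constant names and helper functions are mine and are not how the vmkernel actually implements either feature.

```python
# Illustrative only: models the two different throttling triggers described above.

SIOC_CONGESTION_THRESHOLD_MS = 30   # default latency threshold; can be tuned by the admin

def sioc_should_throttle(datastore_latency_ms):
    """SIOC kicks in when observed datastore latency exceeds the congestion threshold."""
    return datastore_latency_ms > SIOC_CONGESTION_THRESHOLD_MS

# SCSI status codes returned by an array whose port/LUN queues are congested.
SCSI_STATUS_BUSY = 0x08
SCSI_STATUS_TASK_SET_FULL = 0x28    # commonly referred to as QUEUE FULL

def aq_should_throttle(scsi_status):
    """Adaptive queueing reacts to the array reporting BUSY or QUEUE FULL."""
    return scsi_status in (SCSI_STATUS_BUSY, SCSI_STATUS_TASK_SET_FULL)
```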
What sort of throttling takes place?
Both technologies work by throttling the queue depth. With SIOC, each virtual machine sharing the datastore is assigned a shares value. The VMs may reside on different hosts. When the congestion threshold is reached, access to the shared disk is managed by throttling the queue depth on a per-host basis, taking into account the shares values of the VMs on each host. Take the example of two hosts sharing a VMFS volume, each host running two VMs. If the VMs on the first host have a shares value of 1000 each, and the VMs on the second host have a shares value of 2000 each, then the second host will be allowed to queue twice as many I/Os as the first host when congestion occurs. Once latency drops back below the threshold value, all hosts/VMs get their full queue depth again and SIOC stops throttling.
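To illustrate the shares arithmetic, here is a rough sketch. It is just a proportional split for the example above; the real SIOC algorithm is more involved, so treat the numbers as illustrative only.

```python
# Illustrative sketch: split an aggregate queue depth between hosts in
# proportion to the total shares of the VMs running on each host.

def per_host_queue_depths(aggregate_depth, host_vm_shares):
    """host_vm_shares maps host name -> list of per-VM shares values."""
    host_totals = {h: sum(s) for h, s in host_vm_shares.items()}
    grand_total = sum(host_totals.values())
    return {h: max(1, int(aggregate_depth * t / grand_total))
            for h, t in host_totals.items()}

# The example from the text: two hosts, two VMs each.
print(per_host_queue_depths(
    aggregate_depth=64,
    host_vm_shares={"host1": [1000, 1000], "host2": [2000, 2000]}))
# -> {'host1': 21, 'host2': 42}  host2 gets roughly twice host1's queue depth
```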
Adaptive queueing is not quite so granular. With adaptive queueing, on receipt of the queue full status, the host halves its LUN queue depth for that device. For this reason, it is important that all hosts sharing a datastore enable adaptive queueing; you can adversely affect performance if you enable adaptive queueing on some hosts sharing a datastore but not on others. Once the queue full condition clears, the queue depth begins to increment once again, one slot at a time.
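This behaviour is essentially an additive-increase/multiplicative-decrease loop on the LUN queue depth. A minimal sketch follows; it is illustrative only, as the real logic lives in the vmkernel and is governed by the QFullSampleSize and QFullThreshold settings described further down.

```python
# Illustrative sketch of adaptive queueing's throttling behaviour.

class AdaptiveQueue:
    def __init__(self, max_depth=64):
        self.max_depth = max_depth   # the configured LUN queue depth
        self.depth = max_depth       # the current, possibly throttled, depth

    def on_queue_full(self):
        # Multiplicative decrease: halve the queue depth on QUEUE FULL/BUSY.
        self.depth = max(1, self.depth // 2)

    def on_good_status(self):
        # Additive increase: once the condition clears, grow back one slot at a time.
        if self.depth < self.max_depth:
            self.depth += 1
```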
When were the technologies introduced?
SIOC was introduced in vSphere 4.1 for block devices and vSphere 5.0 for NAS devices. Adaptive queueing first appeared in ESX 3.5U4.
How do you enable the technologies?
For SIOC, navigate to a datastore, click on the Properties link and you will see a section for Storage I/O Control. It is disabled by default. Note that SIOC is an Enterprise+ feature so if you do not have that edition, you will not be able to use it.
For adaptive queueing, there are some advanced parameters which must be set on a per-host basis. From the Configuration tab, select Advanced Settings under Software. Navigate to Disk; the two parameters you need are Disk.QFullSampleSize and Disk.QFullThreshold. By default, QFullSampleSize is set to 0, meaning that adaptive queueing is disabled. When this value is set, adaptive queueing will kick in and halve the queue depth once this number of queue full conditions has been reported by the array. QFullThreshold is the number of good statuses that must be received before the queue depth is incremented once again.
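If you would rather script this than click through the UI on every host, the same advanced options can be set through the vSphere API. Below is a hedged pyVmomi sketch; the vCenter hostname, credentials and the example values of 32 and 4 are placeholders (check KB article 1008113 and your array vendor's guidance for suitable values), and it should be tested in a lab before use.

```python
# Hedged sketch: set Disk.QFullSampleSize / Disk.QFullThreshold on every host via pyVmomi.
# Hostname, credentials and the example values (32 / 4) are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="********", sslContext=ssl._create_unverified_context())
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(content.rootFolder,
                                                   [vim.HostSystem], True)
    for host in view.view:
        adv = host.configManager.advancedOption
        # Some pyVmomi versions may require the integer values to be wrapped as longs.
        adv.UpdateOptions(changedValue=[
            vim.option.OptionValue(key="Disk.QFullSampleSize", value=32),
            vim.option.OptionValue(key="Disk.QFullThreshold", value=4),
        ])
        print("Updated adaptive queueing settings on", host.name)
    view.Destroy()
finally:
    Disconnect(si)
```

Whatever values you choose, remember the point above and apply them consistently to every host sharing the datastore.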
Both features can serve a purpose in your environment. As you can see, SIOC is a more feature-rich technology, allowing prioritization of VMs and a more granular approach to queue depth management. Adaptive queueing is not quite so subtle, and simply halves the queue depth when congestion arises. However, Adaptive Queueing is available on all editions as far as I know, whereas SIOC is Enterprise+ only. You do have to take precautions with adaptive queueing however, ensuring all hosts sharing a datastore have the feature enabled. Caution is also needed if you are sharing array ports with other operating systems. KB article 1008113 has further details.
Do you have any guidance around using Adaptive Queueing and SIOC together — is it recommended or a Bad Idea?
Without thinking about it too much, I would think that using SIOC would be preferred if you’re licensed for that feature, but using AQ is better than nothing as long as you configure it appropriately.
Hey Doug,
To be honest, I haven’t ever turned both features on together. I suspect that they may step on each other’s toes, so to speak, so I would go with one method or the other. I agree with your conclusion: use SIOC if you have the correct vSphere edition, and use AQ if you do not.
Cormac: I worked on the kernel side of integrating Storage IO Control and adaptive queueing from SCSI BUSY, etc.
The two features work together just fine. Basically, Storage IO Control obeys the queue depth set by adaptive queuing.
So, fear not 🙂
Ah – great. Thanks for the clarification Irfan.
The other reason to use AQ is if you have a wide-striped array and have any workload other than VMware on the array, or if you’re using an array-based technology that doesn’t play well with SIOC. SIOC will simply give up if it sees any measurable load outside its control.
That’s a good point Andrew – thanks!
Excellent guide. Many thanks. Hope you did not mind a link to your post from my blog?
Point to note: a QUEUE FULL from an array will affect all hosts connected via that I/O path, non-VMware hosts too if they share the same front-end ports to the SAN, as you do point out in your conclusion.
A queue full condition is not necessarily a problem, but large numbers of them will affect the performance of more than just the VMware hosts and their VMs.
Just my 2 pence worth with my SAN administrator hat on 🙂
Phil