vSphere 5.1 Storage Enhancements – Part 8: Storage I/O Control
Storage I/O Control (SIOC) was initially introduced in vSphere 4.1 to provide I/O prioritization of virtual machines running on a cluster of ESXi hosts with access to shared storage. It extended the familiar constructs of shares and limits, which already existed for CPU and memory, to storage, through a dynamic allocation of I/O queue slots across a cluster of ESXi servers. The purpose of SIOC is to address the ‘noisy neighbour’ problem, i.e. a low-priority virtual machine whose I/O-heavy application degrades the performance of higher-priority virtual machines sharing the same storage.
vSphere 5.0 extended Storage I/O Control (SIOC) to provide cluster-wide I/O shares and limits for NFS datastores. This means that no single virtual machine should be able to create a bottleneck in any environment, regardless of the type of shared storage used. SIOC automatically throttles a virtual machine which is consuming a disproportionate amount of I/O bandwidth once the configured latency threshold has been exceeded. Consider a data mining virtual machine (which may well reside on a different host) acting as the ‘noisy neighbour’. To allow the other virtual machines on the same datastore to receive their fair share of I/O bandwidth, a share-based fairness mechanism was introduced, which is now supported on both NFS and VMFS.
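To make the shares idea concrete, here is a little Python sketch. It is purely illustrative (this is not VMware's actual algorithm, and the VM names are made up): once the latency threshold on a datastore is breached, the available device queue slots are divided in proportion to the configured VM shares.

```python
# Toy illustration of share-based fairness: NOT the actual SIOC algorithm.
# When observed latency exceeds the threshold, the device queue depth is
# reduced, and the remaining slots are divided in proportion to VM shares.

def allocate_queue_slots(vm_shares, queue_depth):
    """Split queue_depth slots across VMs in proportion to their shares."""
    total = sum(vm_shares.values())
    return {vm: max(1, queue_depth * shares // total)
            for vm, shares in vm_shares.items()}

# 'data-mining' is the noisy neighbour with low shares; the others are
# higher-priority VMs on the same datastore (hypothetical names).
shares = {"data-mining": 500, "online-store": 2000, "exchange": 1500}

print(allocate_queue_slots(shares, queue_depth=32))
# -> {'data-mining': 4, 'online-store': 16, 'exchange': 12}
```

The noisy neighbour still gets serviced, just with fewer queue slots than the VMs that were given more shares.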
The following are the new enhancements to Storage I/O Control in vSphere 5.1.
1. Stats Only Mode
[Updated May 2014] SIOC has a new feature called Stats Only Mode. SIOC Stats Only Mode was originally planned to be turned on automatically, but it appears that it was disabled at the last minute in the vSphere 5.1 release. When enabled, it doesn’t enforce throttling but gathers statistics to assist Storage DRS. Storage DRS now has statistics in advance for new datastores being added to the datastore cluster, and can get up to speed on a datastore’s profile/capabilities much quicker than before.
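If you like to script these things, something along the lines of the pyVmomi sketch below should do it. I'm assuming the vSphere 5.1 API's StorageIORMConfigSpec fields here (in particular statsCollectionEnabled), and the vCenter hostname, credentials and datastore name are placeholders, so do verify against your own SDK documentation before use.

```python
# Hedged sketch: enable SIOC statistics collection (Stats Only Mode) on a
# datastore via pyVmomi. Assumes the vSphere 5.1+ API; verify field names
# such as statsCollectionEnabled against your SDK version.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab use only
si = SmartConnect(host="vcenter.example.com",            # placeholder
                  user="administrator@vsphere.local",    # placeholder
                  pwd="password", sslContext=ctx)
content = si.RetrieveContent()

# Find the datastore by name (hypothetical name for illustration).
view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.Datastore], True)
ds = next(d for d in view.view if d.name == "shared-datastore-01")

# Build a spec that gathers statistics without enforcing throttling.
spec = vim.StorageResourceManager.IORMConfigSpec()
spec.enabled = False                 # no throttling enforcement
spec.statsCollectionEnabled = True   # Stats Only Mode

task = content.storageResourceManager.ConfigureDatastoreIORM_Task(ds, spec)
Disconnect(si)
```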
2. Automatic Threshold Computation
The default latency threshold for SIOC is 30 msecs. Not all storage devices are created equal, so this default is a middle-of-the-road value. Certain devices, e.g. SSDs, will hit their natural contention point earlier than others, in which case the threshold should be lowered by the user. However, manually determining the correct latency threshold can be difficult, which motivates the need for the threshold to be determined automatically, at the correct level for each device. And since SIOC is now turned on in Stats Only Mode, the interesting statistics which are only presented when SIOC is enabled are available immediately.
To figure out the best threshold, the new automatic threshold detection uses the I/O injector modelling functionality of SIOC to determine the peak throughput of a datastore. When peak throughput is measured, latency is also measured. The latency threshold at which Storage I/O Control kicks in is then set, by default, to 90% of this peak value. vSphere administrators can change this 90% to another percentage value, or they can still input a millisecond value if they so wish.
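One plausible reading of this, sketched in toy Python below (this is not the actual I/O injector), is that the injector ramps up load, records throughput/latency pairs, and picks the latency observed at 90% of the measured peak throughput as the threshold:

```python
# Toy sketch of automatic threshold computation: NOT the actual I/O injector.
# Given (latency_ms, throughput_iops) samples from ramping up load on a
# datastore, pick the latency observed at 90% of peak throughput.

def auto_threshold(samples, percent_of_peak=90):
    """samples: list of (latency_ms, throughput_iops), ordered by load."""
    peak = max(iops for _, iops in samples)
    target = peak * percent_of_peak / 100.0
    # First sample at or above the target throughput gives the threshold.
    for latency, iops in samples:
        if iops >= target:
            return latency
    return samples[-1][0]

# Hypothetical injector measurements for an SSD-backed datastore:
samples = [(1, 2000), (2, 6000), (4, 9000), (6, 9800), (10, 10000)]
print(auto_threshold(samples))  # -> 4 (ms, at 90% of the 10000 IOPS peak)
```

This also shows why SSDs warrant a lower threshold: they reach their peak at much lower latencies than spinning disk, so a fixed 30 msec default would kick in far too late.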
3. VmObservedLatency
VmObservedLatency is a new metric. It replaces the datastore latency metric which was used in previous versions of SIOC. This new metric measures the time between the VMkernel receiving the I/O from the VM and the response coming back from the datastore. Previously we only measured the latency once the I/O had left the ESXi host; now we are also measuring latency within the VMkernel. This new metric is visible in the vSphere UI Performance Charts. For further reading, my good friend and colleague Frank Denneman wrote an excellent article on how SIOC calculates latency across all hosts when checking if the threshold has been exceeded.
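Conceptually, it works something like the toy sketch below (hypothetical names, and obviously not VMkernel code): the clock starts when the VMkernel receives the I/O from the VM, so time spent queued in the kernel now counts towards the measured latency.

```python
# Toy sketch of what VmObservedLatency covers (hypothetical names; not
# VMkernel code). The clock starts when the kernel receives the I/O from
# the VM, not when the I/O leaves the host, so kernel queueing time counts.
import time

def issue_io_to_datastore(io):
    time.sleep(0.002)  # stand-in for the device round trip
    return b"data"

def vm_observed_latency(io):
    received = time.perf_counter()   # VMkernel receives I/O from the VM
    # ... I/O may wait in the kernel queue here (throttled by SIOC) ...
    response = issue_io_to_datastore(io)
    completed = time.perf_counter()  # response came back from the datastore
    return (completed - received) * 1000.0  # milliseconds

print(f"VmObservedLatency: {vm_observed_latency('read'):.2f} ms")
```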
I am a big fan of Storage I/O Control. I wrote a myth-busting article about it on the vSphere Storage blog some time back. I’d urge you all to try it out if you are in a position to do so.
Does SIOC also apply when, for example, I do a full VM restore from Veeam?
I believe it will. I know that Storage vMotion operations are ‘billed’ to the VM, so I suspect that this restore operation will be ‘billed’ to the VM in the same way (with the caveat of not actually testing it myself).
Hi, how does Storage I/O Control work with LUNs that are provided from the same set of spindles?
Can a noisy machine on LUN x be throttled back so a high-priority machine on LUN y can have more IOPS available to it?
thanks!
Hi Daniel,
Each LUN has its own I/O queue – and this queue is throttled if the latency threshold is reached.
So although LUNs may be presented from the same set of spindles, this queue throttling on a per-device basis should still help in ‘noisy neighbour’ situations.
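To make that concrete, here is a toy sketch (not ESXi code) of per-device throttling: each LUN keeps its own queue depth, and only the LUN whose observed latency breaches the threshold gets throttled, which in turn eases the load on the shared spindles.

```python
# Toy sketch: per-device (per-LUN) queue throttling, NOT ESXi code.
# Each LUN keeps its own queue depth; only the LUN whose latency breaches
# the threshold is throttled, even if both sit on the same spindles.

class LunQueue:
    def __init__(self, max_depth=32, threshold_ms=30):
        self.depth = max_depth
        self.max_depth = max_depth
        self.threshold_ms = threshold_ms

    def adjust(self, observed_latency_ms):
        if observed_latency_ms > self.threshold_ms:
            self.depth = max(4, self.depth // 2)              # throttle down
        else:
            self.depth = min(self.max_depth, self.depth + 1)  # recover

lun_x, lun_y = LunQueue(), LunQueue()
lun_x.adjust(observed_latency_ms=45)  # noisy LUN x breaches the threshold
lun_y.adjust(observed_latency_ms=12)  # LUN y stays at full depth
print(lun_x.depth, lun_y.depth)  # -> 16 32
```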
Hi Cormac:
Thanks for the post
Please, correct me if I’m wrong
If you enable SIOC, does the automatic threshold computation apply unless you change the value manually using the Advanced button?
Is this the reason a warning appears when you try to change it?
Thanks
Yes. In 5.1, the threshold is automatically calculated using the I/O injector. But you can still change it to a different % value if you wish (by default it is 90%), or you can even set a millisecond value like we used to have in previous versions.
Cormac-
What about the interaction between auto-tiering solutions like EMC VNX and SIOC?
SIOC throttles the datastore, but the datastore is carved from a LUN backed by a storage pool that has auto-tiering and FAST Cache.
How do these interplay?
Thanks!
Bottom line – SIOC will only kick in when it detects congestion, and only then will it throttle the queue depths. So it doesn’t matter what is on the back-end (even tiered storage) – if congestion occurs, SIOC will do its thing. When things sort themselves out on the back-end, and there is no further congestion, SIOC will bring the queue depth back up to normal.