Storage I/O Control (SIOC) was initially introduced in vSphere 4.1 to provide I/O prioritization of virtual machines running on a cluster of ESXi hosts that had access to shared storage. It extended the familiar constructs of shares and limits, which existed for CPU and memory, to address storage utilization through a dynamic allocation of I/O queue slots across a cluster of ESXi servers. The purpose of SIOC is to address the ‘noisy neighbour’ problem, i.e. a low-priority virtual machine whose application generates enough I/O to impact other, higher-priority virtual machines.
vSphere 5.0 extended Storage I/O Control (SIOC) to provide cluster-wide I/O shares and limits for NFS datastores. This means that no single virtual machine should be able to create a bottleneck in any environment, regardless of the type of shared storage used. SIOC automatically throttles a virtual machine which is consuming a disproportionate amount of I/O bandwidth once the configured latency threshold has been exceeded. Take, for example, a data mining virtual machine which happens to reside on a different host from the virtual machines it is affecting: it is the ‘noisy neighbour’. To allow the other virtual machines to receive their fair share of I/O bandwidth on the same datastore, a share-based fairness mechanism was created, which is now supported on both NFS and VMFS.
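To make the share-based idea concrete, here is a minimal sketch of dividing a fixed number of I/O queue slots between virtual machines in proportion to their shares, with throttling applied only once the observed latency exceeds the threshold. This is an illustration under my own assumptions, not the actual SIOC algorithm: the function names, the VM names, the share values and the 35-slot queue are all hypothetical.

```python
def should_throttle(observed_latency_ms, threshold_ms=30.0):
    """Throttling only applies once latency exceeds the configured threshold."""
    return observed_latency_ms > threshold_ms


def proportional_slots(shares, total_slots):
    """Split total_slots between VMs in proportion to their shares.

    shares: dict mapping a VM name to its (positive) share value.
    Any slots left over after rounding down go to the VMs with the
    largest fractional remainders, so the result always sums to total_slots.
    """
    total_shares = sum(shares.values())
    if total_shares <= 0:
        raise ValueError("total shares must be positive")

    exact = {vm: total_slots * s / total_shares for vm, s in shares.items()}
    slots = {vm: int(x) for vm, x in exact.items()}

    leftover = total_slots - sum(slots.values())
    by_remainder = sorted(exact, key=lambda vm: exact[vm] - slots[vm], reverse=True)
    for vm in by_remainder[:leftover]:
        slots[vm] += 1
    return slots


# Hypothetical example: the data mining VM has the fewest shares.
vm_shares = {"data-mining": 500, "online-store": 1000, "exchange": 2000}

if should_throttle(observed_latency_ms=35.0):
    print(proportional_slots(vm_shares, total_slots=35))
```

With these made-up numbers the data mining VM ends up with 5 of the 35 slots, while the two higher-priority VMs get 10 and 20. Below the threshold nothing is throttled at all, which is the point: shares only matter under contention.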
The following are the new enhancements to Storage I/O Control in vSphere 5.1.
1. Stats Only Mode
SIOC is now turned on in stats only mode automatically. It doesn’t enforce throttling but gathers statistics to assist Storage DRS. Storage DRS now has statistics in advance for new datastores being added to the datastore cluster and can get up to speed on a datastore’s profile/capabilities much quicker than before.
2. Automatic Threshold Computation
The default latency threshold for SIOC is 30 ms. Not all storage devices are created equal, so this default is set to a middle-of-the-road value. There are certain devices which will hit their natural contention point earlier than others, e.g. SSDs, in which case the threshold should be lowered by the user. However, manually determining the correct latency can be difficult for users. This motivates the need for the latency threshold to be determined automatically at an appropriate level for each device. As mentioned above, SIOC is also now turned on in stats only mode, which means that interesting statistics which were previously only presented when SIOC was enabled are now available immediately.
To figure out the best threshold, the new automatic threshold detection uses the I/O injector modelling functionality of SIOC to determine what the peak throughput of a datastore is.
When peak throughput is measured, latency is also measured.
The latency threshold value at which Storage I/O Control will kick in is then set, by default, to the latency observed when the datastore is running at 90% of this peak throughput.
vSphere administrators can change this 90% to another percentage value or they can still input a millisecond value if they so wish.
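As a rough sketch of the calculation described above, the snippet below takes a set of (throughput, latency) measurements, finds the peak throughput, and estimates the latency at a given percentage of that peak by linear interpolation. It assumes the threshold is the latency seen at that percentage of peak throughput. The function name, the interpolation step and the sample numbers are my own illustration, not the I/O injector's actual model.

```python
def latency_threshold_ms(samples, percent_of_peak=90.0):
    """Estimate the latency at percent_of_peak of the peak throughput.

    samples: list of (throughput_iops, latency_ms) measurements taken
    at increasing load. Hypothetical input format for this sketch.
    """
    if not samples:
        raise ValueError("at least one sample is required")

    peak = max(throughput for throughput, _ in samples)
    target = peak * percent_of_peak / 100.0

    previous = None
    for throughput, latency in sorted(samples):
        if throughput >= target:
            if previous is None or throughput == previous[0]:
                return latency
            prev_throughput, prev_latency = previous
            # Linear interpolation between the two surrounding samples.
            fraction = (target - prev_throughput) / (throughput - prev_throughput)
            return prev_latency + (latency - prev_latency) * fraction
        previous = (throughput, latency)
    return previous[1]  # only reachable if percent_of_peak is above 100


# Made-up measurements: (IOPS, latency in ms).
measurements = [(1000, 2.0), (2000, 4.0), (3000, 8.0), (4000, 16.0)]
print(latency_threshold_ms(measurements))        # default 90% of peak
print(latency_threshold_ms(measurements, 75.0))  # a different percentage
```

Changing `percent_of_peak` mirrors the administrator choosing a percentage other than 90%; entering a millisecond value directly would simply bypass this calculation.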
VmObservedLatency is a new metric. It replaces the datastore latency metric which was used in previous versions of SIOC. This new metric measures the time between the VMkernel receiving the I/O from the VM and the response coming back from the datastore. Previously we only measured the latency once the I/O had left the ESXi host, so now we are measuring latency in the VMkernel as well. This new metric will be visible in the vSphere UI Performance Charts. For further reading, my good friend and colleague Frank Denneman wrote an excellent article on how SIOC calculates latency across all hosts when checking if the threshold has been exceeded.
I am a big fan of Storage I/O Control. I wrote a myth-busting article about it on the vSphere Storage blog some time back. I’d urge you all to try it out if you are in a position to do so.
Get notification of these blog postings and more VMware Storage information by following me on Twitter: @VMwareStorage