This query has come up on numerous occasions in the past, especially in the comments section of a blog post on debunking SIOC myths on the vSphere Storage Blog. This post highlights some recommendations which should be implemented when your storage array presents LUNs that are spread across all spindles, or indeed presents multiple LUNs that are all backed by the same set of spindles from a particular aggregate or storage pool.
VMware engineering is continuing to look at the options available in such configurations, especially where a VM on LUN1 experiences high latency while VMs on other LUNs do not (for example, if the VM on LUN1 is running a ‘worst case’ random workload while VMs on other LUNs are running sequential workloads). Basically, in these situations, there is a noisy neighbour VM which is not on the same datastore but which can still impact VMs on other datastores. How should/could SIOC handle this scenario?
Currently, the recommendations regarding SIOC deployment when datastores are spread across all spindles in the storage array are as follows:
- Enable SIOC on all the LUNs coming from the shared spindles, and
- Set the congestion threshold to be the same across all LUNs
This configuration will result in congestion detection on ALL the LUNs and hence will result in I/O being throttled back on all datastores.
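As a rough illustration of the two recommendations above, the following sketch audits an inventory for shared pools where SIOC is not enabled on every LUN or where the congestion thresholds differ. It does not call any VMware API: the `Datastore` record, the pool names and the threshold values are all hypothetical, and you would need to populate the inventory from your own array and vCenter data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Datastore:
    # Hypothetical inventory record; not a vSphere API object.
    name: str
    pool: str            # backing aggregate/storage pool on the array
    sioc_enabled: bool
    threshold_ms: int    # configured congestion threshold in milliseconds


def check_shared_pool_config(datastores):
    """Return a list of problems found on pools that back more than one LUN."""
    by_pool = defaultdict(list)
    for ds in datastores:
        by_pool[ds.pool].append(ds)

    problems = []
    for pool, members in sorted(by_pool.items()):
        if len(members) < 2:
            continue  # spindles are not shared between LUNs in this pool
        disabled = sorted(ds.name for ds in members if not ds.sioc_enabled)
        if disabled:
            problems.append(f"{pool}: SIOC disabled on {', '.join(disabled)}")
        thresholds = {ds.threshold_ms for ds in members if ds.sioc_enabled}
        if len(thresholds) > 1:
            problems.append(f"{pool}: mixed congestion thresholds {sorted(thresholds)}")
    return problems


# Example data (made up for illustration).
inventory = [
    Datastore("LUN1", "aggr0", True, 30),
    Datastore("LUN2", "aggr0", True, 25),
    Datastore("LUN3", "aggr0", False, 30),
    Datastore("LUN4", "aggr1", True, 30),
]

for problem in check_shared_pool_config(inventory):
    print(problem)
```

With the example data, `aggr0` is flagged twice (SIOC is off on LUN3, and LUN1 and LUN2 use different thresholds), while `aggr1` is ignored because it backs only a single LUN.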
There are two caveats:
- There is no proportional share enforcement across LUNs. Hence, if the sum of all shares on LUN1 is 100 and on LUN2 is 1000, you will NOT get a 1:10 throttle ratio across LUNs. However, the I/O throttling at each LUN will respect the shares.
- There might be corner cases where the congestion threshold is set slightly differently if the automatic threshold setting mechanism is used (introduced in vSphere 5.1). There is a good chance that the model parameters may be slightly different when measured from different LUNs. As a result, the LUN with the lower threshold setting might not have its performance isolated from load on some other LUN. Therefore the recommendation is to set the threshold to be the same, as per the advice above.
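The first caveat can be illustrated with a little arithmetic. The sketch below uses a deliberately simplified model in which each VM's entitlement is just its share value divided by the sum of shares on its own LUN; it does not model how SIOC actually adjusts device queues, and the VM names and share values are made up.

```python
def per_lun_fractions(vm_shares):
    """Fraction of a single LUN's throttled I/O each VM is entitled to.

    Simplified model: entitlement is proportional to shares, and is
    computed only against the other VMs on the same LUN.
    """
    total = sum(vm_shares.values())
    return {vm: shares / total for vm, shares in vm_shares.items()}


lun1 = {"vm-a": 25, "vm-b": 75}      # shares on LUN1 sum to 100
lun2 = {"vm-c": 250, "vm-d": 750}    # shares on LUN2 sum to 1000

print(per_lun_fractions(lun1))  # {'vm-a': 0.25, 'vm-b': 0.75}
print(per_lun_fractions(lun2))  # {'vm-c': 0.25, 'vm-d': 0.75}
```

Both LUNs end up with the same 25/75 split internally. The fact that LUN2's shares total ten times those of LUN1 plays no part in the calculation, which is why no 1:10 ratio appears between the two LUNs.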
I would also mention that if you have numerous vCenter servers, numerous clusters and numerous ESXi hosts all sharing access to the same array (albeit different LUNs) and you wish to use the SIOC feature, look at a storage management scheme whereby different pools/aggregates are created on a per vCenter basis.
Another important point is that neither datastores using an unshared set of spindles nor datastores sharing the same set of spindles should be presented to ESXi hosts managed by different vCenter servers. As per KB 1020651, an unsupported configuration is “Not all of the hosts accessing the datastore are managed by the same vCenter Server” which may result in “An unmanaged I/O workload is detected on a SIOC datastore”. If the same storage aggregate/pool on the array is shared by multiple vCenter servers (even though they all access different LUNs), you may run into the noisy neighbour VM impacting VMs on different datastores. With separate pools/aggregates, and ensuring that all ESXi hosts accessing the datastores are managed by the same vCenter server, you can avoid this. To finish, if an unmanaged I/O workload is detected by SIOC, any I/O throttling will stop.
We understand that there is considerable storage planning around this, and as mentioned, we continue to look into this to see if there are ways to improve upon SIOC so that it can determine noisy neighbour conditions occurring when datastores are backed by a shared set of spindles. More info as I get it.
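For planning purposes, the two conditions described above can be checked against a simple inventory. This is only a sketch: the three dictionaries (host to vCenter, datastore to hosts, datastore to pool) and all the names in them are hypothetical, and nothing here queries vCenter or the array.

```python
from collections import defaultdict


def find_vcenter_conflicts(host_vcenter, datastore_hosts, datastore_pool):
    """Return (datastores, pools) that are reached from more than one vCenter."""
    # Which vCenter servers manage the hosts that access each datastore?
    ds_vcenters = {
        ds: {host_vcenter[host] for host in hosts}
        for ds, hosts in datastore_hosts.items()
    }

    # Which vCenter servers end up sharing each backing pool?
    pool_vcenters = defaultdict(set)
    for ds, vcenters in ds_vcenters.items():
        pool_vcenters[datastore_pool[ds]] |= vcenters

    mixed_datastores = sorted(ds for ds, v in ds_vcenters.items() if len(v) > 1)
    mixed_pools = sorted(pool for pool, v in pool_vcenters.items() if len(v) > 1)
    return mixed_datastores, mixed_pools


# Example data (made up for illustration).
host_vcenter = {"esx01": "vc-a", "esx02": "vc-a", "esx03": "vc-b"}
datastore_hosts = {
    "LUN1": ["esx01", "esx02"],
    "LUN2": ["esx03"],
    "LUN3": ["esx02", "esx03"],
}
datastore_pool = {"LUN1": "aggr0", "LUN2": "aggr0", "LUN3": "aggr1"}

mixed_datastores, mixed_pools = find_vcenter_conflicts(
    host_vcenter, datastore_hosts, datastore_pool
)
print(mixed_datastores)  # ['LUN3']
print(mixed_pools)       # ['aggr0', 'aggr1']
```

In the example, LUN3 is the unsupported case from the KB (hosts from two vCenter servers on one datastore), while `aggr0` shows the subtler case: LUN1 and LUN2 are each accessed from a single vCenter, but the pool behind them is shared by both.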