Many of you will be aware that Storage DRS uses Storage I/O Control (SIOC) for load balancing based on I/O metrics. However, a statement in one of our white papers has recently raised a few questions among both our customers and partners. The statement is as follows:
“Queue depth throttling is not compatible with Storage DRS” (pg. 34), from http://www.vmware.com/pdf/Perf_Best_Practices_vSphere5.5.pdf.
This assertion led many to believe that Storage DRS would not work well with Adaptive Queuing (AQ), another of VMware’s queue depth throttling mechanisms. Internally, however, many felt that the statement was not accurate, but some work was needed to verify that combining the features would not cause any issues. This led to a number of tests being run with Storage DRS and both of our queue throttling features, SIOC and Adaptive Queuing. I am using this post to share those results.
The concern here was whether one queue throttling mechanism might in some way interfere with the other. This was the purpose of the testing: run SDRS with both SIOC and AQ enabled. The testing was quite comprehensive, since SIOC can be run in both a stats only mode and a full mode. So effectively there were two different scenarios to be tested:
- Scenario 1: AQ + SDRS (SIOC stats only mode)
- Scenario 2: AQ + SDRS (SIOC full mode)
The test steps were as follows:
- Add two datastores to a datastore cluster
- Enable AQ as per KB 1008113 and enable Storage DRS (once with SIOC fully enabled and once with SIOC in stats only mode)
- Run tests to ensure AQ feature works as expected when Storage DRS is enabled
- Simulate “BUSY or QUEUE FULL” condition from array to ensure AQ works as expected
- Run Storage DRS tests to ensure Storage DRS algorithm works as expected when AQ is enabled
Observations with SIOC set in stats only mode
- When congestion was introduced on the datastore (via IOmeter running in VMs), the queue depth of the datastore changed from 64 to 32 (AQ halves the queue depth by design).
- When QUEUE FULL or BUSY conditions were introduced and their number reached the QFullSampleSize value, the LUN queue depth was again reduced by half (queue depths of 15 and 16 were observed during testing).
- When we manually stopped the “BUSY or QUEUE FULL” errors, AQ changed the queue depth of the datastore back to 32 and I/O to the datastores returned to normal.
- When we stopped or reduced I/O to the datastore, the queue depth of the datastore was restored to 64.
- Storage DRS initial placement and migration during maintenance mode worked as expected.
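To make the halving behaviour easier to picture, here is a minimal Python sketch of it. This is a toy model only, not the ESXi implementation: the class name, the method names, the sample size of 32 and the floor of 1 are all assumptions made for illustration, and the recovery step simply mirrors the 16, 32, 64 sequence seen in the tests rather than the real recovery logic, which is governed by the settings described in KB 1008113.

```python
class AdaptiveQueueModel:
    """Toy model of the queue depth halving observed in the tests (hypothetical, not ESXi code)."""

    def __init__(self, max_depth=64, sample_size=32, min_depth=1):
        self.max_depth = max_depth      # configured LUN queue depth (64 in the tests)
        self.sample_size = sample_size  # stands in for QFullSampleSize (value assumed)
        self.min_depth = min_depth      # assumed floor so the depth never reaches zero
        self.depth = max_depth
        self.busy_count = 0

    def on_busy_or_queue_full(self):
        """Count one BUSY/QUEUE FULL status; halve the depth once sample_size is reached."""
        self.busy_count += 1
        if self.busy_count >= self.sample_size:
            self.depth = max(self.min_depth, self.depth // 2)
            self.busy_count = 0
        return self.depth

    def on_congestion_cleared(self):
        """Simplified recovery: step back up one level, capped at the configured depth."""
        self.depth = min(self.max_depth, self.depth * 2)
        self.busy_count = 0
        return self.depth


if __name__ == "__main__":
    aq = AdaptiveQueueModel()
    for _ in range(64):                 # two full sample windows of BUSY/QUEUE FULL
        aq.on_busy_or_queue_full()
    print(aq.depth)                     # 16
    print(aq.on_congestion_cleared())   # 32
    print(aq.on_congestion_cleared())   # 64
```

The point of the sketch is just the shape of the curve: coarse, power-of-two steps down and back up, which is what distinguishes AQ from the finer-grained throttling seen when SIOC is in full mode.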
Observations with SIOC set in full mode
- When both AQ and SDRS (with SIOC in full mode) are enabled, SIOC performs the queue depth throttling of the datastore and AQ has no effect. In other words, SIOC overrides AQ while SIOC is in full mode.
- When IOmeter was started to introduce congestion, the queue depth of the datastore was throttled by SIOC but not halved; it was a more granular decrease.
- When we manually injected “BUSY or QUEUE FULL” errors on the active I/O path, the queue depth of the datastore changed again but was not halved; it was once again a more granular decrease.
- When both AQ and SDRS (with SIOC in full mode) were enabled, SDRS operations (e.g. initial placement and datastore maintenance mode) worked fine.
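The two sets of observations boil down to a simple rule about which mechanism ends up throttling the queue depth. The following Python sketch captures that rule as a small lookup. The enum and function names are hypothetical and exist only to summarise the test results; they do not correspond to any vSphere API.

```python
from enum import Enum


class SiocMode(Enum):
    STATS_ONLY = "stats-only"
    FULL = "full"


def effective_throttler(aq_enabled: bool, sioc_mode: SiocMode) -> str:
    """Return which mechanism throttled queue depth in the tested scenarios."""
    if sioc_mode is SiocMode.FULL:
        # Scenario 2: SIOC in full mode overrides AQ.
        return "SIOC"
    # Scenario 1: SIOC in stats only mode does not throttle, so AQ (if enabled) does.
    return "AQ" if aq_enabled else "none"


if __name__ == "__main__":
    print(effective_throttler(True, SiocMode.STATS_ONLY))  # AQ
    print(effective_throttler(True, SiocMode.FULL))        # SIOC
```

In both cases Storage DRS itself behaved as expected; the only thing that changes is which feature adjusts the queue depth.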
We have now removed the statement “Queue depth throttling is not compatible with Storage DRS” from the “vSphere 5.5 Performance Best Practices Guide”.
The new version is now live at the same location as the previous one and has the same file name, but says on the cover “Revised May 14, 2014”.
One other important note – it was originally thought that SIOC stats only mode would be enabled by default on datastores. This is not the case. SIOC stats only mode is disabled by default on the datastores. There is more information here on the topic.
Kudos to Jyothi Mallikarjun and Sivakumar Sreedharan Nair for sharing their test results.