Does Storage DRS work with Adaptive Queuing?

Many of you will be aware that Storage DRS uses Storage I/O Control (SIOC) for load balancing based on I/O metrics. However, a statement in one of our white papers has recently raised a few questions from both customers and partners. The statement is as follows:

“Queue depth throttling is not compatible with Storage DRS.” (p. 34 of http://www.vmware.com/pdf/Perf_Best_Practices_vSphere5.5.pdf)

This assertion led many to believe that Storage DRS would not work well with Adaptive Queuing (AQ), another of VMware’s queue depth throttling mechanisms. Internally, however, many felt that the statement was not accurate, although some work was needed to verify that combining the features would not cause any issues. This led to a number of tests being run with Storage DRS and both of our queue throttling features, SIOC and Adaptive Queuing. I am using this post to share those results.

The concern here was whether one queue throttling mechanism might in some way step on the other. This was the purpose of the testing: run SDRS with both SIOC and AQ enabled. The testing had to cover both SIOC modes, since SIOC can be run in either stats only mode or full mode. So effectively there were two different scenarios to be tested:

  • Scenario 1 :  AQ + SDRS ( SIOC StatsOnly Mode )
  • Scenario 2 :  AQ + SDRS ( SIOC Full Mode )

You can read more about StatsOnly mode here. There is also a further description of SIOC and Adaptive Queuing here. The test scenario was as follows:

  1. Add two datastores to a datastore cluster
  2. Enable AQ as per KB 1008113 and enable Storage DRS (tested once with SIOC fully enabled and once with SIOC in stats only mode)
  3. Run tests to ensure the AQ feature works as expected when Storage DRS is enabled
    • Simulate a “BUSY or QUEUE FULL” condition from the array to ensure AQ works as expected
  4. Run Storage DRS tests to ensure the Storage DRS algorithm works as expected when AQ is enabled

Observations with SIOC set in stats only mode

  1. When congestion was introduced on the datastore (via IOmeter running in VMs), the queue depth of the datastore changed from 64 to 32 (AQ halves the queue depth by design).
  2. When QUEUE FULL or BUSY conditions were introduced and their count reached the QFullSampleSize value, the LUN queue depth was halved again (queue depths of 15 and 16 were observed during testing).
  3. When we manually stopped the “BUSY or QUEUE FULL” errors, AQ changed the queue depth of the datastore back to 32 and I/O to the datastores returned to normal.
  4. When we stopped or reduced I/O to the datastore, the queue depth of the datastore was restored to 64.
  5. Storage DRS initial placement and migration during maintenance mode worked as expected.
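
To make the halving behaviour in these observations a little more concrete, here is a small toy model in Python. It is only a sketch and not the VMkernel's actual logic: the class and method names are made up for illustration, sample_size and threshold merely stand in for the QFullSampleSize and QFullThreshold settings from the KB (the values used are illustrative), and recovery is modelled as doubling only because the tests above saw the depth return to 32 and then to 64.

```python
from dataclasses import dataclass, field


@dataclass
class AdaptiveQueueModel:
    """Toy model of adaptive queue depth throttling (not VMware code)."""

    max_depth: int = 64    # configured LUN queue depth, 64 in the tests above
    sample_size: int = 32  # stand-in for QFullSampleSize (illustrative value)
    threshold: int = 4     # stand-in for QFullThreshold (illustrative value)
    min_depth: int = 1     # assumed floor so the depth never reaches zero
    depth: int = field(init=False, default=0)
    busy_count: int = field(init=False, default=0)
    good_count: int = field(init=False, default=0)

    def __post_init__(self) -> None:
        # Start at the full configured queue depth.
        self.depth = self.max_depth

    def on_completion(self, busy: bool) -> int:
        """Record one I/O completion and return the resulting queue depth."""
        if busy:
            # A BUSY or QUEUE FULL status interrupts any run of good completions.
            self.busy_count += 1
            self.good_count = 0
            if self.busy_count >= self.sample_size:
                # Enough congestion signals seen: halve the queue depth.
                self.depth = max(self.min_depth, self.depth // 2)
                self.busy_count = 0
        else:
            self.good_count += 1
            if self.good_count >= self.threshold and self.depth < self.max_depth:
                # Assumption: recover by doubling, capped at the configured depth.
                self.depth = min(self.max_depth, self.depth * 2)
                self.good_count = 0
        return self.depth


if __name__ == "__main__":
    model = AdaptiveQueueModel()
    for busy, count in [(True, 32), (True, 32), (False, 4), (False, 4)]:
        for _ in range(count):
            model.on_completion(busy)
        print("busy" if busy else "good", "x", count, "-> depth", model.depth)
```

With these illustrative settings, two bursts of busy completions take the modelled depth from 64 to 32 and then to 16, and two runs of good completions bring it back to 32 and then 64, which is the same shape as the sequence observed in the tests.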

Observations with SIOC set in full mode

  1. When both AQ and SDRS (with SIOC in full mode) are enabled, SIOC does the queue depth throttling of the datastore and AQ has no effect. In other words, SIOC overrides AQ while SIOC is in full mode.
  2. When IOmeter was started to introduce congestion, the queue depth of the datastore was throttled by SIOC but not halved; it was a more granular decrease.
  3. When we manually injected “BUSY or QUEUE FULL” errors on the active I/O path, the queue depth of the datastore changed again but was not halved; it was once again a more granular decrease.
  4. When both AQ and SDRS (with SIOC in full mode) were enabled, SDRS operations (e.g. initial placement and datastore maintenance mode) worked fine.
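
To illustrate what “a more granular decrease” means compared with halving, here is a short Python comparison. SIOC's real control law is not described in this post, so latency_scaled below is a purely hypothetical, latency-proportional adjustment that I am assuming for the sake of the example; the function names, the smoothing factor and the latency figures are all made up.

```python
def halve(depth: int, min_depth: int = 1) -> int:
    """AQ-style step as described above: cut the queue depth in half."""
    return max(min_depth, depth // 2)


def latency_scaled(
    depth: int,
    observed_ms: float,
    threshold_ms: float,
    min_depth: int = 4,
    max_depth: int = 64,
    smoothing: float = 0.5,
) -> int:
    """Hypothetical granular step: nudge depth toward threshold/observed ratio.

    This is NOT SIOC's algorithm, only an illustration of a decrease that is
    proportional to how far latency is above a congestion threshold.
    """
    if observed_ms <= 0:
        raise ValueError("observed_ms must be positive")
    # Depth that would scale latency back to the threshold, if latency were
    # proportional to queue depth (a simplifying assumption).
    target = depth * threshold_ms / observed_ms
    # Blend the current depth with the target so each step is small.
    new_depth = round((1 - smoothing) * depth + smoothing * target)
    return max(min_depth, min(max_depth, new_depth))


if __name__ == "__main__":
    # Illustrative numbers only: latency slightly above a made-up threshold.
    print("halved:", halve(64))
    print("granular:", latency_scaled(64, observed_ms=36.0, threshold_ms=30.0))
```

The point of the comparison is only that a latency-driven controller can land on a value somewhere between the halved depth and the starting depth, rather than dropping straight to half.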

We’ve now removed the statement “Queue depth throttling is not compatible with Storage DRS” from the “vSphere 5.5 Performance Best Practices Guide”.

The new version is now live at the same location as the previous one and has the same file name, but says on the cover “Revised May 14, 2014.”

One other important note – it was originally thought that SIOC stats only mode would be enabled by default on datastores. This is not the case. SIOC stats only mode is disabled by default on the datastores. There is more information here on the topic.

Kudos to Jyothi Mallikarjun and Sivakumar Sreedharan Nair for sharing their test results.

10 Replies to “Does Storage DRS work with Adaptive Queuing?”

  1. I appreciate the testing; this is one of those things I was mildly curious about but never took the time to dig deeper on. Cheers!

    1. I still can’t believe the performance team put that statement in there. Because I wrote a bunch of code in SIOC from the very first version of the product to make sure that SIOC would properly interoperate with Adaptive Queuing. We even had test cases for this. This is back in 2009.

      Sometimes the right hand really doesn’t know what the left hand is doing 🙁

      And these myths are so hard to quell after they are formed.
