SolidFire demo: SIOC & QoS Interoperability
I watched a very cool demonstration this morning from the All Flash Array vendor, SolidFire. I spoke with SolidFire at the end of last year, and did a blog post about them here. One of the most interesting parts of our conversation last year was how SolidFire's QoS feature and VMware's Storage I/O Control (SIOC) feature could interoperate. In a nutshell, QoS works at the datastore/volume layer, whereas SIOC deals with the VM/VMDK layer. Last week, Aaron Delp and Adam Carter of SolidFire did an introduction to QoS, both on vSphere and on the SolidFire system. They also did one of the coolest demos I'd seen in some time, showing how they have managed to get SIOC and QoS to work in tandem.
The initial part of the demo looked at solving the age-old problem of noisy neighbours. They discussed the SIOC approach, which tries to achieve fairness between virtual machines running on the same shared datastore. They also highlighted areas where SIOC may not be able to help: in particular, when the storage is also used by applications that are not known to vSphere (external workloads), and when the storage is flash, since the latency congestion threshold that triggers SIOC to start throttling is far too high for flash storage.
To address the above, SolidFire have built a plugin for the vSphere web client which allows them to create a relationship between SIOC shares on the VM and QoS levels on the volumes. Keep in mind that QoS is related to datastores and SIOC is related to VMs. What SolidFire have done is use the SIOC settings on the VMs to automatically dictate the QoS settings on the underlying volume, so that it meets the correct level of IOPS. SIOC shares, set on a per-VMDK basis, are used to define a minimum number of 4K IOPS, and the SIOC Limit IOPS setting, also set on a per-VMDK basis, is used to define a maximum number of 4K IOPS. Burst, since there is no SIOC burst setting, defaults to 4 times the Limit IOPS setting. Here is a screenshot showing the SolidFire plugin in the vSphere web client enabling SIOC-QoS:
The congestion threshold on the datastore is also set to 5ms, the minimum the system will allow, since the default of 30ms is too high for flash.
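To make that mapping concrete, here is a rough Python sketch of how per-VMDK SIOC settings might be rolled up into volume-level QoS. This is purely illustrative; the class and function names are mine, and the one-to-one translation of shares into IOPS is my assumption, not SolidFire's actual plugin code.

```python
# Illustrative only -- not SolidFire's plugin code. Assumes each VMDK's
# SIOC shares value translates directly into a minimum 4K IOPS figure.
from dataclasses import dataclass

@dataclass
class VmdkSioc:
    shares: int       # SIOC shares, per VMDK -> minimum 4K IOPS
    limit_iops: int   # SIOC Limit IOPS, per VMDK -> maximum 4K IOPS

def volume_qos(vmdks):
    """Roll per-VMDK SIOC settings up into QoS for the backing volume."""
    min_iops = sum(v.shares for v in vmdks)
    max_iops = sum(v.limit_iops for v in vmdks)
    return {
        "minIOPS": min_iops,
        "maxIOPS": max_iops,
        "burstIOPS": 4 * max_iops,  # no SIOC burst setting, so default to 4x the limit
    }
```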
Now, the most interesting part of the demo was the migration of a virtual machine from datastore A to datastore B. The QoS settings on the volumes automatically adjusted as the VM was Storage vMotion'ed. The source datastore, which now has one less VM, adjusted QoS downwards, and the destination datastore, which now has an additional VM, adjusted QoS upwards. This enabled the underlying volume of the destination datastore to meet the performance requirements of both the existing virtual machines and the newly arrived one, without any of the VMs taking storage resources from one another. Basically, storage performance now follows the VM around the infrastructure. This is a superb enhancement in my opinion.
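In pseudo-code, the rebalancing might look something like the sketch below, reusing the hypothetical volume_qos() from earlier; apply_volume_qos() is a made-up stand-in for the plugin's call down to the SolidFire system, not a real API.

```python
# Hypothetical sketch: storage performance follows the VM.
def on_storage_vmotion(vm, source_ds, dest_ds, apply_volume_qos):
    source_ds.vms.remove(vm)   # source has one less VM -> QoS adjusts downwards
    dest_ds.vms.append(vm)     # destination gains a VM -> QoS adjusts upwards
    for ds in (source_ds, dest_ds):
        vmdks = [vmdk for v in ds.vms for vmdk in v.vmdks]
        apply_volume_qos(ds.volume, volume_qos(vmdks))
```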
You can catch the whole demo by clicking here. The first 19 minutes are an introduction to QoS in general, including SIOC. From 19m onwards there is a discussion about the impact of noisy neighbours, etc. The SIOC demo begins in earnest at 27m, and the Q&A, which has some really good questions, starts at 38m.
Great job guys.
Disclosure: I work for SolidFire and assisted with the demo.
Thanks for the great write-up Cormac! Your article really sums up the intent of our demo very well. You will see much tighter integration between SolidFire and VMware throughout the rest of this year, and I'm really excited about where we are going together.
The other item I would add, which jumped out at folks on the call, was the ability to separate performance from capacity using QoS at the volume level. We call this feature Performance Virtualization. I'm not crazy about the name (long story), but in a nutshell you are able to take a volume of any size and then assign minimum, maximum, and burst IOPS to that volume regardless of capacity. Need a really fast small LUN for a critical tier-one app, with guaranteed minimum IOPS? We can do that. Need a larger LUN with less performance for a few less critical apps, with an enforced maximum? We can do that.
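For anyone curious what that looks like in practice, creating a volume where capacity and performance are dialed in independently goes roughly like this through the JSON-RPC API. This is a simplified sketch; check the Element API reference for your version for the exact method and field names.

```python
# Simplified sketch of creating a small, fast volume via the SolidFire
# JSON-RPC API; exact method/field names may vary by Element OS version.
import requests

payload = {
    "method": "CreateVolume",
    "params": {
        "name": "tier1-db",
        "accountID": 1,              # example tenant account
        "totalSize": 10 * 1024**3,   # capacity: a small 10 GB LUN...
        "qos": {                     # ...with performance set independently
            "minIOPS": 5000,         # guaranteed floor for the critical app
            "maxIOPS": 15000,
            "burstIOPS": 20000,
        },
    },
    "id": 1,
}
resp = requests.post("https://mvip.example.com/json-rpc/7.0",
                     json=payload, auth=("admin", "password"),
                     verify=False)   # lab only; use proper certs in production
print(resp.json())
```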
Our sweet spot is any environment where there is a need to run hundreds, or even thousands, of applications or tenants on the same storage and guarantee performance for everyone.
As far as I can tell, it is fairly unique in our industry right now.
Thanks for that Aaron.
I think you guys have got some really nice features there, and I think this will be extremely well received when we finally get around to releasing Virtual Volumes (VVols). Having QoS like this on a per-VVol (VMDK) basis will allow customers to virtualize even more business-critical applications, in my opinion.
Glad I stumbled upon this blog; we're a partner with SolidFire and currently utilize a few SF nodes in our DC. This would be a nice add-on for us down the road. Thanks for the great write-up Cormac.
Glad you found it useful.