All Flash Arrays (AFAs) continue to make the news. Whether it is EMC’s XtremIO launch or Violin Memory’s current market woes, there is no doubt that AFAs continue to generate a lot of interest. Those of you interested in flash storage will not need an introduction to SolidFire. The company was founded by Dave Wright (ex-RackSpace) and has been around since 2009. I have been trying to catch up with SolidFire for some time, as I’d heard their pitch around Quality of Service on a per-volume basis and wanted to learn more, especially how it integrates with vSphere features. Recently I caught up with Dave Cahill and Adam Carter of SolidFire to have a chat about SolidFire in general and what the VMware integration points are.
I mentioned that the SolidFire founder, Dave Wright, came from RackSpace. Whilst at RackSpace, they developed a reliable, automated and efficient system for scaling out compute. However, they could not find a suitable automated and efficient storage system that would deliver linear performance in a multi-tenant environment. This led to the formation of SolidFire.
Scale & Performance
SolidFire will be the first to admit that they are not there to simply throw out hero numbers from their arrays – they believe that there are already enough players in that market, and that it is quite niche. Instead, they are positioning themselves as a predictive and persistent performance storage solution. Their arrays will still generate a lot of IOPS, but will also scale out in a linear fashion one node at a time, to a maximum of 100 nodes. Their biggest deployment to date has been 25 nodes.
SolidFire currently ship 3 models of storage system with a different number of nodes and a different size of SSD (SF3010, SF6010, SF9010 with 300GB, 600GB and 960GB SSDs respectively). Each node is a self-contained appliance built on top of Dell hardware. Currently the only supported storage protocol is iSCSI over 10GbE, but Fibre Channel is the top priority as a supported protocol for FY2014. The base unit is a 5 node cluster, which is also the minimum number of nodes in a cluster. Today, you cannot mix different nodes; they must all be the same model. However, a future enhancement is to allow the mixing and matching of different node types. A basic system of 5 nodes ships with 60TB of SSD and provides 250,000 IOPS. However, to return to SolidFire’s pitch, they don’t want you to focus on the IOPS numbers but rather on the storage issues that they can fix for you. Using their QoS feature, SolidFire volumes can guarantee a specific number of IOPS per volume, and provide virtual guardrails to ensure that the performance of one volume is not impacted by the behaviour of another volume in the same cluster. More about this later.
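To make the linear scale-out claim concrete, here is a back-of-the-envelope sketch. It takes the quoted 5 node figures (250,000 IOPS, 60TB) and assumes scaling is perfectly linear per node, which is an idealisation on my part. The function and constant names are made up for illustration, and the post does not say which model the 60TB figure refers to.

```python
# Figures quoted above for the minimum (5 node) configuration.
BASE_NODES = 5
BASE_IOPS = 250_000
BASE_CAPACITY_TB = 60
MAX_NODES = 100  # stated scale-out limit


def estimate_cluster(nodes: int) -> dict:
    """Estimate IOPS and SSD capacity, ASSUMING perfectly linear scaling."""
    if not BASE_NODES <= nodes <= MAX_NODES:
        raise ValueError(f"node count must be between {BASE_NODES} and {MAX_NODES}")
    per_node_iops = BASE_IOPS // BASE_NODES      # 50,000 IOPS per node
    per_node_tb = BASE_CAPACITY_TB / BASE_NODES  # 12 TB per node
    return {
        "nodes": nodes,
        "iops": per_node_iops * nodes,
        "capacity_tb": per_node_tb * nodes,
    }


if __name__ == "__main__":
    # Minimum cluster, largest deployment mentioned, and the stated maximum.
    for n in (5, 25, 100):
        print(estimate_cluster(n))
```

Under that assumption, the 25 node deployment mentioned earlier would land at roughly 1.25 million IOPS. Real numbers will depend on the model and workload.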
Dedupe, Automation and Data Protection Features
Each SolidFire array comes with compression and deduplication. This is inline and is cluster wide. Automation can be achieved through a RESTful API, but of course there is also a UI for the cluster. When a LUN is created on the SolidFire array, it is evenly distributed across all nodes in the cluster. As nodes are added to or removed from the cluster, the data is automatically redistributed. SolidFire also has RAIDless data protection, as they feel that RAID is not an option because of rebuild times and SSD endurance. Through their ‘Helix’ mechanism, two copies of the data are distributed across all nodes in the cluster which allows the cluster to survive multi-node failures.
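I don't have details of how Helix actually places data, so the following is only a toy model of the general idea: keep two copies of every block on distinct nodes, and move as little data as possible when a node joins. It uses rendezvous hashing as a stand-in technique. The function names and node names are hypothetical and are not SolidFire's.

```python
import hashlib
from typing import List, Sequence


def _score(block_id: str, node: str) -> int:
    """Deterministic pseudo-random score for a (block, node) pair."""
    digest = hashlib.sha256(f"{block_id}:{node}".encode()).hexdigest()
    return int(digest, 16)


def place(block_id: str, nodes: Sequence[str], copies: int = 2) -> List[str]:
    """Pick `copies` distinct nodes for a block using rendezvous hashing."""
    if len(nodes) < copies:
        raise ValueError("need at least as many nodes as copies")
    ranked = sorted(nodes, key=lambda n: _score(block_id, n), reverse=True)
    return ranked[:copies]


def moved_blocks(blocks: Sequence[str],
                 old_nodes: Sequence[str],
                 new_nodes: Sequence[str]) -> List[str]:
    """Return the blocks whose placement differs between two cluster layouts."""
    return [b for b in blocks if place(b, old_nodes) != place(b, new_nodes)]


if __name__ == "__main__":
    nodes = [f"node{i}" for i in range(1, 6)]       # minimum 5 node cluster
    blocks = [f"block{i}" for i in range(1000)]
    moved = moved_blocks(blocks, nodes, nodes + ["node6"])
    print(f"{len(moved)} of {len(blocks)} blocks move when node6 joins")
```

With this scheme, a block only moves when the newly added node becomes one of its two homes, so the rebalancing traffic is limited to data that the new node takes over.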
Quality Of Service – QoS
Returning to the QoS feature mentioned earlier, SolidFire allows you to specify a number of IOPS per volume. Therefore, if you know the I/O requirements of the applications running in your virtual machines, you can total the IOPS requirements of all the virtual machines running on that datastore and set the QoS IOPS requirement appropriately on the volume. SolidFire will then guarantee that this number of IOPS will always be provided by this volume/datastore. This is one of the unique features of the SolidFire storage system.
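The sizing exercise described above is simple addition, sketched below. The VM names and IOPS figures are invented for illustration, and the optional `headroom` parameter is my own addition rather than anything SolidFire prescribes.

```python
import math
from typing import Dict


def volume_qos_iops(vm_iops: Dict[str, int], headroom: float = 0.0) -> int:
    """Total the per-VM IOPS requirements for one datastore/volume.

    `headroom` is a hypothetical safety margin (0.25 means +25%).
    """
    if headroom < 0:
        raise ValueError("headroom must not be negative")
    total = sum(vm_iops.values())
    return math.ceil(total * (1 + headroom))


if __name__ == "__main__":
    # Hypothetical VMs sharing one datastore.
    requirements = {"sql01": 3000, "web01": 500, "web02": 500}
    print(volume_qos_iops(requirements))                 # straight total
    print(volume_qos_iops(requirements, headroom=0.25))  # total plus a margin
```

The result is the figure you would set as the QoS IOPS value on the volume backing that datastore.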
SolidFire have started down the road of implementing vSphere Storage APIs for Array Integration (VAAI) primitives. Right now they have focused on the efficiency primitives rather than the performance related ones. They currently have Write_Same (Zero) & Thin Provisioning UNMAP. In FY2014, they will be adding the performance primitives XCOPY (Clone) and ATS (Atomic Test & Set).
Disaster Recovery (DR)
SolidFire do not have a replication feature right now. In FY2014, they plan to introduce an asynchronous replication feature. Once that functionality is complete, they then plan to look at integration with VMware’s Site Recovery Manager (SRM) and develop their own SRA (Storage Replication Adapter).
Of course, VMware customers can use vSphere Replication with SRM orchestration to develop their own DR plans on top of SolidFire arrays right now. This is an approach taken by many array vendors that do not have native array replication technology today.
SolidFire did want to highlight that their planned native replication technology will be able to leverage their native Thin Provisioning, compression and dedupe technologies, which means that the least amount of data possible will need to be replicated. However, due to the performance of these clusters (remember 250,000 IOPS with a minimum configuration), customers will still need to ensure that there is an adequate pipe between clusters to get a decent Recovery Point Objective (RPO) for their Disaster Recovery (DR).
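To get a feel for what an "adequate pipe" might mean, here is a rough calculation. Every input is an assumption: the write rate, the I/O size and the data reduction ratio are invented example values, and the model naively treats every write as changed data that must cross the link. It says nothing about how SolidFire's replication will actually behave.

```python
def required_link_mbps(write_iops: float,
                       io_size_kb: float,
                       reduction_ratio: float) -> float:
    """Link speed (megabits/s) needed to keep up with a steady write rate.

    reduction_ratio is the ASSUMED combined dedupe/compression factor,
    e.g. 4.0 means only a quarter of the written bytes cross the wire.
    """
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    bytes_per_second = write_iops * io_size_kb * 1024 / reduction_ratio
    return bytes_per_second * 8 / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 50,000 write IOPS of 4KB with 4:1 data reduction.
    print(f"{required_link_mbps(50_000, 4, 4.0):.1f} Mbit/s")
```

If the link is slower than this figure, the replica falls further behind over time and the achievable RPO grows accordingly.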
Storage DRS & SIOC
One might well ask the question: ‘If the array is performing the QoS, is there a need for vSphere technologies like Storage DRS and Storage I/O Control?’ The answer is yes; Storage DRS & Storage I/O Control still have a role to play.
Storage DRS is still ideal for initial placement and on-going load balancing based on capacity. However, the latency thresholds for the metrics based balancing are probably too high for SolidFire. Ideally, these thresholds need to be much, much lower for AFAs (5ms or less rather than the 30ms we have now). This is something we’ve heard from other all flash array vendors too. For this reason, SolidFire are recommending that customers use Storage DRS for balancing based on capacity usage. But for performance related balancing, SolidFire are recommending to their customers that they should use SolidFire QoS.
What about SIOC then? SolidFire QoS makes the LUN perform persistently and consistently. SIOC takes care of the VM behaviour. In a correctly sized environment, SIOC won’t ever need to do anything (this is true in non-flash arrays as well as in AFAs). However, even if a SolidFire volume/datastore has been tuned to provide a specific number of IOPS, customers may still over provision virtual machines on that datastore. This may lead to a ‘noisy neighbour’ problem where a single application running in a single virtual machine on a single host may begin to take more than its fair share of IOPS from that datastore. The great thing with SolidFire is that the virtual guardrails will stop this LUN from impacting any other LUNs on the system (something we’ve seen in the past); it isolates the noisy neighbour to one datastore. Should the datastore be incorrectly over-committed, or have an inappropriate QoS setting, then SIOC could once again do its thing and use share values to determine how much I/O a particular VM is allowed (assuming the latency threshold is reached, which again may not be the case on flash arrays). Now SolidFire will argue that what should happen here is that the QoS value should be modified on the volume to address this issue rather than relying on a hypervisor based mechanism. Happily, joint vSphere/SolidFire customers have the choice of different approaches.
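As a simplified illustration of the share-based idea, the sketch below divides a datastore's IOPS budget between VMs in proportion to their share values. This is only a model of the outcome. SIOC itself works by throttling host device queues once the latency threshold is crossed, and the function name and figures here are made up for the example.

```python
from typing import Dict


def split_by_shares(datastore_iops: int, shares: Dict[str, int]) -> Dict[str, float]:
    """Divide an IOPS budget between VMs in proportion to their shares."""
    total_shares = sum(shares.values())
    if total_shares <= 0:
        raise ValueError("total shares must be positive")
    return {vm: datastore_iops * s / total_shares for vm, s in shares.items()}


if __name__ == "__main__":
    # Hypothetical datastore with a 4,000 IOPS QoS setting and three VMs,
    # one of which has been given twice the shares of the others.
    budget = split_by_shares(4000, {"sql01": 2000, "web01": 1000, "web02": 1000})
    for vm, iops in budget.items():
        print(f"{vm}: {iops:.0f} IOPS")
```

The array-side alternative that SolidFire would suggest is to raise the QoS value on the volume instead, so that the budget being divided is large enough in the first place.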
Which leads us nicely to one of the future VMware projects: Virtual Volumes or VVols. SolidFire happily discuss this upcoming feature as they feel it plays right into their QoS mechanism. VVols, for those who don’t know, is a way of making virtual machine disks (VMDKs) first class citizens in the storage world. This will allow us to scale out our current storage offerings as well as giving us much more granular control over the VMDKs from the point of view of snapshots and replication. I did a tech preview of VVols here. Remember, VVols is not yet a shipping product/feature. However, many storage vendors are working with us on this project. SolidFire have been showing off their own implementation of VVols too: they can tie their QoS feature directly to a VMDK via VVols, meaning they can guarantee QoS for a particular VMDK through VVols. You can see the full video of their demo here: