Pure Storage are all over the news at the moment. They just secured another round of funding (225 million to be precise), and are now valued at over 3 billion. You can read more about that here. However, even before this announcement, I had already arranged to have a catch-up chat with Pure’s primary evangelist (and a good pal of mine), Vaughn Stewart. I was surprised to see that it had been 18 months since I last did a piece on Pure. There were a few vSphere interoperability pieces still to be completed when we last spoke, so I really did want to see what changes they had made in the meantime.
If you are already familiar with Pure Storage, you can skip this piece. For those of you who are not too familiar with their product, this is a quick overview.
Pure Storage offer an All Flash Array (AFA) aptly named the FlashArray. Their AFA is a symmetric active/active, dual controller architecture. This means all the I/O ports are primary and active. The array offers 8 ports for connectivity. These can be 10Gb iSCSI or 8Gb Fibre Channel (or 12 ports for mixed access if you wish, but they don’t have many customers doing that). They offer modular shelf expansion, and their array ships with 512GB MLC drives.
The FlashArray is what Pure calls a stateless architecture. By this they mean that the NVRAM, which acknowledges the writes before the data is stored on the Solid State Drives, is located in the first two shelves of the array; it is not located in the controllers/heads. This has multiple benefits: it allows for seamless software upgrades, hardware upgrades and controller upgrades without impacting the array.
But won’t upgrading a controller impact the performance, I hear you ask? Surely for an AFA running hundreds of thousands of IOPS, taking a controller offline is going to negatively impact the performance. This is where Pure tweaks the Active/Active model. Pure uses only half of the controller resources during normal operations (a bit like how an Active/Passive model would work), so when you do maintenance tasks like this, there is no loss of performance. A nice feature for sure.
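To put some numbers on the "half of the controller resources" idea, here is a minimal back-of-the-envelope sketch. The function names and the way load is expressed (as a fraction of one controller's capacity) are my own assumptions for illustration; this is not how Purity actually accounts for resources.

```python
def load_after_failover(load_a: float, load_b: float) -> float:
    """Load the surviving controller must carry when its partner goes offline.

    Both loads are fractions of a single controller's capacity (assumed model).
    """
    return load_a + load_b


def has_headroom(load_a: float, load_b: float) -> bool:
    """True if one controller alone can absorb the combined load."""
    return load_after_failover(load_a, load_b) <= 1.0


# Each controller capped at half its resources: the survivor can absorb everything.
print(has_headroom(0.5, 0.5))
# Controllers driven harder than that: taking one offline would cost performance.
print(has_headroom(0.7, 0.6))
```

With each controller held to 50%, the combined load never exceeds what a single controller can deliver, which is the trade-off being made here: spare capacity in normal operation in exchange for non-disruptive maintenance.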
Pure has an adaptive I/O transfer size ranging from 4K to 32K. Vaughn wanted to call this out as a particular feature of Pure Storage that makes it very suitable for virtual machine workloads. He said that an application or VM generates different I/O sizes (aka the I/O blender) and that most arrays use a small, fixed block size of 4K. He said that if an application generates a 4K data block, they can handle that in a single I/O; similarly, if the application then generates a 32K data block, they can also handle that in a single I/O, whereas other storage arrays may have to break that down into 8 x 4K I/Os to handle it, and thus impact performance. He said they were working on a number of white papers that they hope to publish shortly to show the net benefits of this adaptive I/O mechanism.
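The arithmetic behind that 8 x 4K example can be sketched in a few lines. This is only an illustration of the counting argument; `ios_needed` is a hypothetical helper of mine, and it says nothing about how any array actually schedules I/O.

```python
import math

KIB = 1024


def ios_needed(request_bytes: int, max_io_bytes: int) -> int:
    """Number of back-end I/Os needed to service one request,
    assuming the array can move at most max_io_bytes per I/O."""
    if request_bytes <= 0 or max_io_bytes <= 0:
        raise ValueError("sizes must be positive")
    return math.ceil(request_bytes / max_io_bytes)


for request in (4 * KIB, 32 * KIB):
    fixed = ios_needed(request, 4 * KIB)      # fixed 4K block size
    adaptive = ios_needed(request, 32 * KIB)  # adaptive, up to 32K
    print(f"{request // KIB}K request: fixed={fixed} I/Os, adaptive={adaptive} I/O(s)")
```

A 4K request costs one I/O either way; the difference only shows up as the request size grows towards 32K.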
Dedupe & Compression
Pure calls their data reduction technique ‘Flash Reduce’. It has various levels of inline deduplication and compression built into their system. First off there is the 8-bit pattern removal mechanism which identifies and drops repetitive binary patterns as they hit the NVRAM.
Next is the Adaptive Data Deduplication which works at a 512 byte granularity on NVRAM contents to ensure that only unique data is stored in flash.
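To give a feel for what deduplication at a 512 byte granularity means, here is a minimal sketch. The fixed-offset chunking, the SHA-256 fingerprints and the `dedupe`/`rebuild` names are all assumptions of mine for illustration; Pure's actual algorithm is not described here and is almost certainly more sophisticated.

```python
import hashlib

CHUNK = 512  # dedupe granularity quoted above


def dedupe(data: bytes):
    """Split data into 512-byte chunks and keep one copy of each unique chunk.

    Returns (store, recipe): store maps fingerprint -> chunk contents,
    recipe is the ordered list of fingerprints needed to rebuild the data.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), CHUNK):
        chunk = data[offset:offset + CHUNK]
        fingerprint = hashlib.sha256(chunk).digest()
        store.setdefault(fingerprint, chunk)  # only the first copy is stored
        recipe.append(fingerprint)
    return store, recipe


def rebuild(store, recipe) -> bytes:
    """Reassemble the original data from the unique chunks."""
    return b"".join(store[fp] for fp in recipe)


data = b"A" * CHUNK * 3 + b"B" * CHUNK  # four chunks, only two distinct
store, recipe = dedupe(data)
print(len(recipe), "chunks written,", len(store), "unique chunks stored")
assert rebuild(store, recipe) == data
```

The smaller the granularity, the more likely it is that two chunks match, which is why a 512 byte unit is interesting for VM workloads with lots of near-identical guest OS data.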
Then there is their Adaptive Data Compression using the LZO (Lempel–Ziv–Oberhumer) algorithm. In VMware environments, Vaughn stated that their customers are averaging a compression ratio of somewhere between 5:1 and 9:1.
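If you want a rough feel for what ratios like these mean in practice, the sketch below measures a compression ratio and turns a ratio into effective capacity. Note the assumptions: Python's standard library has no Lempel–Ziv–Oberhumer implementation, so `zlib` (DEFLATE) is swapped in purely as a stand-in, and both helper names and the 10TB figure are hypothetical.

```python
import zlib


def reduction_ratio(data: bytes) -> float:
    """Original size divided by compressed size (zlib used as a stand-in codec)."""
    return len(data) / len(zlib.compress(data))


def effective_capacity_tb(usable_tb: float, ratio: float) -> float:
    """Logical capacity a given amount of usable flash can hold at a given ratio."""
    return usable_tb * ratio


sample = b"\x00" * 65536  # highly repetitive data compresses extremely well
print(f"measured ratio on sample: {reduction_ratio(sample):.0f}:1")

# What the quoted 5:1 to 9:1 range would mean for a hypothetical 10TB of usable flash.
for ratio in (5.0, 9.0):
    print(f"{ratio:.0f}:1 -> {effective_capacity_tb(10.0, ratio):.0f}TB effective")
```

Real-world ratios depend entirely on the data, so treat any number you measure on a sample as indicative only.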
Pure offers LUN clones and VM VAAI XCopy for creating new copies of data sets without copying or storing new data.
Finally, there is Pure’s Deep Reduction/Deep Compression mechanism. This is actually built into their garbage collection mechanism in FlashCare. SSDs write data in pages but erase it a block at a time, and each block contains multiple pages. Some of those pages may hold contents that need to be preserved, so those contents are written to another block to allow the original block to be erased. During Garbage Collection, Pure’s Deep Reduction/Deep Compression mechanism will once again scan the contents to see if additional capacity reduction gains can be made.
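The relocate-then-erase step, and why it is a convenient place to take a second pass at data reduction, can be sketched as follows. This is a toy model under my own assumptions: an erase unit is just a list of `(data, is_live)` entries, and the `reduce` hook is a hypothetical placeholder for whatever deeper reduction pass is applied; none of it reflects FlashCare internals.

```python
def collect(erase_unit, free_unit, reduce=lambda data: data):
    """Garbage-collect one erase unit.

    Live entries are (optionally re-reduced and) copied into free_unit,
    then erase_unit is wiped. Returns the number of stale entries reclaimed.
    """
    live = [(reduce(data), True) for data, is_live in erase_unit if is_live]
    reclaimed = len(erase_unit) - len(live)
    free_unit.extend(live)  # preserve contents that are still needed
    erase_unit.clear()      # the whole unit can now be erased
    return reclaimed


victim = [(b"a", True), (b"b", False), (b"c", True), (b"d", False)]
spare = []
print("entries reclaimed:", collect(victim, spare))
print("relocated:", spare)
```

Since the live data has to be read and rewritten anyway, applying a more expensive reduction pass at this point adds no extra trips to flash, which is presumably the attraction of folding it into garbage collection.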
Vaughn provides further details into these algorithms in this blog post.
QoS – Quality Of Service
I spoke to Vaughn about this as it seems to be a feature that more and more folks are interested in. We know how VMware can control very elegantly many system resources such as CPU, memory and even networking. However, storage has always been a pain point, and although Storage I/O Control (SIOC) goes some way to mitigate that, it is limited to what it can do at the server-side and has little control over what happens on the array side. Pure’s QoS seems to be geared more towards guaranteeing fairness than guaranteeing a particular level of performance. It doesn’t appear to provide guard rails to prevent noisy neighbours from spilling over between volumes. However, Vaughn stated that Pure’s thought process was more toward enabling the performance capabilities of their arrays rather than working on limitation techniques. He added that very few of their customers can actually drive their FlashArray beyond the 90th percentile of resource usage, so noisy neighbours aren’t an issue for them. He said QoS may evolve to better support service providers, but at the moment Pure engineering is more focused on integrations for enterprise adoption. It may be something worth considering by Pure going forward however.
The next thing we spoke about was vSphere interoperability. There were two areas in particular that I wanted to touch on, both of which were lacking when I last spoke to the guys at Pure – namely VAAI and SRM.
When I last spoke to Pure, they had only implemented the ATS and WRITE_SAME (Zero) primitives. They had not yet done the XCOPY (Clone) or UNMAP primitives. I’m happy to say that these primitives are now fully supported. One outstanding block primitive is the Thin Provisioning Stun/Warning capability. Vaughn stated that they are still actively working with VMware on an implementation of this one.
Again, in previous conversations, one of the missing pieces to Pure’s storage services was replication. Although not officially announced, Pure’s forthcoming Purity OE v4.0 release is rumoured to put this to rights. Well, the current v3.3 release of OE has replication in beta, so it doesn’t take a genius to figure that out. More interesting is whether or not they will have a Storage Replication Adapter to support Site Recovery Manager. I guess we’ll just have to wait and see, but I’m sure Pure see the value in having SRM do orchestration of Disaster Recovery scenarios between Pure AFAs. Oh, and Vaughn did add that any new features that they introduce in OE v4.0 will be free to all customers with active support contracts.
It’s always a pleasure chatting with Vaughn – and I can tell he is very enthusiastic about the things that are happening at Pure Storage these days. Putting aside the new round of funding and the valuation, he told me that they had 700% Year-On-Year growth too. Pure’s mantra is that you can get an All Flash Array for the price of a disk array – OK, so that might be a bit of marketing. However Vaughn and the rest of the Pure team regularly present at various VMware User Group meetings (VMUGs) in the US & EMEA. That will be an opportunity for you to get them to prove if the mantra really is true.