PrimaryIO announce VAIO (I/O filter) for cache acceleration

Cormac

8 years ago

I got a bit of a surprise a few weeks back when I noticed an article in The Register by Chris Mellor stating that PrimaryIO (previously CacheBox) had announced a new cache acceleration I/O filter for vSphere. We first announced plans for VAIO (vSphere APIs for I/O Filters) back at VMworld 2014. VAIO allows VMware partners to plug their products/features directly into the VM I/O path, which in turn gives our customers access to 3rd party storage services/features like deduplication, compression, replication or encryption which may not be available on their storage array. Or in this case, a cache acceleration feature. I wasn’t aware of any announcement internally at VMware, so reading about it on The Register came as a bit of a surprise. I know that other partners such as SanDisk and Infinio are also working on cache acceleration products. However, this was the first time I had heard of PrimaryIO developing a cache acceleration filter.

I reached out to Anurag Agarwal for some further information about their VAIO implementation and asked some questions about their particular I/O filter. Because the filter supports both write-thru (read) and write-back (write) caching, I was very interested in how they protected against failure.

Q1. What are the pre-reqs? (e.g. vSphere version, flash devices, number of hosts, etc)
A1. The minimum ESXi version required is 6.0 U1. We support multiple flash devices per ESXi server. There is no inherent limit on the number of nodes, but for the first release we are not testing with a large number of nodes. The plan is to test with 6-8 ESXi nodes, and around 200 accelerated VMs initially. However, as more and more use cases arise, I’m sure they will be testing at a bigger scale.

Q2. What are the considerations I should keep in mind before deploying (best practices if you will)
A2. I would identify the VMs with a heavy I/O load, and configure read cache for data that does not change very often and write-back cache for data with a good number of writes. I would set the number of mirrors to 2 for write-back. PrimaryIO have a profiling tool that runs in the guest that will help administrators to understand which workloads can benefit from caching, whether it should be read or write, and how much cache to allocate to a particular VMDK.

Q3. From a VMware perspective, I’d asked to see both the install/configure/setup steps and the policy/capabilities that show up in the vSphere web client when using PrimaryIO.
A3. In a webex with the PrimaryIO guys, they showed us the install steps. PrimaryIO will provide a simple web server appliance for the purposes of installation. This will push the appropriate bits both to the vCenter server and the ESXi hosts. Each ESXi host in the cluster has an agent/daemon installed, and the vCenter Server has an Application Performance Acceleration (APA) plugin installed. Once the plugin is on vCenter, individual clusters can be selected, and enabled for cache acceleration. The agent is pushed out to the hosts in the form of a VIB, and does not need the hosts to be rebooted. (We wondered why VUM wasn’t used for this, but apparently this installation method is part of the VAIO architecture).

Here are some screenshots showing the deployment mechanism. First, there is the plugin installer which points to the web server appliance, where the bits are deployed from:

And once this has been configured, the remainder of the configuration, which is basically pushing out the appropriate VIBs to the ESXi hosts, is done from the web client:

Once everything is installed, administrators can now use SPBM to create the cache policies. The capabilities are (a) read or write cache, (b) replica copies of the cache and (c) how much space should be allocated to the cache. The size of the cache is specified as a % of the VMDK size.
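To make the three capabilities a little more concrete, here is a minimal Python sketch of how such a policy translates into flash consumption. This is purely illustrative: the `CachePolicy` class, its field names and the helper functions are hypothetical and are not part of PrimaryIO's product or the SPBM API. I am also assuming that the replica count means additional copies on top of the primary, and that each replica is the same size as the primary cache.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass(frozen=True)
class CachePolicy:
    """Hypothetical model of the three capabilities; not PrimaryIO's schema."""
    mode: str           # "read" or "write" (write-back)
    replicas: int       # assumed: additional copies beyond the primary
    cache_percent: int  # cache size as a percentage of the VMDK size

    def __post_init__(self):
        if self.mode not in ("read", "write"):
            raise ValueError("mode must be 'read' or 'write'")
        if self.replicas < 0:
            raise ValueError("replicas cannot be negative")
        if self.cache_percent <= 0:
            raise ValueError("cache_percent must be positive")


def cache_bytes(policy, vmdk_bytes):
    """Size of a single cache copy for a VMDK of the given size."""
    return vmdk_bytes * policy.cache_percent // 100


def total_flash_bytes(policy, vmdk_bytes):
    """Flash used across the cluster: the primary copy plus every replica."""
    return cache_bytes(policy, vmdk_bytes) * (1 + policy.replicas)


policy = CachePolicy(mode="write", replicas=2, cache_percent=10)
print(cache_bytes(policy, 100 * GIB) // GIB)        # 10
print(total_flash_bytes(policy, 100 * GIB) // GIB)  # 30
```

So under these assumptions, a 100GB VMDK with a 10% write cache and two additional replicas would consume around 30GB of flash across the cluster, which is worth keeping in mind when sizing flash devices.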

The technical questions around the solution are:

Q4. Are there any interoperability concerns with core vSphere features – DRS, HA, Storage DRS, Storage vMotion, vMotion, Fault Tolerance, etc
A4. It works with all core vSphere features, with the caveat that they have not tested with VMware Fault Tolerance (FT) yet.

Q5. How does one choose where to replicate the cache? Is this taken care of automatically?
A5. It is taken care of automatically. However there is locality, in so far as the cache is instantiated on the same host where the VM is deployed. But of course, that locality is lost if the VM is migrated to another host, or a HA event occurs. Replication is done over a VMkernel interface. During deployment, the interface to replicate on is chosen. PrimaryIO recommends using 10GbE where possible.

Q6. Does the solution require data locality? Compute on same host as cache? Is performance better if that is the case?
A6. As mentioned previously, data locality is available initially, but may not persist. The guys at PrimaryIO state that it is better to cache on the same host as you save on network resources. But even if the VM has to go back and fetch from a remote cache, it is still far better than going to back-end storage.
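The preference order described in A6 (local cache first, then a remote cache, then back-end storage) can be sketched as a simple tiered lookup. This is only my own illustration of the idea, not PrimaryIO's implementation: the function name is made up, the three tiers are modelled as plain dictionaries, and populating the local cache on a miss is an assumption on my part.

```python
def read_block(block_id, local_cache, remote_cache, backend):
    """Return (data, tier) using the cheapest tier that holds the block.

    All three tiers are plain dicts standing in for a local flash cache,
    a cache on another host reached over the network, and the datastore.
    """
    if block_id in local_cache:
        return local_cache[block_id], "local"
    if block_id in remote_cache:
        # Costs a network round trip, but avoids the back-end storage.
        return remote_cache[block_id], "remote"
    data = backend[block_id]
    # Assumption: a miss populates the local cache for subsequent reads.
    local_cache[block_id] = data
    return data, "backend"


local, remote = {}, {7: b"warm"}
backend = {7: b"warm", 9: b"cold"}
print(read_block(7, local, remote, backend)[1])  # remote
print(read_block(9, local, remote, backend)[1])  # backend
print(read_block(9, local, remote, backend)[1])  # local
```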

Here is an architectural diagram that the team shared with me which helps clarify some of the above Q&A:

Q7. In the event of a failure, how do you control where the VM is restarted, i.e. the host with the replicated cache? Is there some interaction with HA, affinity groups needed?
A7. We don’t control where the VM is started. For each VMDK, we know which nodes hold its cache, and we continue to cache data on those SSDs whether they are local or remote.

Q8. In the event of a failure, does the cache that was replicated get resynced elsewhere?
A8. At this point we don’t start a new replication. But once the failed node comes back, we resync the cache to bring it up to date. We are making the assumption that node failures are more common than device failures, and that failed nodes most often come back. Another item to note is that if write cache is being used, and a failure means that there is only one copy of the cache left, APA will switch the cache to a read cache to avoid another failure causing data loss/corruption issues.
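The failure behaviour described in A8 boils down to a small piece of state logic, which I have sketched below in Python. Again, this is just my reading of the answer rather than APA's actual code: the `VmdkCache` class and its method names are hypothetical, and the assumption that the cache returns to write-back mode once the failed node has been resynced is mine, since the answer does not say so explicitly.

```python
class VmdkCache:
    """Hypothetical model of one VMDK's cache copies across hosts."""

    def __init__(self, configured_mode, hosts):
        self.configured_mode = configured_mode  # "read" or "write"
        self.hosts = set(hosts)                 # hosts holding a cache copy
        self.healthy = set(hosts)               # copies currently in sync

    def host_failed(self, host):
        # No new replica is created elsewhere; the copy is simply lost.
        self.healthy.discard(host)

    def host_recovered(self, host):
        # The returning node is resynced and counts as a copy again.
        if host in self.hosts:
            self.healthy.add(host)

    def effective_mode(self):
        # With one copy left, write-back caching is downgraded to a read
        # cache so that a second failure cannot lose unflushed writes.
        if self.configured_mode == "write" and len(self.healthy) <= 1:
            return "read"
        return self.configured_mode


cache = VmdkCache("write", ["esx1", "esx2"])
print(cache.effective_mode())  # write
cache.host_failed("esx2")
print(cache.effective_mode())  # read
cache.host_recovered("esx2")
print(cache.effective_mode())  # write (assumed behaviour after resync)
```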

Q9. Can cache be replicated in more than one place?
A9. Yes, we support up to 2 additional replicas.

Q10. On reading the overview of PrimaryIO I/O filter, it looks like hosts without SSDs/flash can leverage acceleration over the network? Is that correct? Is there a network requirement? 10Gb? Is there a significant performance drop?
A10. Yes, that is correct. There is no specific requirement for a 10G network, but in a heavy load scenario 10G will be a better config.

Q11. Do you have VVol support?
A11. Not at this time, but it is something they are working towards.

Q12. Can the cache hit rate be monitored?
A12. Yes, there is a purpose-built UI to give the administrator insight into how well the cache is performing. Here is a sample dashboard:

I’m happy to see some filters now starting to become available. This is yet another part of the SDS vision, where individual data services that are unavailable on either HCI or storage arrays can now be leveraged by VMware customers directly from a third party. PrimaryIO have given us access to the I/O filter, and I hope to get some hands-on time in the very near future and test some workloads to see how well it performs/improves things.

I didn’t get any pricing and packaging information in time for the post, but I have asked for some details around this and I will share them with you as soon as I have them.

Find out more about PrimaryIO and APA here.