I reached out to Anurag Agarwal for some further information about their VAIO implementation and asked some questions about their particular I/O filter. Because the filter supports both write-thru (read) and write-back (write) caching, I was very interested in how they protected against failure.
- Q1. What are the pre-reqs? (e.g. vSphere version, flash devices, number of hosts, etc)
- A1. The minimum ESXi version required is 6.0 U1. We support multiple flash devices per ESXi server. There is no inherent limit on the number of nodes, but for the first release we are not testing with a large number of nodes. The plan is to test with 6-8 ESXi nodes and around 200 accelerated VMs initially. However, as more and more use cases arise, I’m sure they will be testing at a bigger scale.
- Q2. What are the considerations I should keep in mind before deploying (best practices, if you will)?
- A2. I would identify the VMs with a heavy I/O load, then configure read cache for data that does not change very often and write-back cache for data with a good number of writes. I would configure the number of mirrors to 2 for write-back. PrimaryIO have a profiling tool that runs in the guest and helps administrators understand which workloads can benefit from caching, whether it should be read or write caching, and how much cache to allocate to a particular VMDK.
- Q3. From a VMware perspective, I’d asked to see both the install/configure/setup steps and the policy/capabilities that show up in the vSphere web client when using PrimaryIO.
- A3. In a webex with the PrimaryIO guys, they showed us the install steps. PrimaryIO will provide a simple web server appliance for the purposes of installation. This will push the appropriate bits both to the vCenter server and the ESXi hosts. Each ESXi host in the cluster has an agent/daemon installed, and the vCenter Server has an Application Performance Acceleration (APA) plugin installed. Once the plugin is on vCenter, individual clusters can be selected, and enabled for cache acceleration. The agent is pushed out to the hosts in the form of a VIB, and does not need the hosts to be rebooted. (We wondered why VUM wasn’t used for this, but apparently this installation method is part of the VAIO architecture).
Here are some screenshots showing the deployment mechanism. First, there is the plugin installer which points to the web server appliance, where the bits are deployed from:
- Q4. Are there any interoperability concerns with core vSphere features – DRS, HA, Storage DRS, Storage vMotion, vMotion, Fault Tolerance, etc.?
- A4. It works with all core vSphere features, with the caveat that they have not tested with VMware Fault Tolerance (FT) yet.
- Q5. How does one choose where to replicate the cache? Is this taken care of automatically?
- A5. It is taken care of automatically. However, there is locality, in so far as the cache is instantiated on the same host where the VM is deployed. But of course, that locality is lost if the VM is migrated to another host, or an HA event occurs. Replication is done over a VMkernel interface, and the interface to replicate on is chosen during deployment. PrimaryIO recommends using 10GbE where possible.
- Q6. Does the solution require data locality? Compute on same host as cache? Is performance better if that is the case?
- A6. As mentioned previously, data locality is available initially, but may not persist. The guys at PrimaryIO state that it is better to cache on the same host as you save on network resources. But even if the VM has to go back and fetch from a remote cache, it is still far better than going to back-end storage.
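The read path described in A5 and A6 (local cache first, then a remote cache over the network, then back-end storage) can be sketched roughly as follows. This is only an illustration of the lookup order, not PrimaryIO's implementation: the `ReadPath` class, its dictionaries, the `migrate` method and the choice to populate the local cache on a miss are all assumptions made for the example.

```python
from typing import Dict, Optional, Tuple


class ReadPath:
    """Hypothetical model of a cached read: local SSD, remote SSD, then back-end."""

    def __init__(self, backend: Dict[int, bytes],
                 local_cache: Optional[Dict[int, bytes]] = None,
                 remote_cache: Optional[Dict[int, bytes]] = None) -> None:
        self.backend = backend                  # stands in for the datastore
        self.local_cache = local_cache or {}    # cache on the VM's current host
        self.remote_cache = remote_cache or {}  # cache held on another host

    def read(self, block: int) -> Tuple[bytes, str]:
        """Return the block and the tier that served it."""
        if block in self.local_cache:
            return self.local_cache[block], "local"    # no network hop
        if block in self.remote_cache:
            return self.remote_cache[block], "remote"  # network hop, still flash
        data = self.backend[block]                     # slowest path
        # Assumption: a miss populates the cache on the current host.
        self.local_cache[block] = data
        return data, "backend"

    def migrate(self) -> None:
        """Model a vMotion or HA restart: the old local cache is now remote."""
        self.remote_cache.update(self.local_cache)
        self.local_cache = {}


path = ReadPath(backend={1: b"a", 2: b"b"})
print(path.read(1)[1])  # first read goes to back-end storage
print(path.read(1)[1])  # second read is served locally
path.migrate()
print(path.read(1)[1])  # after migration the same block is served remotely
```

The point of the sketch is the ordering: once the VM moves, locality is lost, but a remote cache hit still avoids the trip to back-end storage.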
Here is an architectural diagram that the team shared with me which helps clarify some of the above Q&A:
- Q7. In the event of a failure, how do you control where the VM is restarted, i.e. the host with the replicated cache? Is there some interaction with HA, affinity groups needed?
- A7. We don’t control where the VM is restarted. For each VMDK we know which nodes hold its cache, and we continue to cache data on those SSDs whether they are local or remote.
- Q8. In the event of a failure, does the cache that was replicated get resynced elsewhere?
- A8. At this point we don’t start a new replication. But once the failed node comes back, we resync the cache to bring it up to date. We are making the assumption that node failures are more common than device failures, and that failed nodes most often come back. Another item to note is that if write cache is being used, and a failure means that there is only one copy of the cache left, APA will switch the cache to a read cache to avoid another failure causing data loss/corruption issues.
- Q9. Can cache be replicated in more than one place?
- A9. Yes, we support up to 2 additional replicas.
- Q10. On reading the overview of PrimaryIO I/O filter, it looks like hosts without SSDs/flash can leverage acceleration over the network? Is that correct? Is there a network requirement? 10Gb? Is there a significant performance drop?
- A10. Yes, that is correct. There is no specific requirement for a 10Gb network, but in a heavy load scenario 10Gb will be the better configuration.
- Q11. Do you have VVol support?
- A11. Not at this time, but it is something they are working towards.
- Q12. Can the cache hit rate be monitored?
- A12. Yes, there is a purpose-built UI to give the administrator insight into how well the cache is performing. Here is a sample dashboard:
I didn’t get any pricing and packaging information in time for the post, but I have asked for some details around this and I will share it with you as soon as I have it.
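Returning to the failure handling in A8 and A9, which was my main interest going in, the behaviour can be sketched as a small state model. This is a minimal sketch under stated assumptions, not APA's actual logic: the `VmdkCache` class, its method names and the host names are hypothetical. The source also does not say whether a cache that was downgraded to read mode returns to write-back after the resync, so the sketch leaves the mode unchanged in that case.

```python
from typing import Iterable, Set


class VmdkCache:
    """Hypothetical per-VMDK cache state: mode plus the nodes holding copies."""

    def __init__(self, mode: str, replica_nodes: Iterable[str]) -> None:
        self.mode = mode                         # "write-back" or "read"
        self.replica_nodes: Set[str] = set(replica_nodes)
        self.healthy: Set[str] = set(replica_nodes)

    def node_failed(self, node: str) -> None:
        """A node holding a cache copy has gone down."""
        self.healthy.discard(node)
        # Per A8: no new replica is started elsewhere after a failure.
        # Per A8: with only one copy left, a write cache becomes a read cache
        # so that a second failure cannot lose unflushed writes.
        if self.mode == "write-back" and len(self.healthy) <= 1:
            self.mode = "read"

    def node_returned(self, node: str) -> None:
        """A failed node is back; its copy is resynced and counted again."""
        if node in self.replica_nodes:
            self.healthy.add(node)
        # Assumption: the mode is left as-is, because the source does not
        # describe what happens to a downgraded cache after the resync.


cache = VmdkCache("write-back", {"esx1", "esx2"})
cache.node_failed("esx2")
print(cache.mode)             # read
cache.node_returned("esx2")
print(sorted(cache.healthy))  # ['esx1', 'esx2']
```

With three copies (the original plus 2 additional replicas, per A9), a single node failure in this model leaves two healthy copies, so the cache stays in write-back mode.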
Find out more about PrimaryIO and APA here.