PernixData revisited – a chat with Frank Denneman
I’m sure Frank Denneman will need no introduction to many of you reading this article. Frank & I both worked in the technical marketing organization at VMware, before Frank moved on to PernixData last year and I moved to Integration Engineering here at VMware. PernixData FVP 1.0 released last year, and I did a short post on them here. I’d seen a number of people discussing new FVP features in the community, especially after PernixData’s co-founder Satyam’s presentation at Tech Field Day 5 (#TFD5). I decided to reach out to Frank, and see if he could spare some time to revisit some of the new features that PernixData is planning to introduce. Fortunately, he did. I started by asking Frank about how PernixData is doing in general, before moving onto the new bits.
PernixData Overview
Frank told me that PernixData is now operating in 3 different regions: US/Canada, EMEA and Asia. Here in EMEA, there are now folks on the ground in the UK, the Nordics and the Netherlands. They also have a presence in Japan and Australia, and customer numbers are now in the triple digits. Frank is just back from the USA, where he did a whistle-stop tour of a number of US cities and VMUGs. He told me that the number of potential customers interested in what they do continues to grow.
So what exactly do PernixData do? In a nutshell, Pernix are there to “accelerate your data”. How do they do that? Well, up until now, this has been achieved by giving flash resources (SSD or PCIe) to your virtual machines. Where PernixData is unique is that they can accelerate both your read and write I/O (typically referred to as write-through and write-back) through software which is integrated directly into the VMkernel. Because it is so tightly integrated with the VMkernel, Frank states that there are no additional configuration steps, and no agents or drivers need to be added to a VM to accelerate it.
The software itself comes in the form of a VIB (vSphere Installation Bundle), which is signed by VMware, so it can be installed via VMware Update Manager (VUM). The only other component required is the PernixData Management Server, which needs to be installed on a Windows OS. The information gathered by the Management Server is then used to feed the vSphere client with FVP-specific information. Many customers simply install the Management Server on their vCenter Server host/VM. It should be noted that the Management Server is not part of the data path; if it fails, it has no impact on the operation of running virtual machines that are already consuming flash resources via FVP.
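As a side note, if you want to verify that the FVP VIB is present on a host, something along the lines of the following can help. This is just a minimal sketch, assuming it runs locally on an ESXi host where esxcli is available; the “pernix” substring match is purely illustrative and not an official PernixData check.

import subprocess

def fvp_vib_installed():
    # List all installed VIBs on this host and look for a PernixData entry.
    out = subprocess.check_output(
        ["esxcli", "software", "vib", "list"],
        universal_newlines=True
    )
    return any("pernix" in line.lower() for line in out.splitlines())

if __name__ == "__main__":
    print("FVP VIB present:", fvp_vib_installed())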
The first question that many of you will have is regarding the write-back mechanism. How can PernixData FVP protect against data loss if a host fails while it has VMs using write-back caching? As a recap, write-back is where the write is sent to flash and ACK’ed before it is sent to persistent storage. Obviously, if there is a failure before the data reaches persistent storage, you could be in a data loss situation. PernixData address this through the use of replica hosts, which store redundant write data. PernixData FVP allows you to configure the number of replica hosts (0, 1 or 2, depending on your requirements). In the event of a host failure, any VMs on that host which need to be restarted are synchronized against the redundant write data on a replica host. PernixData have a lot of built-in smarts depending on the failure; for example, Frank told me that the caching policy can change on the fly if errors such as host failures or flash failures occur. The policy will automatically switch from write-back to write-through if there are not enough replica hosts in the cluster; this means that, with no replicas available, data has to be written to persistent storage before an ACK is sent back. FVP also has the ability to choose new replica hosts if the original replica host has an issue.
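To make the write-through/write-back distinction concrete, here is an illustrative sketch (my own Python, not PernixData’s implementation) of how the acknowledgement path differs, and how a write-back request would fall back to write-through when too few replica hosts are available.

from enum import Enum

class Policy(Enum):
    WRITE_THROUGH = "write-through"   # ACK only after persistent storage has the write
    WRITE_BACK = "write-back"         # ACK once local flash and the replicas have the write

class Device:
    """Stand-in for a flash device, replica host or datastore."""
    def __init__(self, name):
        self.name = name
        self.blocks = []
    def write(self, block):
        self.blocks.append(block)

def handle_write(block, policy, flash, replicas, required_replicas, datastore):
    if policy is Policy.WRITE_BACK and len(replicas) < required_replicas:
        # Not enough peers to hold redundant write data: fall back to write-through.
        policy = Policy.WRITE_THROUGH

    flash.write(block)                    # the local acceleration resource always gets the block
    if policy is Policy.WRITE_BACK:
        for peer in replicas:
            peer.write(block)             # redundant copies on the replica hosts
        return "ACK"                      # destaged to the datastore asynchronously later
    datastore.write(block)                # write-through: persist first...
    return "ACK"                          # ...then acknowledge

# Example: one replica required but none available -> behaves as write-through.
print(handle_write("blk-1", Policy.WRITE_BACK, Device("flash"), [], 1, Device("datastore")))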
vSphere interop
A virtual machine accelerated by PernixData FVP has full interoperability with vSphere features like vMotion, Storage vMotion, Storage DRS, HA, DRS & VAAI. It is also supported with Site Recovery Manager (SRM) for DR orchestration, but in this case Frank advised that customers with an RPO of zero should consider using the write-through cache policy, so that data is persisted (and also replicated to the DR site) before being acknowledged. This way, customers can be confident that the data on the recovery site is in a known, persistent state.
One other call-out from Frank related to backup and restore. He said that customers who use 3rd party backup products that rely on vSphere snapshots as part of their backup mechanism should also switch to write-through mode for the duration of the backup. Frank said that many customers script a pre-task in their backup product to do this, and it works quite well. Afterwards, a post-task script switches the policy back to write-back.
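As an illustration of that pre-task/post-task flow, a backup wrapper might look like the sketch below. Note that set_fvp_write_policy and run_backup are hypothetical placeholders, not real PernixData or backup-vendor APIs; they stand in for whatever mechanisms your FVP version and backup product actually expose.

def set_fvp_write_policy(vm_name, policy):
    # Hypothetical placeholder: call whatever site-specific tooling actually
    # changes the FVP write policy for this VM.
    print("[stub] setting {} to {}".format(vm_name, policy))

def run_backup(vm_name):
    # Hypothetical placeholder for the 3rd-party, snapshot-based backup job.
    print("[stub] backing up {} via vSphere snapshot".format(vm_name))

def backup_with_policy_switch(vm_name):
    set_fvp_write_policy(vm_name, "write-through")       # pre-task
    try:
        run_backup(vm_name)
    finally:
        set_fvp_write_policy(vm_name, "write-back")      # post-task, even if the backup fails

backup_with_policy_switch("app-vm-01")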
One final query was regarding Storage DRS and Storage I/O Control. Frank stated that initial placement and balancing based on capacity worked exactly the same as before, but he stated that the minimum 5 msec latency threshold for balancing based on I/O metrics was simply too high for them. This is the same feedback I have heard from the All Flash Array (AFA) vendors. Hopefully we can do something about that soon.
At this time, vSphere Fault Tolerance is not supported with FVP.
The vSphere web client plugin from PernixData performs very nicely and gives some really interesting performance views. FVP 1.5, released a few months ago, added per-VM FVP Usage and FVP Performance tabs for IOPS, latency, throughput, hit rate & eviction rate. This gives a good feel for what’s happening in the cluster. Also of note is the Cluster Summary screen. From here, under Performance > Historical > IOs Saved from Datastore, you can see how I/O acceleration via PernixData FVP has benefited your storage infrastructure.
FVP.next
So next we moved onto the future – what’s coming from PernixData very shortly. In no specific order, this is what Frank shared with me:
1. Support for NFS
Up until now, FVP was only supported on block storage. This next release will introduce support for NFS. And once again, it is simply a matter of selecting the VM on the NFS datastore. There are no additional agents to be installed, or anything like that.
2. DFTM – Distributed Fault Tolerant Memory
I mentioned previously that PernixData FVP accelerated virtual machines using flash from SSD and/or PCIe. In their next release, they introduce another acceleration resource: memory! As Frank says, this opens their product to a whole new set of customers, as you won’t need to add new hardware like SSDs or PCIe flash. Instead, you can get the “insane amount of IOPS and low latency” using host memory, which everybody already has in their data centers.
There are a couple of requirements and restrictions, however. Hosts must contribute a minimum of 4GB of memory and can contribute up to a maximum of 1TB per host. Hosts in an FVP cluster contribute either memory OR flash; they cannot contribute both acceleration resource types. VMs will typically not consume both flash and memory for acceleration – a VM gets its acceleration resource from the host it is deployed on. However, if a VM is migrated to another host in the cluster which contributes a different resource type, it is possible for the VM to leverage both types, since it will reach back to its original cache on the original host for a data block if required.
I asked Frank how a customer could guarantee that a VM consumes only a single resource type. If this is a concern and you want to ensure that a VM uses only one particular acceleration type, you could set up multiple FVP clusters in your vSphere cluster, each with a different cache resource type, and migrate the VM between FVP clusters. This invalidates the previous flash footprint and the VM starts over, “rewarming” using only the acceleration resource of the new FVP cluster.
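A small conceptual model of that behaviour follows (again, my own illustration rather than PernixData code): moving a VM to an FVP cluster backed by a different resource type wipes its existing footprint, so it rewarms on the new resource only.

class FVPCluster:
    def __init__(self, name, resource_type):
        self.name = name
        self.resource_type = resource_type      # "flash" or "memory"

class AcceleratedVM:
    def __init__(self, name, cluster):
        self.name = name
        self.cluster = cluster
        self.cached_blocks = set()

    def read(self, block):
        if block in self.cached_blocks:
            return "hit on " + self.cluster.resource_type
        self.cached_blocks.add(block)           # warm the cache on first touch
        return "miss (fetched from the datastore)"

    def move_to(self, new_cluster):
        # The previous footprint is invalidated; the VM starts rewarming
        # using only the new cluster's acceleration resource.
        self.cluster = new_cluster
        self.cached_blocks.clear()

flash_cluster = FVPCluster("fvp-flash", "flash")
memory_cluster = FVPCluster("fvp-mem", "memory")
vm = AcceleratedVM("db-vm-01", flash_cluster)
vm.read("blk-1")
print(vm.read("blk-1"))                         # hit on flash
vm.move_to(memory_cluster)
print(vm.read("blk-1"))                         # miss again: rewarming on memory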
I also asked Frank about support for the new Memory Channel Storage (MCS) technology, which allows a flash device to be installed in the DIMM slots of a host. Frank couldn’t actually say anything about that yet. However, since vSphere supports it (see my blog post on the announcement here), my guess is that it may only be a matter of time. Probably the biggest drawback right now is that there is only a single host (IBM X6) on the VMware HCL which supports this new technology, so we may need to see it get more market penetration first.
3. Adaptive Network Compression
By default, the vMotion network is used to carry FVP’s redundant write data traffic and to perform remote flash access, but other VMkernel networks can be configured and used instead of the vMotion network.
An obvious issue for products that need to replicate writes over a network is network bandwidth. Network bandwidth can become a bottleneck, since data blocks need to be committed to a remote flash device as well as the local one; the more bandwidth available, the better the achievable IOPS. PernixData is introducing another new feature, Adaptive Network Compression, to help with this. What PernixData now do is look at I/O in real time and perform a Cost Benefit Analysis (CBA) to decide whether a block is worth compressing. You might be concerned about the CPU overhead of doing this CBA, and then the compression itself. Frank said that this is not an intensive operation; their testing shows it uses only about 2% extra CPU. Because the feature is adaptive, it does not compress every block; FVP decides which blocks to compress. The feature is on by default and cannot be switched off.
The other nice thing about this is that data is kept compressed on the replica hosts’ flash, as it is rarely needed. It is only uncompressed when there is a failure and the cache contents on a replica are required by a VM. While Pernix have recommended 10Gb for the host interconnect up until now, 1Gb interconnects become very interesting with the new adaptive compression feature.
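To give a feel for what a cost/benefit decision like this involves, here is an illustrative sketch (not the FVP implementation; the thresholds and timing model are invented for the example): compress a block destined for a replica only when the estimated time saved on the wire outweighs the estimated CPU cost. It also shows why compression becomes more attractive on slower links.

import zlib

def should_compress(block, link_mbps, cpu_cost_us_per_kb=2.0):
    sample = zlib.compress(block, 1)                  # cheap probe compression
    saved_bytes = len(block) - len(sample)
    if saved_bytes <= 0:
        return False                                  # incompressible data goes as-is
    transfer_saving_us = saved_bytes * 8 / link_mbps  # bits saved / (Mbit/s) = microseconds
    cpu_cost_us = (len(block) / 1024.0) * cpu_cost_us_per_kb
    return transfer_saving_us > cpu_cost_us

def replicate_payload(block, link_mbps):
    if should_compress(block, link_mbps):
        return zlib.compress(block)   # stays compressed on the replica host's flash
    return block

payload = b"A" * 64 * 1024            # a highly compressible 64KB block
print("compress over 1GbE? ", should_compress(payload, link_mbps=1000))
print("compress over 10GbE?", should_compress(payload, link_mbps=10000))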
4. Failure Domains
This is another nice feature in the forthcoming release. PernixData is introducing the concept of Replica Groups. Now you can decide where your replicas are placed, giving you the concept of fault domains. Consider 2 racks of servers, all running FVP. This feature allows you to run a VM with its main cache in rack 1 while placing the replica in rack 2. Now if rack 1 fails (a power trip, for example), you have the replica copy available in rack 2, and the VM can recover gracefully, possibly with the assistance of vSphere HA. Nice job.
One other failure domain feature: if a host in a replica group fails, the Management Server will select another peer host within that replica group to take its place.
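A minimal sketch of rack-aware replica placement along the lines described above (my own illustration, not PernixData’s algorithm): choose replica hosts from a different rack than the primary, and if a replica fails, pick another healthy peer from the same group.

class Host:
    def __init__(self, name, rack, healthy=True):
        self.name = name
        self.rack = rack
        self.healthy = healthy

def pick_replicas(primary, hosts, count):
    # Prefer healthy peers in a different failure domain (rack) than the primary.
    candidates = [h for h in hosts
                  if h.healthy and h is not primary and h.rack != primary.rack]
    return candidates[:count]

hosts = [Host("esx-01", "rack1"), Host("esx-02", "rack1"),
         Host("esx-03", "rack2"), Host("esx-04", "rack2")]
primary = hosts[0]

replicas = pick_replicas(primary, hosts, count=1)
print([h.name for h in replicas])                                  # ['esx-03']

# If the chosen replica fails, another peer from the same group takes its place.
replicas[0].healthy = False
print([h.name for h in pick_replicas(primary, hosts, count=1)])    # ['esx-04']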
Conclusion
It was great to catch up with Frank. I know he is extremely busy evangelizing the PernixData story around the world, so I appreciate the time he took to get me up to speed on what’s happening in their world. I also hope to get to the next Ireland VMUG in Dublin on May 22nd to meet with him in person. If any of my readers based in Ireland are free on May 22nd, I urge you to get along to the VMUG and learn some more. PernixData have another really interesting set of features as they march on towards their “server-side storage intelligence platform” vision.
Thanks for this update. Very interesting to read. I have a question – as you mentioned, it’s software integrated into the kernel, which suggests that it’s a host-based acceleration product. Does that mean it will work across any storage vendor [as all the acceleration is done at the host side]? Also, in order to make use of this software, do I need to install PCIe read/write flash on each vSphere host in the cluster?
Also, does installing the PernixData software require a vSphere host reboot?
I was reading about EMC XtremSF PCIe flash cache – does Pernix work with this?
You also mentioned that PernixData supports full read and write (write-through/write-back) acceleration. Will write-through not induce latency when compared with write-back? Considering it acknowledges a write only after writing to disk, how does write-through work with storage arrays [NetApp] that already do write-back using NVRAM?
Thanks for this post.
-Ashwin
Yes, it should be storage agnostic, and yes, hosts will require a flash device.
No, no reboot according to http://www.pernixdata.com/files/pdf/PernixData_DataSheet_FVP.pdf
Don’t know about EMC XtremSF PCIe – you would need to ask PernixData those specific questions. Same for your NetApp query.
Thanks Cormac. I appreciate your time.