A closer look at Cohesity 4.0

Last week, I had a chance to catch up with my pal, Rawlinson Rivera. Rawlinson and I worked closely on a lot of storage-related stuff at VMware, but he has since moved on to pastures new, and is currently the CTO for the Global Field over at Cohesity. I’ve written about Cohesity a number of times on this blog.

I think the first time I wrote about them was during VMworld 2015, just before the 1.0 product launched, when they were still pitching the idea of secondary storage and how they would take care of things like snaps, clones, backups, etc. At the time, they also mentioned how they would use their analytics tooling to (a) figure out where the dead/unused data is and (b) project future storage requirements. This initial release only had NFS support for presenting data back to vSphere.

In February 2016, I got a briefing on their 2.0 product. This had a major set of improvements, such as site-to-site replication, support for SMB file shares as well as NFS (mapping drives back to VMs), increased security via AES 256-bit FIPS-compatible encryption, the ability to spin up previous point-in-time copies, and the ability to archive to the cloud (Azure, S3, Glacier). At VMworld 2016, I saw some of the cool stuff they were doing in 3.0, in particular the integration with Pure Storage: they could replicate any snapshots from a Pure array, and then tell the array that it no longer needs to maintain those snapshots. So I was very interested to hear about what these guys had done in 4.0.

Here is the list of new features that Cohesity has included in 4.0.

S3-compatible object store
This allows Cohesity to be an S3 bucket/target for cloud applications. For example, Rawlinson mentioned that he demos this by showing how you can archive from one Cohesity cluster to another. So now Cohesity supports NFS, SMB and S3.
NAS Data Protection
This is similar to how Cohesity does block-level protection on Pure Storage arrays. NAS filers can now have their snaps replicated to Cohesity, and the NAS array no longer needs to maintain them. Although this is aimed at NAS filers in general, NetApp is the only one certified at present.
Role-Based Access
This is really about providing more granular access control. One can now have different profiles for administrators, operators or end-users. These profiles can then be associated with different data views on the Cohesity cluster.
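To make the profile-plus-view idea concrete, here is a minimal sketch, assuming a simple model in which a role grants a set of actions and each user is additionally bound to specific data views. The role names, action names, `Principal` type and `is_allowed` helper are all hypothetical illustrations, not Cohesity’s actual RBAC API.

```python
from dataclasses import dataclass, field

# Hypothetical role -> permitted actions mapping, loosely following the
# administrator / operator / end-user split mentioned above.
ROLE_PERMISSIONS: dict[str, set[str]] = {
    "administrator": {"configure", "backup", "restore", "browse"},
    "operator": {"backup", "restore", "browse"},
    "end-user": {"browse"},
}


@dataclass
class Principal:
    name: str
    role: str
    # Data views this principal has been associated with (hypothetical model).
    views: set[str] = field(default_factory=set)


def is_allowed(principal: Principal, action: str, view: str) -> bool:
    """Allow an action only if the role grants it AND the view is assigned."""
    granted = ROLE_PERMISSIONS.get(principal.role, set())
    return action in granted and view in principal.views


ops = Principal("ops1", "operator", {"finance-view"})
print(is_allowed(ops, "restore", "finance-view"))    # True
print(is_allowed(ops, "configure", "finance-view"))  # False: role lacks it
print(is_allowed(ops, "restore", "hr-view"))         # False: view not assigned
```

Checking both the role and the view assignment is what gives the finer granularity: an operator can restore, but only within the views they have been given.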
vCenter Integration
This new feature bubbles up more of the vCenter objects in Cohesity’s UI, which now allows admins/operators to Auto-Protect VMs based on folders or tags. Nice usability feature.
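A rough sketch of how folder- or tag-based auto-protection might select VMs, assuming the rule is simply “in a protected folder OR carrying a protected tag”. The `VM` type, the `auto_protect` function and the example inventory paths are hypothetical; how Cohesity actually matches objects (for example, whether sub-folders are included) was not covered in the briefing, and this sketch matches folders exactly.

```python
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    name: str
    folder: str            # hypothetical inventory folder path
    tags: frozenset[str]   # tags attached to the VM


def auto_protect(
    vms: Iterable[VM], folders: Iterable[str] = (), tags: Iterable[str] = ()
) -> list[str]:
    """Return names of VMs in a protected folder or carrying a protected tag."""
    folder_set = set(folders)
    tag_set = set(tags)
    # Exact folder match, or any overlap between the VM's tags and the rule's tags.
    return [vm.name for vm in vms if vm.folder in folder_set or vm.tags & tag_set]


inventory = [
    VM("web01", "/DC1/Prod", frozenset({"gold"})),
    VM("db01", "/DC1/Prod", frozenset()),
    VM("test01", "/DC1/Test", frozenset({"gold"})),
    VM("scratch", "/DC1/Test", frozenset()),
]
print(auto_protect(inventory, folders=["/DC1/Prod"], tags=["gold"]))
# ['web01', 'db01', 'test01']
```

Because the rule in this sketch is evaluated against the current inventory rather than a fixed list of VMs, a new VM placed in a protected folder or given a protected tag would be picked up the next time the selection runs.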
Erasure coding
This is not something that is user-facing, but it means that less capacity is consumed to protect contents at the back-end. Cohesity has implemented a 3+1 scheme (presumably similar to RAID-5) and a 5+2 scheme (presumably similar to RAID-6). Once again, erasure coding can be enabled on a per-“Data View” basis, so it does not have to be enabled cluster-wide.
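The capacity saving from erasure coding comes down to simple arithmetic. Assuming an MDS code such as Reed-Solomon, a stripe with k data units and m parity units stores data in k/(k+m) of its raw capacity, adds a parity overhead of m/k, and survives the loss of up to m units. The sketch below works this out for the 3+1 and 5+2 schemes; the 1+1 row (equivalent to keeping two full copies) is included only as an assumed point of comparison, since the post does not say what the erasure-coded layouts are being compared against.

```python
from fractions import Fraction


def ec_profile(data: int, parity: int) -> dict:
    """Capacity characteristics of a data+parity erasure-coded stripe."""
    width = data + parity
    return {
        "usable_fraction": Fraction(data, width),  # share of raw capacity holding data
        "overhead": Fraction(parity, data),        # extra raw capacity per unit of data
        "failures_tolerated": parity,              # stripe units that can be lost (MDS code)
    }


# 1+1 is plain two-way mirroring, used here only as an assumed baseline.
for data, parity in [(1, 1), (3, 1), (5, 2)]:
    p = ec_profile(data, parity)
    print(
        f"{data}+{parity}: usable={float(p['usable_fraction']):.1%}, "
        f"overhead={float(p['overhead']):.1%}, "
        f"tolerates {p['failures_tolerated']} failure(s)"
    )
# 1+1: usable=50.0%, overhead=100.0%, tolerates 1 failure(s)
# 3+1: usable=75.0%, overhead=33.3%, tolerates 1 failure(s)
# 5+2: usable=71.4%, overhead=40.0%, tolerates 2 failure(s)
```

So 5+2 spends a little more capacity than 3+1 in exchange for surviving a second failure, which fits the RAID-5/RAID-6 analogy.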


General Q&A
Throughout the conversation, I asked Rawlinson a bunch of other questions about the Cohesity platform. The answers are slightly paraphrased below.

Q. You’ve got these features called Cloud Archive, Cloud Tier and Cloud Replicate. What’s the difference between them? What do they do?
A. Cloud Archive is basically archiving to the cloud (S3-compatible). All you have to do for this is select it via a policy framework. Cloud Tier is where hot data stays on-prem and old data gets moved to the cloud. Cloud Replicate is replicating data between different Cohesity Data Platforms. The replication is asynchronous, but it allows us to do cool use cases such as DR to the cloud and Test & Dev in the cloud. In this configuration, Cloud VMs start up “in the cloud” to consume the newly changed blocks of protected VMs via CBT. When finished, we shut down the Cloud VMs to reduce costs (efficient transfer).
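The changed-block idea behind Cloud Replicate can be sketched in a few lines: after an initial full copy, each sync ships only the blocks written since the previous sync. This is a toy model under stated assumptions: `TrackedDisk` and `incremental_replicate` are hypothetical, `TrackedDisk` is an in-memory stand-in for a VM disk with Changed Block Tracking enabled (it is not the vSphere CBT API), and the sketch leaves out the cloud VM start-up and shutdown step described above.

```python
class TrackedDisk:
    """Toy virtual disk that remembers which blocks changed since the last query
    (a hypothetical stand-in for Changed Block Tracking, not its real API)."""

    def __init__(self, num_blocks: int, block_size: int = 4):
        self.blocks: list[bytes] = [bytes(block_size)] * num_blocks
        self._changed: set[int] = set()

    def write(self, index: int, data: bytes) -> None:
        self.blocks[index] = data
        self._changed.add(index)  # record the block as dirty

    def query_changed_blocks(self) -> list[int]:
        """Return the changed block indices and reset tracking for the next cycle."""
        changed = sorted(self._changed)
        self._changed.clear()
        return changed


def incremental_replicate(source: TrackedDisk, replica: list[bytes]) -> int:
    """Copy only the changed blocks to the replica; return how many were sent."""
    changed = source.query_changed_blocks()
    for i in changed:
        replica[i] = source.blocks[i]
    return len(changed)


disk = TrackedDisk(num_blocks=8)
replica = list(disk.blocks)      # initial full (seed) copy
disk.write(2, b"abcd")
disk.write(5, b"wxyz")
print(incremental_replicate(disk, replica))  # 2: only blocks 2 and 5 are sent
print(incremental_replicate(disk, replica))  # 0: nothing changed since last sync
```

Only two of the eight blocks cross the wire in the first incremental pass, which is why a short-lived cloud VM consuming CBT deltas can keep transfer (and running) costs down.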

Q. Any update to the supported hardware?
A. Yes. If you buy direct from Cohesity, the nodes are built on Intel-based commodity hardware. However, you can now source Cohesity from Cisco on Cisco hardware; only the software comes from Cohesity. Similarly, HPE sells Cohesity pre-installed on HPE hardware. Again, only the software comes from Cohesity.

Q. Can you guys do scale-up as well as scale out? In other words, can additional storage be added to the nodes?
A. The Cohesity appliances are already maxed out on disk. We don’t have an add-disk approach; customers scale out by adding additional nodes. Whilst Cohesity primarily scales out through the addition of fixed-config nodes, we can also scale up by selecting higher-performance nodes that have additional processing power, or ultra-dense nodes that offer lower cost/TB.

Q. How many nodes can you scale to now? I think in the very early days, it was limited to 16 nodes.
A. We do “web-scale”, so there is no limit. (Once the laughter died down, Rawlinson told me that he had personally visited a customer with a 32-node deployment, but the customer was not a reference, so he could not name them.)

Q. Do you support physical (non-VM) backups?
A. Yes. We support both virtual (VM) and physical host backups, as well as certain databases (either physical or virtual).

Q. Tell me some more about the encryption feature?
A. We do both DARE (Data at Rest Encryption) and in-flight encryption. Each Cohesity cluster has its own set of keys.

Q. How do you handle encrypted data when you are archiving/replicating to the cloud?
A. When data is replicated to a remote Cohesity cluster (physical on-prem or virtual in the cloud), the data is decrypted locally and then encrypted on the wire. Once the data lands on the target, it is encrypted on the remote cluster with that cluster’s own data key. When working with a cloud archive, the archive is also encrypted with a different key. Cohesity also has the ability to export the keys, which could then be imported during a recovery scenario, e.g. if the on-prem environment is lost and a restore has to be done from the cloud archive to a new on-prem cluster.

Q. Talk me through a restore process for VMs? What about restoring a VM without vCenter?
A. Restores are about putting back in place a copy of the VM, wherever you choose. It can be to the same location as the source VM, or to a completely different location. There is no reliance on vCenter; Cohesity can restore directly to an ESXi host.

Q. What about backing up/restoring at database/table granularity, or is it just VMs/snapshots?
A. For MS SQL Server, we offer transaction-level, point-in-time recoveries natively through our UI. For Oracle, you can use Oracle tools (e.g. RMAN) to provide transaction-level recoveries.

Q. Which vSphere versions are supported?
A. We support vSphere 4 all the way up to the latest version.

Q. Does a restore maintain all of the policies that were put in place for that VM? Or do you have to create them all again?
A. If you are asking about VM Storage Policies, then no, we do not restore those. However any profiles associated with the VM from a Cohesity perspective (backup schedule, etc) are automatically in place for the restored VM.

Q. What differentiates you from the competition?
A. The big thing in our favour is that we are not a “rip-and-replace”. If a customer is already heavily invested in another backup vendor, and needs to be able to recover data from tapes that have been written in that vendor’s format, they can keep using those products, but use the Cohesity data platform for deduplication, encryption, erasure coding, etc., and not use the Cohesity DataProtect backup feature. We approach licensing differently if customers wish to go with this approach.

Q. Do you guys support VVols?
A. No. VVols are not supported at this time. To support them, we would need to make a significant investment in a VASA provider, and so on. Other things have priority at the moment.

It was really good to catch up with Rawlinson and find out about Cohesity 4.0. If you want further information, Rawlinson has a good write-up here, and Duncan has written a post on Cohesity 4.0 as well.