An Introduction to Coraid

Another session that I attended during the UK National VMUG earlier this month was an overview of Coraid technology from Max Brown, one of Coraid’s System Engineers based in the UK. My first introduction to Coraid was at VMworld 2011, and I wrote a short overview of their AoE (ATA over Ethernet) solution here. I wanted to get along to this session to see what had changed since then.

In a nutshell, Coraid present SAN storage as local storage via AoE. How do they do that? Well, Coraid provide the HBA, the AoE initiator software (which must be installed on the ESXi host) and the storage array. Plug in all the parts, and you have your Coraid solution.

Max reviewed AoE with us, and how this protocol along with Coraid’s design implementation allows hosts to see local disks even though they are being presented over a storage fabric. The neat part is that with Coraid’s HBAs, there is no multipathing software to consider – all paths to the storage are used concurrently. As mentioned, you simply plumb the bits together and you no longer have to concern yourself with multipathing, path policies, load balancing or failover settings – it’s all taken care of. That is rather nice.

Secret Sauce – AoE Mapping Layer

Max then discussed some of Coraid’s secret sauce – the AoE mapping layer. The AoE mapping layer essentially breaks I/Os down into 8KB writes, and tracks which ones have been sent and acknowledged. Take for example a 64KB I/O transfer. The I/O is split into 8 x 8KB packets. Coraid uses jumbo frames to accommodate these larger sizes. If there are 4 paths to the storage from the host, 4 x 8KB packets are sent in parallel, and once these have been acknowledged, the next 4 x 8KB packets can be sent. Once the mapping layer has received all the ACKs from the Coraid array, the I/O is then acknowledged back to the OS.
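As a rough sketch of the behaviour described above (illustrative only – the names and structure are my own, not Coraid’s actual driver code), the split-and-acknowledge logic might look something like this:

```python
# Hypothetical sketch of the AoE mapping layer's split/ACK logic.
# Names here are illustrative, not Coraid's implementation.

FRAME_SIZE = 8 * 1024  # 8KB payload per AoE frame (carried in a jumbo frame)

def split_io(data: bytes, frame_size: int = FRAME_SIZE) -> list:
    """Break one I/O into frame-sized chunks."""
    return [data[i:i + frame_size] for i in range(0, len(data), frame_size)]

def send_io(data: bytes, num_paths: int = 4) -> int:
    """Dispatch frames num_paths at a time, one frame per path, and only
    complete the I/O once every frame is ACKed. Returns rounds sent."""
    outstanding = split_io(data)
    rounds = 0
    while outstanding:
        batch, outstanding = outstanding[:num_paths], outstanding[num_paths:]
        # ... transmit `batch` in parallel across the paths ...
        # ... wait for an ACK per frame (retransmit on loss) ...
        rounds += 1
    return rounds  # only now is the I/O acknowledged back to the OS

# A 64KB write -> 8 x 8KB frames -> 2 rounds over 4 paths
print(send_io(b"\x00" * 64 * 1024))  # -> 2
```

So a 64KB transfer becomes two rounds of four frames, and the guest OS only sees one completed I/O.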

The delivery is also connectionless to avoid overhead, and should a packet get dropped, Coraid have some additional secret sauce in their AoE mapping layer to handle retransmits.

The mapping layer is part of the Coraid HBA driver. The Coraid HBA does not require any configuration, simply install the card and driver – no need for MPIO configuration, no need for vSwitches (like you would with iSCSI). All Coraid LUNs appear as “local SCSI disk” to the ESXi hosts but with the benefits of being SAN (multipathing, failover, load balancing).

Max then introduced us to some of the Coraid flagship products. First up was the EtherDrive SRX, which is their storage appliance. It can hold up to 36 drives, and doesn’t need to be fully populated. You can also mix and match SATA/SAS/SSD drives in the same shelf. You can have 2 or 4 x 10 Gb Ethernet or 6 x 1 Gb Ethernet per array.

The important part to the SRX is the Virtual DAS (vDAS) technology which abstracts the SAN so that the host sees the storage as a local disk. Connectivity to the storage is via either 1Gb or 10Gb HBAs, but you must use Coraid’s AoE drivers/initiators.

Last year at VMworld 2011, Coraid announced EtherFlash Cache as an option for the SRX to generate even more performance. So while SSD can appear as a disk, another option is to allow them to be used for caching (read) on the SRX.

Features
This all sounds well and good, but what about other features which we expect to see with SANs, such as replication, snapshots, cloning, thin provisioning, etc.? To address this, Coraid have the EtherDrive VSX appliance. If you require more availability than that provided by a single SRX array (SRX is a single controller architecture), highly available, synchronously mirrored systems can be facilitated with VSX. It will also allow asynchronous remote replication between arrays for disaster recovery scenarios.

How it works is that you present spindles from the SRX to the VSX. Next, you carve up the LUNs and present them to the ESXi hosts from the VSX. Snapshots, cloning, thin provisioning & replication are now available as features. The interesting part is that reads can go directly to the SRX; it is only the writes which have to go via the VSX.
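The read/write split can be pictured with a trivial sketch (all names here are hypothetical, not the actual VSX implementation):

```python
# Illustrative sketch of the VSX read/write split described above: reads
# bypass the VSX and go straight to the backing SRX, while writes pass
# through the VSX so its features (snapshots, replication) see them.

def route_request(op: str) -> str:
    """Return which appliance services a given I/O operation."""
    if op == "read":
        return "SRX"       # direct path, no extra hop
    elif op == "write":
        return "VSX->SRX"  # VSX applies snapshot/replication logic, then forwards
    raise ValueError(f"unknown op: {op}")

print(route_request("read"))   # -> SRX
print(route_request("write"))  # -> VSX->SRX
```

Keeping reads on the direct path means the VSX only adds latency where it actually adds value.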

Management
The whole Coraid infrastructure is managed via the EtherCloud Storage Manager product. This includes management of both VSX & SRX appliances.

They also have the ability to create storage profiles so that device selection for VMs becomes easier. This is not to be confused with VM Storage Profiles in vSphere – the two features are not integrated. The concept is very similar, however, whereby Coraid customers can tag certain datastores with certain capabilities (think gold, silver, bronze) and then select datastores based on these capabilities.
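The tiering concept can be illustrated with a small sketch (the datastore names, tier labels and function are invented for illustration, not EtherCloud’s actual API):

```python
# Hedged sketch of the tag-and-select idea described above: tag datastores
# with a capability tier, then pick one by tier at provisioning time.

datastores = {
    "ds-ssd-01":  {"tier": "gold",   "free_gb": 400},
    "ds-sas-01":  {"tier": "silver", "free_gb": 900},
    "ds-sata-01": {"tier": "bronze", "free_gb": 2000},
}

def pick_datastore(tier: str, needed_gb: int):
    """Return the first datastore matching the tier with enough free space,
    or None if no candidate fits."""
    for name, info in datastores.items():
        if info["tier"] == tier and info["free_gb"] >= needed_gb:
            return name
    return None

print(pick_datastore("gold", 100))  # -> ds-ssd-01
```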

While there is no plan just yet to integrate the two features (which is a shame as I can see this being extremely useful for VM provisioning – similar to VASA, but you the user can define the capabilities), Coraid state that as they see vSphere storage profiles being deployed in customer environments more & more, they will look for opportunities to more tightly integrate the functionality to benefit customers as needed.

However, the snapshotting feature in Coraid’s Storage Manager is integrated with VM snapshots to allow coordination & scheduling. Coraid do have a vCenter plugin that coordinates VM snapshots with their storage (i.e. quiesce the VM, then take a snapshot of the Coraid logical volume). The plugin is part of the Coraid EtherCloud Storage Manager product.

Coraid continue to look for additional feedback on how to further increase their integration with vCenter to benefit customers, so if you are a Coraid customer and have some ideas, pass them on.

vSphere integration
As of today, Coraid do not support VAAI primitives, but I’ve been reliably informed that this is actively being scoped for a future release.

Coraid do not have an SRM SRA for their solution either. On this point, Coraid feel that because host-based replication is now possible with vSphere Replication, the need for array-based replication is less of a concern for customers. Although Coraid are aware that vSphere Replication is currently only supported with up to around 75 VMs, they feel that with other technologies such as Datacore, Zerto, InMage, Visioncore, Veeam, etc., the host-based replication option is becoming more attractive, and lends itself nicely to the vDAS (Virtual DAS) proposition, i.e. move some of the storage intelligence/features from the array back up to the application. Of course, another valid point Coraid make is around SRM array-based replication working at the LUN level (replicate all or nothing), which is not as granular as per-VM replication.

Opinion
I do like this solution from Coraid – it is quite unique, and provides a valid alternative to NAS, iSCSI, FC & FCoE. It is also an extreme scale-out solution – Max provided some very interesting scale-out numbers from a few of Coraid’s existing customers. Although it takes a simple approach similar to NAS, it would be nice to see more general acceptance of AoE. As it stands there are no generic AoE drivers/initiators in vSphere, so these would have to be sourced directly from Coraid (or any other vendor with an AoE solution). I would also like to see more integration with vSphere (VAAI, VASA, vCenter UI, SRM), but then I always say this. My gut feel is that with additional integration, it may pave the way for more buy-in from the VMware community. A final point is on the storage profile feature – I think this could be a very neat leveraging point – allowing customers to define their own capabilities on the underlying LUNs, allowing VASA to surface those up into vSphere, and then using vSphere’s profile driven storage to build profiles for VM deployment – how cool would that be? It would definitely be a unique integration point. I guess we’ll have to wait and see what Coraid decide to do.

Get notification of these blogs postings and more VMware Storage information by following me on Twitter: @CormacJHogan

9 thoughts on “An Introduction to Coraid”

  1. Nice article. I am still not sure what to think though. The fact that it is a single controller system kinda gives me the shivers. If I understood correctly, traffic can’t be secured like iSCSI can using CHAP authentication. Also it seems it is non-routable…

    I also wonder how they (from a pricing perspective) compare to other new storage solutions.

    Still I was intrigued by their solution, but somehow the presentation itself did not convince me that it was enterprise ready.

    • Indeed. I know some other ‘new’ storage array vendors who initially started with a single controller arrays/appliances but quickly moved to dual controller models.

  2. Nice read Cormac.

    I have met Coraid for the first time this year at one of the TechFieldDay events. Although at first sight it is not that easy to grasp what they do (due to our fixed visions on SAN/NAS) it really made sense giving it a few thoughts. The simplification of DAS combined with the idea of scale out NAS technology truly is a good step in the future.

    Tip for their R&D: skip VAAI, go full frontal for vVols! You might just have that architecture that fits best in that concept!

    • Interesting point Hans. My take on this is that VAAI is here at the moment, and has been tried and tested. vVOLs is still a VMware ‘future’, but yes, it would be a nice fit for Coraid’s scale-out storage technology. More about vVOLs here folks, if you are interested.

  3. The key point missing in this description is that treating it as a “local disk” means the storage can’t be shared. When I last talked to Coraid, there wasn’t a locking primitive (equivalent to SCSI reservation or VAAI ATS) available in ATAoE, which means a LU can’t be shared between hosts. Isn’t sharing the main point of putting the storage outside the box? DAS != SAN

    • a few ideas here: VSA, vSphere Distributed Storage, distributed file systems, vVols, … all concepts that can benefit from DAS. Are they as mature as SAN is today? No. Worth the try? Yes, absolutely.

  4. I used a low end Coraid SAN at my last job when I was looking for cheap storage for a DR cluster target using Veeam to replicate our critical VMs. The price was amazing – probably 1/4 to less than any other solution. Since all we really need was a bunch of shared space, it was the ideal testing grounds for it. I came away quite impressed – the low end system we had performed amazingly well, and really was quite easy to set up.

  5. @Andy Banta: Coraid LUNs can be shared between all/any hosts. When you “claim” a LUN on one ESX host, the other ESX hosts also see it instantly. Coraid has SCSI reservations but we don’t have ATS yet. There’s a Coraid + vMotion demo video at http://youtu.be/Jaqsrm_eOTE
    Coraid looks like local disk to the host, but without the physical limitations of being local.

  6. Interesting that chadness mentions low cost, because in our SAN procurement process I found Coraid to be more expensive than an EMC VNX5300 of similar disk counts.