A closer look at EBS-backed vSAN

Cormac

5 years ago

At VMworld 2018, we announced an initiative to use Amazon Elastic Block Store (EBS) for vSAN storage. At present, vSAN is configured on EC2 i3 instances, which run ESXi on bare metal. I have seen these referred to as i3p, but my understanding is that they correlate to the i3.metal instances as shown here. The Amazon EC2 i3 instances include Non-Volatile Memory Express (NVMe) SSD-based instance storage. These are configured with 10TB of storage per host, but there are some limitations. For one, if you wish to expand capacity, you need to add another complete EC2 i3 instance. And if you wish to use VMware Cloud on AWS for something like a low-cost DRaaS solution (as many of our customers do), this becomes cost-prohibitive. It was for that reason that we started to look at an alternative, namely EBS, and specifically the General Purpose SSD volume type referred to as gp2.

Now my colleague Duncan has already done a very good write-up on a session that was delivered at VMworld by Rakesh (one of our PMs) and Peng (one of our engineers). The session was entitled HCI1998BU – Enable High-Capacity Workloads with Elastic EBS-Backed vSAN on VMware Cloud. From there you can also find the link to the recording of the VMworld session. I just wanted to add a little bit more around the offering as I’ve started to get some additional questions on this solution recently.

Scalability

Let’s begin with scale. We are offering EBS-backed vSAN with capacity that ranges from 15TB to 35TB per host, far higher than the 10TB per host currently available on i3. For a 4-node cluster, this means that capacity can range from 60TB to 140TB. There is a requirement today that any hosts you add to your EBS-backed vSAN cluster must have identical capacity. In the VMworld presentation, we were informed that there is a plan to relax this requirement and allow capacity to be added a few TB at a time, but this is not the case today: the only way to scale capacity is to add hosts. The maximum cluster size today in VMC on AWS is 16 nodes, so a cluster could scale to 560TB (16 x 35TB).
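The capacity figures above are simple multiplication, and a minimal Python sketch makes the limits easy to check. The constants are the numbers quoted in this post; the function name raw_cluster_capacity_tb is purely illustrative and is not part of any VMware or AWS tooling.

```python
# Figures quoted in the post; all names here are illustrative only.
MIN_TB_PER_HOST = 15
MAX_TB_PER_HOST = 35
MAX_HOSTS = 16


def raw_cluster_capacity_tb(hosts: int, tb_per_host: int) -> int:
    """Raw capacity (TB) of a cluster in which every host has the same capacity."""
    if not 1 <= hosts <= MAX_HOSTS:
        raise ValueError(f"hosts must be between 1 and {MAX_HOSTS}")
    if not MIN_TB_PER_HOST <= tb_per_host <= MAX_TB_PER_HOST:
        raise ValueError(
            f"tb_per_host must be between {MIN_TB_PER_HOST} and {MAX_TB_PER_HOST}"
        )
    return hosts * tb_per_host


if __name__ == "__main__":
    for hosts, tb in ((4, 15), (4, 35), (16, 35)):
        print(f"{hosts} hosts x {tb}TB = {raw_cluster_capacity_tb(hosts, tb)}TB raw")
```

These are raw figures. Usable capacity will be lower once the overhead of the storage policy is taken into account.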

Performance

Baseline IOPS on EBS gp2 scale with volume size at 3 IOPS per GiB of provisioned capacity. However, each volume is also capped at 10K IOPS and 160MBps of throughput. Thus, to get 10K IOPS, you need a volume of a little over 3TB (10,000 / 3 is roughly 3,334 GiB). For vSAN, this will be the cache device size, since vSAN is a caching system. With 3 disk groups per host, that adds up to roughly 9-10TB of cache per node, which should in turn give something in the region of 30K IOPS per node. This is all at sub-millisecond latency, but once the IOPS or throughput limit has been reached, latency starts to increase significantly.
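The sizing arithmetic above can be sketched in a few lines of Python. This is a simplified model that uses only the figures quoted in this post (3 IOPS per GiB, a 10K per-volume cap, 3 disk groups per host); it deliberately ignores gp2 burst behaviour and the throughput cap, and the function names are made up for illustration.

```python
import math

# Figures quoted in the post; names are illustrative only.
IOPS_PER_GIB = 3
IOPS_CAP_PER_VOLUME = 10_000
DISK_GROUPS_PER_HOST = 3


def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS for one gp2 volume: 3 per GiB, capped per volume."""
    if size_gib <= 0:
        raise ValueError("size_gib must be positive")
    return min(IOPS_PER_GIB * size_gib, IOPS_CAP_PER_VOLUME)


def smallest_volume_at_cap_gib() -> int:
    """Smallest volume size (GiB) whose baseline reaches the per-volume cap."""
    return math.ceil(IOPS_CAP_PER_VOLUME / IOPS_PER_GIB)


def host_baseline_iops(cache_volume_gib: int,
                       disk_groups: int = DISK_GROUPS_PER_HOST) -> int:
    """Aggregate baseline IOPS across one cache volume per disk group."""
    return disk_groups * gp2_baseline_iops(cache_volume_gib)


if __name__ == "__main__":
    size = smallest_volume_at_cap_gib()
    print(f"smallest volume at the cap: {size} GiB")
    print(f"per-volume baseline: {gp2_baseline_iops(size)} IOPS")
    print(f"per-host baseline:   {host_baseline_iops(size)} IOPS")
```

Note that a volume of exactly 3 TiB (3,072 GiB) comes out at 9,216 IOPS in this model, which is why the cache volume has to be slightly larger than 3TB to reach the full 10K.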

Data Services

EBS-backed vSAN offers only some of the data services that are available on i3 bare metal. In particular, deployments on EBS-backed vSAN use compression only; there is no deduplication. Checksum is enabled. Objects are deployed with Erasure Coding and a Failures to Tolerate value of 1, implying a RAID-5 configuration, so there is a need for a minimum of 4 nodes in the cluster. Why not enable deduplication? Well, the reason was performance, considering the throttling of IOPS and throughput on EBS. There is a considerable amount of IO related to the deduplication hash map, which would eat into the available IOPS on EBS. Performance was found to be much better on EBS-backed vSAN without deduplication enabled. This raises the question of space efficiency. Rakesh and Peng addressed this in the VMworld video, where they highlighted that space savings for the workloads that we expect to run on EBS-backed vSAN (such as OLTP on Oracle and SQL Server) are not negatively impacted by a lack of deduplication.
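To put the RAID-5 policy in capacity terms, here is a rough sketch. It assumes the RAID-5 layout of three data components plus one parity component per object, which is where the 4-node minimum comes from. It ignores compression savings, slack space and any other overhead, so treat the output as an upper bound rather than a sizing guarantee. The function name is hypothetical.

```python
# Assumption: RAID-5 with FTT=1 stripes each object as 3 data + 1 parity.
RAID5_DATA_COMPONENTS = 3
RAID5_PARITY_COMPONENTS = 1
RAID5_MIN_HOSTS = RAID5_DATA_COMPONENTS + RAID5_PARITY_COMPONENTS


def raid5_usable_tb(raw_tb: float) -> float:
    """Capacity left for data after parity, before compression or slack space."""
    if raw_tb < 0:
        raise ValueError("raw_tb must be non-negative")
    total_components = RAID5_DATA_COMPONENTS + RAID5_PARITY_COMPONENTS
    return raw_tb * RAID5_DATA_COMPONENTS / total_components


if __name__ == "__main__":
    print(f"minimum hosts for RAID-5: {RAID5_MIN_HOSTS}")
    # Raw cluster capacities taken from the Scalability section.
    for raw in (60, 140, 560):
        print(f"{raw}TB raw -> about {raid5_usable_tb(raw):.0f}TB after parity")
```

Under these assumptions, parity consumes a quarter of the raw capacity, which is considerably cheaper than the half that mirroring would consume for the same Failures to Tolerate value.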

Availability

There is one significant advantage when it comes to availability with EBS-backed vSAN, and this has to do with resync traffic. When running vSAN on-premises (or on i3p) and there is a host failure or a host being decommissioned from the cluster, all of the data on that host has to be rebuilt elsewhere in the cluster for the objects to return to compliance. In EBS-backed vSAN, this works quite differently. Let’s take a situation on EBS-backed vSAN where a host is going to be decommissioned. First of all, a new host is added to the cluster to cater for the compute requirements of the host that is being decommissioned. Next, another new host is made available outside the cluster, and the EBS volumes that were on the host being decommissioned are detached from that host and attached to this new host. The original host is then removed, and the new host with the re-attached volumes is added to the cluster. Now all that is needed is for the changes that occurred during the detach and re-attach steps to be resync’ed to the storage. Finally, the host that was added to take up the compute slack can be removed. Compare this to the full host resync that would be required in an on-premises deployment or even an i3p deployment. Quite a nice feature for sure.
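The ordering of those steps is easier to follow when laid out as code. The following is a toy Python model only: the Host class, the decommission function and the host and volume names are all invented for illustration, and nothing here calls a real VMware or AWS API. It simply moves volume names between in-memory objects in the order described above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)  # compare hosts by identity, not by field values
class Host:
    name: str
    volumes: List[str] = field(default_factory=list)


def decommission(cluster: List[Host], old: Host,
                 spare: Host, replacement: Host) -> List[str]:
    """Replay the decommission steps on an in-memory cluster; return a step log."""
    log = []

    # 1. Add a temporary host to cover the compute of the departing host.
    cluster.append(spare)
    log.append(f"added {spare.name} for compute")

    # 2. Outside the cluster, move the EBS volumes to the replacement host.
    replacement.volumes, old.volumes = old.volumes, []
    log.append(f"moved volumes from {old.name} to {replacement.name}")

    # 3. Swap the hosts: remove the original, add the replacement.
    cluster.remove(old)
    cluster.append(replacement)
    log.append(f"replaced {old.name} with {replacement.name}")

    # 4. Only the writes missed during the detach and re-attach need resyncing.
    log.append("resynced changes made during detach and re-attach")

    # 5. The temporary compute host is no longer needed.
    cluster.remove(spare)
    log.append(f"removed {spare.name}")
    return log


if __name__ == "__main__":
    hosts = [Host(f"esx-{i}", [f"vol-{i}a", f"vol-{i}b"]) for i in range(4)]
    for step in decommission(hosts, hosts[1], Host("esx-spare"), Host("esx-new")):
        print(step)
```

The point the model makes is that the data never leaves its volumes: the cluster ends up the same size, with the same volumes attached, and only step 4 involves any resync traffic.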

Futures

As well as incremental scale-out of capacity, Rakesh alluded to another plan, which is to use external storage arrays in VMC on AWS. Of course, we should take into account the disclaimer that was highlighted at the beginning of the video, but this is another interesting approach to provisioning low-cost storage going forward. Interestingly, he mentioned that this storage could be consumed not just for datastores, but also directly by guest OSes running in VMs. I guess we will have to wait and see.