VSAN 6.2 Part 2 – RAID-5 and RAID-6 configurations

Those of you familiar with VSAN will be aware that, historically, when it comes to virtual machine deployments, objects on the VSAN datastore were deployed either as a RAID-0 (stripe), a RAID-1 (mirror), or a combination of both. From a capacity perspective, this was quite an overhead. For instance, if I wanted my VM to tolerate 1 failure, I needed two copies of the data. If I wanted my VM to tolerate 2 failures, I needed three copies of the data, and if I wanted my VM to tolerate the maximum number of failures, which is 3, then I had to have 4 copies of the data stored on the VSAN datastore. In VSAN 6.2, some new configurations, namely RAID-5 and RAID-6, are introduced to help reduce the overhead when configuring virtual machines to tolerate failures on VSAN. This feature is also termed “erasure coding”. However, the use of the term “erasure coding” and its relationship with RAID-5/6 has caused confusion in some quarters. If you want a primer on erasure coding, and how it ties into how RAID-5/6 configurations are implemented on VSAN, have a read of this excellent article by our BU CTO, Christos Karamanolis.
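
To put rough numbers on that mirroring overhead, here is a back-of-the-envelope sketch in Python (purely illustrative, not VSAN code); it ignores the small witness components that RAID-1 objects also consume:

    def raid1_raw_capacity(vmdk_gb, ftt):
        """RAID-1 mirroring keeps FTT + 1 full copies of the data."""
        return vmdk_gb * (ftt + 1)

    # A 100 GB VMDK consumes roughly 200, 300 and 400 GB of raw capacity
    # when asked to tolerate 1, 2 and 3 failures respectively.
    for ftt in (1, 2, 3):
        print(ftt, raid1_raw_capacity(100, ftt))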

Introduction to RAID-5/RAID-6 on VSAN

Note that there is a requirement on the number of hosts needed to implement RAID-5 or RAID-6 configurations on VSAN. For RAID-5, a minimum of 4 hosts are required; for RAID-6, a minimum of 6 hosts are required. The objects are then deployed across the storage on each of the hosts, along with a parity calculation. The configuration uses distributed parity, so there is no dedicated parity disk. When a failure occurs in the cluster, and it impacts the objects that were deployed using RAID-5 or RAID-6, the data is still available and can be calculated using the remaining data and parity if necessary.
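
To make the idea of recovering data from parity concrete, here is a toy single-parity example in Python. It is a sketch of the general principle, not how VSAN implements it internally:

    # Three data blocks and one parity block, as in a 3+1 (RAID-5 style) layout.
    data = [b"\x01\x02", b"\x10\x20", b"\x0f\x0f"]

    def xor_blocks(blocks):
        out = bytearray(len(blocks[0]))
        for block in blocks:
            for i, byte in enumerate(block):
                out[i] ^= byte
        return bytes(out)

    parity = xor_blocks(data)

    # Simulate losing the host holding data[1]: rebuild it from the
    # surviving data blocks plus the parity block.
    rebuilt = xor_blocks([data[0], data[2], parity])
    assert rebuilt == data[1]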

RAID-5 and RAID-6 are fully supported with the new deduplication and compression mechanisms which were also introduced with VSAN 6.2.

Also note that if you include Number of disk stripes per object as a policy setting for the RAID-5/6 objects, each of the individual components that make up the RAID-5 or RAID-6 object may also be striped across multiple disks. A small example follows below.
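
For example (illustrative numbers only), a RAID-5 object with a stripe width of 2 could end up with each of its 4 components laid out as a 2-way RAID-0 stripe:

    # RAID-5 object: 3 data + 1 parity components.
    raid5_components = 4
    stripe_width = 2          # "Number of disk stripes per object" policy value
    # Each RAID-5 component becomes a RAID-0 stripe of stripe_width pieces.
    print(raid5_components * stripe_width)   # 8 components in total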

As mentioned, these new configurations are only available with VSAN 6.2. They are also only available for all-flash VSAN. You cannot use RAID-5 and RAID-6 configurations on hybrid VSAN.

VM Storage Policies for RAID-5 and RAID-6

A new policy setting has been introduced to accommodate the new RAID-5/RAID-6 configurations. This new policy setting is called Failure Tolerance Method. This policy setting takes two values: performance and capacity. When it is left at the default value of performance, objects continue to be deployed with a RAID-1/mirror configuration for the best performance. When the setting is changed to capacity, objects are now deployed with either a RAID-5 or RAID-6 configuration.

The RAID-5 or RAID-6 configuration is determined by the Number of Failures to Tolerate setting. If this is set to 1, the configuration is RAID-5. If it is set to 2, the configuration is RAID-6. Of course, you will need to have the correct number of hosts in the cluster too. Note that if you want to tolerate 3 failures, you will need to continue using RAID-1.
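
The combinations described above can be summarised in a small sketch (illustrative only, not an SPBM API):

    def vsan_layout(ftt, ftm="performance"):
        """Map the policy settings discussed above to an object layout
        and the minimum number of hosts required."""
        if ftm == "performance":                # default: RAID-1 mirroring
            if ftt in (1, 2, 3):
                return "RAID-1", 2 * ftt + 1    # FTT+1 replicas plus witnesses
        elif ftm == "capacity":                 # erasure coding
            if ftt == 1:
                return "RAID-5", 4              # 3 data + 1 parity
            if ftt == 2:
                return "RAID-6", 6              # 4 data + 2 parity
        raise ValueError("unsupported combination of FTT and FTM")

    print(vsan_layout(1, "capacity"))   # ('RAID-5', 4)
    print(vsan_layout(2, "capacity"))   # ('RAID-6', 6)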

Overview of RAID-5

  • Number of Failures to Tolerate = 1
  • Failure Tolerance Method = Capacity
  • Uses 1.33x rather than 2x capacity when compared to RAID-1
  • Requires a minimum of 4 hosts in the VSAN cluster

Overview of RAID-6

  • Number of Failures to Tolerate = 2
  • Failure Tolerance Method = Capacity
  • Uses 1.5x rather than 3x capacity when compared to RAID-1
  • Requires a minimum of 6 hosts in the VSAN cluster
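
Putting the two overviews side by side, the multipliers come straight from the data-to-parity ratios. A rough calculation (ignoring witness and metadata overhead):

    # RAID-5 stores 3 data + 1 parity components; RAID-6 stores 4 data + 2 parity.
    raid5_multiplier = (3 + 1) / 3    # ~1.33x, versus 2x for FTT=1 mirroring
    raid6_multiplier = (4 + 2) / 4    # 1.5x,   versus 3x for FTT=2 mirroring

    # Raw-capacity savings for a 100 GB VMDK:
    print(100 * 2 - 100 * raid5_multiplier)   # ~67 GB saved with RAID-5
    print(100 * 3 - 100 * raid6_multiplier)   # 150 GB saved with RAID-6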

Performance Considerations – I/O Amplification

As highlighted in Christos’ excellent erasure coding article referenced above, RAID-5 and RAID-6 configurations will not perform as well as RAID-1 configurations. This is due to I/O amplification. During normal operations, there is no amplification of reads. However, there is I/O amplification when it comes to writes (especially partial writes), since the current data and parity have to be read, the current and new data must be merged, new parity must be calculated, and then the new data and new parity must be written back. That results in 2 reads and 2 writes for a single write operation. For RAID-6, the write amplification is 3 reads and 3 writes due to the double parity.
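
The partial-write penalty follows from the standard single-parity update rule (new parity = old parity XOR old data XOR new data). Here is a small, self-contained sketch of that read-modify-write sequence; it illustrates the general technique rather than VSAN's internal implementation:

    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    # A fake 3+1 stripe: three data blocks and their parity.
    blocks = {"d0": b"\x01", "d1": b"\x02", "d2": b"\x04"}
    blocks["p"] = xor(xor(blocks["d0"], blocks["d1"]), blocks["d2"])

    def partial_write(name, new_data):
        """Single-parity read-modify-write: 2 reads + 2 writes per logical write."""
        old_data = blocks[name]          # read 1: current data
        old_parity = blocks["p"]         # read 2: current parity
        blocks[name] = new_data          # write 1: new data
        blocks["p"] = xor(xor(old_parity, old_data), new_data)   # write 2: new parity

    partial_write("d1", b"\x08")
    assert blocks["p"] == xor(xor(blocks["d0"], blocks["d1"]), blocks["d2"])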

So while there are significant space savings to be had with this new technique, customers need to ask themselves whether maximum performance is paramount for their workloads. If it is not, significant space savings (and thus $$$) can be realized.

Design Decisions – Data Locality Revisited

This is something I mentioned in the overview of VSAN 6.2 features when I discussed R5/R6 as implemented on VSAN. The VSAN team made a design choice whereby core vSphere features such as DRS/vMotion and HA do not impact the performance of virtual machines running on VSAN. In other words, a conscious decision was made not to do “data locality” in VSAN (apart from stretched clusters, where it makes perfect sense). A VM can reside on any host and any storage in the cluster and continue to perform optimally. This non-reliance on data locality lends itself to R5/R6, where the components of the VMDK are spread across multiple disks and hosts. Simply put, with R5/R6, the compute does not reside on the same node as the storage.

With this design, we can continue to use core vSphere features such as DRS/vMotion and HA without impacting the performance of a VM that is using R5/R6 for its objects, no matter which host it runs on in the cluster.

Tolerating 1 or 2 failures, not 0 or 3

RAID-5/6 configurations can only be used when the Number of Failures to Tolerate is set to 1 or 2 in the policy. If you set it to tolerate 0 or 3 failures and try to deploy a VM with that policy, you will be notified that it is unsupported. A sample warning is shown below:

[Screenshot: RAID-5/6 policy warning]

Witness

Note that RAID-5/RAID-6 do not need witness components. With RAID-5, there will be 3 data components and a parity component; with RAID-6, there will be 4 data components and 2 parity components.

Conclusion

This is a nice new feature for customers who may not need to achieve the maximum possible performance from VSAN and are more concerned with capacity costs, especially in all-flash VSAN. This feature, coupled with deduplication and compression, should realize significant cost savings for all-flash VSAN customers.

One final note: RAID-5/RAID-6 is not supported in VSAN stretched clusters. This is because a stretched cluster only supports 3 fault domains (site 1, site 2 and the witness), while RAID-5 objects require 4 fault domains (and RAID-6 requires 6). Objects must still be deployed with a RAID-1 configuration in VSAN 6.2 stretched clusters. This new space-saving feature is only supported in standard, all-flash VSAN deployments.

5 Replies to “VSAN 6.2 Part 2 – RAID-5 and RAID-6 configurations”

  1. Cormac, can you confirm: this new set of options in “failure tolerance method” is an “interhost” layout policy, which augments the old-style “raid1” method. Administrators will still have the option of setting the “intrahost” policy of “disk stripes per object”, giving you yet another dimension for potential performance (but not space efficiency) improvements by utilizing multiple in-host drive/capacity elements instead of the single-element stripe width in the default policy.
    Flash may not get the same performance boost from aggregating multiple drives as traditional spindles do, but it should still be able to realize some boost if the HBA or another part of the drive subsystem is a bottleneck.

    1. Hi Jim,

      Just to clarify, the stripe width setting (number of disk stripes per object) is not intrahost or interhost. It simply means that a number of capacity disk devices will be used to place the object. The spindles might be on the same host, or they might be across hosts. At the end of the day, VSAN will decide based on a number of factors such as balance, etc.

      Having said that, yes, you can use “number of disk stripes per object” with the new “failure tolerance method” set to ‘capacity’. This means that every chunk of the RAID-5 or RAID-6 object could be placed in a RAID-0 stripe.
