Sizing for large VMDKs on vSAN

I’ve recently been involved in some design and sizing for very large VMDKs on vSAN. There are a couple of things to keep in mind when doing this: not just the capacity overhead when deciding between RAID-1, RAID-5 and RAID-6, but also what the choice means for component counts. In this post, I have done a few tests with some rather large RAID-5 and RAID-6 VMDKs, just to show you how we deal with them in vSAN. If you are involved in designing and sizing vSAN clusters for large virtual machines, you might find this interesting.

Let’s start with a RAID-5 example: a VM with a large 8TB VMDK, deployed as a RAID-5.

As a RAID-5, that 8TB VMDK has a 1.33x capacity overhead to cater for the parity. In essence, we are looking at 10.66TB to implement this RAID-5 configuration, which can tolerate 1 failure in vSAN. That is a considerable space saving compared to the default RAID-1 Failures-To-Tolerate setting, which would require a second copy of the data, so 2 x 8TB = 16TB. However, you will need at least 4 hosts in your vSAN cluster to implement a RAID-5 configuration, whereas RAID-1 can be implemented with 2 or 3 nodes.
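The capacity arithmetic above can be sketched in a few lines of Python. This is only an illustration of the multipliers quoted in this post (full copies for RAID-1, 3 data + 1 parity for RAID-5, 4 data + 2 parity for RAID-6). The names `LAYOUTS` and `raw_capacity_tb` are made up for this example and are not part of any vSAN tooling, and the small witness components that RAID-1 also uses are ignored.

```python
from fractions import Fraction

# (data parts, total parts) per policy. Labels are illustrative only.
LAYOUTS = {
    "RAID-1, FTT=1": (1, 2),  # two full copies of the data
    "RAID-1, FTT=2": (1, 3),  # three full copies of the data
    "RAID-5, FTT=1": (3, 4),  # 3 data + 1 parity
    "RAID-6, FTT=2": (4, 6),  # 4 data + 2 parity
}

def raw_capacity_tb(vmdk_tb, policy):
    """Raw capacity (TB) consumed by a fully written VMDK under a policy."""
    data_parts, total_parts = LAYOUTS[policy]
    return Fraction(vmdk_tb) * total_parts / data_parts

for policy in LAYOUTS:
    print(f"{policy}: {float(raw_capacity_tb(8, policy)):.2f} TB")
```

For the 8TB VMDK this gives 16TB and 24TB for the two RAID-1 policies, roughly 10.67TB for RAID-5 (the 10.66TB figure above, before rounding) and 12TB for RAID-6.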

Now we come to the component count. Readers who are well versed in vSAN will be aware that the maximum component size in vSAN is 255GB. This has been the case since vSAN 5.5, and continues to be the same today. So with 10.66TB, we would need more than 40 components of at most 255GB each to accommodate this requirement. I deployed this configuration in my own environment, using a policy of FTT (FailuresToTolerate)=1, FTM (FailureToleranceMethod)=Erasure Coding, and looked at what was created on vSAN.

I have a total of 44 components in this example, 11 per RAID-5 segment. These components are then concatenated into a RAID-0 in each RAID-5 segment. If you want to see this on your own vSAN, you will have to use the Object Space Reservation setting of 100% to achieve this (along with the necessary disk capacity of course). Since vSAN deploys objects thinly, if you do not use OSR=100%, you will only see the bare minimum 4 components in the RAID-5 object. As you consume capacity in the VMDK, the layout will grow accordingly.
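The 11-per-segment figure can be reproduced with some simple arithmetic. The sketch below is my own approximation inferred from the numbers in this post, not vSAN's actual placement algorithm. It assumes that 1TB = 1024GB, that the object is fully reserved (OSR=100%), and that each of the 4 RAID-5 segments holds one third of the VMDK size (parity is spread across the segments, but the per-segment size works out the same). The function names are hypothetical.

```python
MAX_COMPONENT_GB = 255  # maximum vSAN component size quoted above

def components_per_segment(vmdk_gb, data_parts):
    """Components needed in one RAID segment of a fully reserved VMDK."""
    # Each segment holds vmdk_gb / data_parts; round up to whole components.
    # -(-a // b) is integer ceiling division, so vmdk_gb must be an int.
    return -(-vmdk_gb // (data_parts * MAX_COMPONENT_GB))

def total_components(vmdk_gb, data_parts, total_parts):
    """Component count for the whole object across all RAID segments."""
    return components_per_segment(vmdk_gb, data_parts) * total_parts

vmdk_gb = 8 * 1024  # assumption: 1TB = 1024GB

print(components_per_segment(vmdk_gb, 3))  # RAID-5 (3+1), per segment
print(total_components(vmdk_gb, 3, 4))     # RAID-5 (3+1), whole object
```

Each segment works out to roughly 2.67TB, which is more than 10 components of 255GB can hold, hence 11 per segment and 44 in total.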

Now the other thing to keep in mind with component count is snapshots. A snapshot layout will follow the same layout as the VMDK that it is a snapshot of. Therefore a snapshot of the above VMDK will have the same layout.

This means that to snapshot this RAID-5 VMDK, I will consume another 44 components (which needs to be factored into the component count).

Let’s take another example that I have been working on. Let’s take the same VM with an 8TB VMDK, and deploy it as a RAID-6.

As a RAID-6, that 8TB VMDK has a 1.5x capacity overhead to cater for the double parity required for RAID-6. In essence, we are looking at in the region of 12TB to implement this RAID-6 configuration. The point to remember is that this configuration can tolerate 2 failures in vSAN. That is a considerable space saving compared to using RAID-1 to tolerate 2 failures, which would need 3 copies of the data, so 3 x 8TB = 24TB. So RAID-6 needs half the capacity of the equivalent RAID-1 configuration, a 50% space saving. You will of course need at least 6 hosts in your vSAN cluster to implement a RAID-6 configuration, so keep that in mind as well.

Now the next thing is the component count. For this ~12TB RAID-6 object (8TB data, 4TB parity), I set the Object Space Reservation to 100% and chose a RAID-6 policy (FTT=2, FTM=Erasure Coding) before looking at what was deployed on vSAN.

In each of the RAID-6 segments, there are 9 components, since each segment holds ~2TB, which is just over what 8 components of 255GB can hold. With 6 segments, this means we are looking at 54 components to deploy that 8TB VMDK in a RAID-6 configuration. As before, any snapshots of this VM/VMDK will instantiate a snapshot delta object with the same configuration of 54 components.
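To budget components for a VMDK plus its snapshots, the same arithmetic can be extended. Again, this is a rough sketch under my own assumptions: 1TB = 1024GB, OSR=100%, and every snapshot delta object fully populated with the same layout as the base disk, so it gives an upper bound rather than what a thinly provisioned object will show on day one. The function names are hypothetical.

```python
import math

MAX_COMPONENT_GB = 255  # maximum vSAN component size

def object_components(vmdk_gb, data_parts, total_parts):
    """Components in one fully reserved object (base disk or snapshot delta)."""
    per_segment = math.ceil(vmdk_gb / data_parts / MAX_COMPONENT_GB)
    return per_segment * total_parts

def vm_disk_budget(vmdk_gb, data_parts, total_parts, snapshots=0):
    """Worst-case components for the base VMDK plus one delta per snapshot."""
    return object_components(vmdk_gb, data_parts, total_parts) * (1 + snapshots)

vmdk_gb = 8 * 1024  # assumption: 1TB = 1024GB

print(vm_disk_budget(vmdk_gb, 4, 6))               # RAID-6 (4+2), no snapshots
print(vm_disk_budget(vmdk_gb, 4, 6, snapshots=1))  # RAID-6 with one snapshot
print(vm_disk_budget(vmdk_gb, 3, 4, snapshots=1))  # RAID-5 with one snapshot
```

With these assumptions, the RAID-6 disk comes to 54 components on its own and 108 with one snapshot, while the RAID-5 disk with one snapshot comes to 88.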

Hopefully that explains some of the considerations when dealing with very large VMDKs on vSAN.

 

4 comments
  1. Hi Cormac,

    That’s some great info!
    What are your views on a stripe width value (>1?) for these large VMDK files?

    • In all likelihood, you are going to end up with the segment across multiple disks with these sizes anyway, so I’m not sure you need to include it in the policy.

  2. Hi Cormac. Thanks for the great blog. I’m just wondering which ESXi/vSphere version those screenshots come from. I’m running 6.5 in my lab and I don’t see the VSWP and snapshot objects. Is this something we can expect to see in 6.6? And if so, will the memory snapshot object appear in the list too?
