VSAN Design & Sizing – Memory overhead considerations

This week I was in Berlin for our annual Tech Summit in EMEA, an event for our field folks in the region. I presented a number of VSAN sessions, including a design and sizing session, and as part of that session the topic of VSAN memory consumption was raised. In the past, we’ve only ever really talked about the host memory requirements for disk group configuration, as highlighted in this post here. For example, as per that post, to run a fully configured Virtual SAN system, with 5 fully populated disk groups per host and 7 disks in each disk group, a minimum of 32GB of host memory is needed. This is not memory consumed by VSAN, by the way; this memory may also be used to run workloads. Consider it a configuration limit, if you will. As per the post above, if hosts have less than 32GB of memory, then we scale back on the number of disk groups that can be created on the host.

To the best of my knowledge, we never shared information about what contributes to memory consumption on VSAN clusters. That is what I plan to talk about in this post.

[Update]: Some pointed out that we have KB article 2113954 that explains this.

To understand memory consumption by Virtual SAN, the following equation may be used:

BaseConsumption + 
(NumDiskGroups x 
(DiskGroupBaseConsumption + (SSDMemOverheadPerGB x SSDSize)))

Where:

  • BaseConsumption: This is the fixed amount of memory consumed by Virtual SAN per ESXi host. This is currently 3GB. This memory is mostly used to house the VSAN directory, per-host metadata, and memory caches. When there are more than 16 nodes in a Virtual SAN cluster, the BaseConsumption increases by 300 MB to a total of 3.3 GB.
  • NumDiskGroups: This is the number of disk groups in the host, and ranges from 1 to 5.
  • DiskGroupBaseConsumption: This is the fixed amount of memory consumed by each individual disk group in the host. This is currently 500MB. This memory is mainly used as a resource to support in-flight operations on a per disk group level.
  • SSDMemOverheadPerGB: This is the fixed amount of memory allocated for each GB of SSD capacity. This is currently 2 MB in hybrid systems and is 7 MB for all flash systems. Most of this memory is used for keeping track of blocks in the SSD used for write buffer and read cache.
  • SSDSize: Size of the SSD in GB.

Caution: Please note that these numbers are for VSAN 6.0 and VSAN 6.1. These may change with future releases.
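For quick what-if sizing, the equation can be captured in a few lines of Python. This is just a sketch of the formula above (the function name and parameters are mine, not part of any VSAN tooling), using the VSAN 6.0/6.1 constants and treating 1000 MB as 1 GB, as the worked examples below do:

```python
def vsan_memory_gb(num_disk_groups, ssd_size_gb, all_flash=False,
                   nodes_in_cluster=3, system_memory_gb=32):
    """Estimate per-host VSAN memory consumption in GB (VSAN 6.0/6.1 constants)."""
    base = 3.3 if nodes_in_cluster > 16 else 3.0   # BaseConsumption
    dg_base = 0.5                                  # DiskGroupBaseConsumption (500 MB)
    per_gb_mb = 7 if all_flash else 2              # SSDMemOverheadPerGB
    total = base + num_disk_groups * (dg_base + per_gb_mb * ssd_size_gb / 1000.0)
    if system_memory_gb < 32:                      # hosts with <32GB scale down linearly
        total *= system_memory_gb / 32.0
    return total

# One hybrid disk group, 400GB SSD, small cluster:
print(round(vsan_memory_gb(1, 400), 2))  # 4.3
```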

Now that we understand the requirements, let us run through a few scenarios:

Scenario 1: Let’s look at some working examples where the hosts have more than 32GB of memory each, the number of hosts in the cluster is less than 16, and the SSD size is 400GB.

Example 1: One disk group per host, hybrid configuration:

BaseConsumption + 
(NumDiskGroups x 
(DiskGroupBaseConsumption + (SSDMemOverheadPerGB  x SSDSize)))

3GB   +    (1   x     (500MB   +    (2MB    x    400)))
3GB   +    (500MB + 800MB)
3GB   +   1.3GB
= 4.3 GB 

Example 2: Three disk groups per host, hybrid configuration:

BaseConsumption + 
(NumDiskGroups x 
(DiskGroupBaseConsumption + (SSDMemOverheadPerGB  x SSDSize)))

3GB   +    (3   x     (500MB   +    (2MB    x    400)))
3GB   +    (3   x     (500MB + 800MB))
3GB   +    (3   x     1.3GB)
3GB   +   3.9GB
= 6.9 GB

Example 3: One disk group per host, all flash configuration:

BaseConsumption + 
(NumDiskGroups x 
(DiskGroupBaseConsumption + (SSDMemOverheadPerGB  x SSDSize)))

3GB   +    (1   x     (500MB   +    (7MB    x    400)))
3GB   +    (500MB + 2800MB)
3GB   +   3.3GB
= 6.3 GB

Example 4: Three disk groups per host, all flash configuration:

BaseConsumption + 
(NumDiskGroups x 
(DiskGroupBaseConsumption + ( SSDMemOverheadPerGB  x SSDSize)))

3GB   +    (3   x     (500MB   +    (7MB    x    400)))
3GB   +    (3   x     (500MB + 2800MB))
3GB   +    (3   x     3.3GB)
3GB   +   9.9GB
= 12.9 GB
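Plugging the Scenario 1 inputs (3GB base for a cluster of fewer than 16 nodes, 400GB SSD) into a few lines of Python reproduces all four results; the variable names here are just for illustration:

```python
base, dg_base = 3.0, 0.5          # GB: BaseConsumption, DiskGroupBaseConsumption
hybrid_ssd = 2 * 400 / 1000.0     # 0.8 GB overhead for a 400GB SSD (hybrid)
flash_ssd = 7 * 400 / 1000.0      # 2.8 GB overhead for a 400GB SSD (all flash)

# Examples 1-4: (disk groups, SSD overhead per group)
for groups, ssd in [(1, hybrid_ssd), (3, hybrid_ssd), (1, flash_ssd), (3, flash_ssd)]:
    print(round(base + groups * (dg_base + ssd), 2))
# prints 4.3, 6.9, 6.3 and 12.9
```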

Scenario 2: Let’s look at some working examples where the hosts have more than 32GB of memory per host, the number of hosts in the cluster is more than 16, and the SSD size is 600GB. When there are more than 16 nodes in a Virtual SAN cluster, the BaseConsumption increases by 300 MB to a total of 3.3 GB.

Example 5: One disk group per host, hybrid configuration:

BaseConsumption + 
(NumDiskGroups x 
(DiskGroupBaseConsumption + (SSDMemOverheadPerGB  x SSDSize)))

3.3GB   +    (1   x     (500MB   +    (2MB    x    600)))
3.3GB   +    (500MB + 1200MB)
3.3GB   +   1.7GB
= 5 GB

Example 6: Three disk groups per host, hybrid configuration:

BaseConsumption + 
(NumDiskGroups x 
(DiskGroupBaseConsumption + ( SSDMemOverheadPerGB  x SSDSize)))

3.3GB   +    (3   x     (500MB   +    (2MB    x    600)))
3.3GB   +    (3   x     (500MB + 1200MB))
3.3GB   +    (3   x     1.7GB)
3.3GB   +   5.1GB
= 8.4 GB

Example 7: One disk group per host, all flash configuration:

BaseConsumption + 
(NumDiskGroups x 
(DiskGroupBaseConsumption + (SSDMemOverheadPerGB  x SSDSize)))

3.3GB   +    (1   x     (500MB   +    (7MB    x    600)))
3.3GB   +    (500MB + 4200MB)
3.3GB   +   4.7GB
= 8 GB

Example 8: Three disk groups per host, all flash configuration:

BaseConsumption + 
(NumDiskGroups x 
(DiskGroupBaseConsumption + (SSDMemOverheadPerGB  x SSDSize)))

3.3GB   +    (3   x     (500MB   +    (7MB    x    600)))
3.3GB   +    (3   x     (500MB + 4200MB))
3.3GB   +    (3   x     4.7GB)
3.3GB   +   14.1GB
= 17.4 GB
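The same arithmetic covers Scenario 2, with the 3.3GB base that applies above 16 nodes and a 600GB SSD. In particular, the single all-flash disk group works out to 3.3GB + (500MB + 4200MB) = 8GB, and three all-flash disk groups to 3.3GB + 14.1GB = 17.4GB:

```python
base, dg_base = 3.3, 0.5          # GB: BaseConsumption (>16 nodes), per-disk-group base
hybrid_ssd = 2 * 600 / 1000.0     # 1.2 GB overhead for a 600GB SSD (hybrid)
flash_ssd = 7 * 600 / 1000.0      # 4.2 GB overhead for a 600GB SSD (all flash)

# Examples 5-8: (disk groups, SSD overhead per group)
for groups, ssd in [(1, hybrid_ssd), (3, hybrid_ssd), (1, flash_ssd), (3, flash_ssd)]:
    print(round(base + groups * (dg_base + ssd), 2))
# prints 5.0, 8.4, 8.0 and 17.4
```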

Scenario 3: Finally, let’s look at some examples where a host has less than 32GB of memory. In systems with less than 32GB of RAM, the amount of memory used is scaled down linearly by the factor (SystemMemory / 32), where SystemMemory is the amount of memory in the system in GB. Thus, if the system has 16 GB of RAM, the amount of memory consumed will be half of the output given by the formula above. If the system has 8 GB, it will be scaled down to a quarter.

Let’s assume that the host has 16GB of memory, the number of hosts in the cluster is less than 16, and that the SSD size is 400GB.

Example 9: One disk group per host, hybrid configuration:

(BaseConsumption + 
(NumDiskGroups x 
(DiskGroupBaseConsumption + (SSDMemOverheadPerGB  x SSDSize)))) 
* (SystemMemory / 32)

(3GB   +    (1   x     (500MB   +    (2MB    x    400)))) * 0.5
(3GB   +    (500MB + 800MB)) * 0.5
(3GB   +   1.3GB) * 0.5
= 4.3 GB * 0.5
= 2.15 GB

Example 10: Three disk groups per host, hybrid configuration:

(BaseConsumption + 
(NumDiskGroups x 
(DiskGroupBaseConsumption + (SSDMemOverheadPerGB  x SSDSize)))) 
* (SystemMemory / 32)

(3GB   +    (3   x     (500MB   +    (2MB    x    400)))) * 0.5
(3GB   +    (3   x     (500MB + 800MB))) * 0.5
(3GB   +    (3   x     1.3GB)) * 0.5
(3GB   +   3.9GB) * 0.5
= 6.9 GB  * 0.5
= 3.45GB

Example 11: One disk group per host, all flash configuration:

(BaseConsumption + 
(NumDiskGroups x 
(DiskGroupBaseConsumption + (SSDMemOverheadPerGB  x SSDSize)))) 
* (SystemMemory / 32)

(3GB   +    (1   x     (500MB   +    (7MB    x    400)))) * 0.5
(3GB   +    (500MB + 2800MB)) * 0.5
(3GB   +   3.3GB) * 0.5
= 6.3 GB  * 0.5
= 3.15GB

Example 12: Three disk groups per host, all flash configuration:

(BaseConsumption + 
(NumDiskGroups x 
(DiskGroupBaseConsumption + (SSDMemOverheadPerGB  x SSDSize)))) 
* (SystemMemory / 32)

(3GB   +    (3   x     (500MB   +    (7MB    x    400)))) * 0.5
(3GB   +    (3   x     (500MB + 2800MB))) * 0.5
(3GB   +    (3   x     3.3GB)) * 0.5
(3GB   +   9.9GB) * 0.5
= 12.9 GB  * 0.5
= 6.45GB
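The sub-32GB scaling in Scenario 3 can be sketched the same way: compute the full consumption, then multiply by SystemMemory / 32 (here 16 / 32 = 0.5). Again, the variable names are just illustrative:

```python
base, dg_base = 3.0, 0.5          # GB: cluster of fewer than 16 nodes
scale = 16 / 32.0                 # 16GB host => memory use scaled down by half
hybrid_ssd = 2 * 400 / 1000.0     # 0.8 GB overhead for a 400GB SSD (hybrid)
flash_ssd = 7 * 400 / 1000.0      # 2.8 GB overhead for a 400GB SSD (all flash)

# Examples 9-12: (disk groups, SSD overhead per group)
for groups, ssd in [(1, hybrid_ssd), (3, hybrid_ssd), (1, flash_ssd), (3, flash_ssd)]:
    print(round((base + groups * (dg_base + ssd)) * scale, 2))
# prints 2.15, 3.45, 3.15 and 6.45
```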

That completes the set of examples. From this, you should be able to calculate VSAN memory overhead. Once again, the considerations are as follows:

  • VSAN scales back on its memory usage when hosts have less than 32GB of memory
  • VSAN consumes additional memory when the number of nodes in the cluster is greater than 16
  • All-flash VSAN consumes additional memory resources compared to hybrid configurations
Comments
      • Thanks very much!

        I would like to experiment with vSAN at home with two nodes. I have a SATA 512GB SSD, PCIe 512GB SSD and one mechanical 4TB SATA drives in each host.

        What would be the best way to configure these drives in vSAN? Will I get great disk performance if I add the mechanical 4TB into the mix?

        I wanted to use an all flash vSAN setup but I need the 4TB drive for backups (for capacity) so I am battling with the design/setup I should use for vSAN.

        PS: Just started reading the VMware vSAN book and it’s great!

          • Sorry, I should have mentioned that I will be using a Witness appliance with the two nodes.

            I’ve been playing around with a 3 node nested environment last week but I would like to start doing this with physical servers now.

            So back to my original question, do you have any suggestions/recommendations for the 3 disks I mentioned in vSAN?

            It’s a pity vSAN doesn’t let you have a “fast tier” for one datastore and a “capacity tier” for another. Having just one vsandatastore seems to limit the options you have.

            Still trying to understand all of vSAN’s concepts!

          • You won’t be able to put all 3 devices into VSAN.

            If you want to do all-flash, use the SSD and the PCI-E flash devices (placing the higher performance on as the cache tier).

            If you want to do hybrid, use either the SSD or PCI-E for the cache tier and the 4TB for the capacity tier.

          • That’s interesting. Can you not mix SSD and HDD in the capacity tier?

            If I was to use the PCIe SSD as the cache drive and the 4TB HDD for the capacity tier, would VMs running on this configuration run well? (ie: fast/responsive).

            The other option I have is to use a 128GB SATA SSD for cache and then use the PCIe 512GB SSD and another SATA 512GB SSD in the capacity tier?

            Just battling to find the best balance for performance and capacity considering the drives I have for vSAN.

            Sorry for all the newbie questions but I haven’t done any vSAN (yet) on physical servers!

          • Nope – no mixing in the capacity tier. It’s either flash or spinning disk, not both.

            Can’t really offer advice on what will work best. The trade-off will be performance versus capacity. Might be worth setting up a few combinations of all-flash vs. hybrid and seeing what you get.

          • That’s useful to know, I didn’t realise you couldn’t mix SSD and HDD in the capacity tier.

            If I was to use the PCIe SSD 512GB drive for cache (for maximum write speed) and then use two SATA 512GB SSD drives for capacity (ie: all flash vSAN), then would it make sense to set the stripe width to 2? So I would have RAID 0 for the capacity tier and would access the vmdk files using both disks for read requests? Would this maximise performance?

            I read somewhere that all writes in an all flash vSAN are cached first but all reads go directly to the capacity tier. Is this correct?

            Also, does having a 512GB drive for cache matter size-wise, considering my capacity is “only” 1TB?

            Thank you!
