VSAN Part 7 – Capabilities and VM Storage Policies
In this post, the VSAN capabilities are examined in detail. These capabilities, which are surfaced by the VASA storage provider when the cluster is configured successfully, are used to set the availability, capacity and performance policies on a per-VM basis when that VM is deployed on the vsanDatastore. There are five capabilities in the initial release of VSAN, as shown below.
I will also try to highlight where you should use a non-default value for these capabilities.
Number of Failures To Tolerate
This capability sets a requirement on the storage object to tolerate at least ‘Number of Failures To Tolerate’ failures. This is the number of concurrent host, network or disk failures that may occur in the cluster while still ensuring the availability of the object. If this property is populated, configurations must contain at least Number of Failures To Tolerate + 1 replicas, and may also contain an additional number of witnesses, to ensure that the object’s data remains available (maintains quorum) even in the presence of up to Number of Failures To Tolerate concurrent host failures. Witnesses provide a quorum when failures occur in the cluster, or when a decision has to be made in a split-brain situation.
One aspect worth noting is that any disk failure on a single host is treated as a “failure” for this metric. Therefore the object cannot survive a disk failure on host A and a host failure on host B when Number of Failures To Tolerate is set to 1.
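To make the replica and witness arithmetic concrete, here is a minimal Python sketch of the component math described above. This is my own illustration, not VSAN code; the actual witness calculation has more special cases, but the 2n+1 quorum rule captures the idea.

```python
# Sketch of the FTT component math (illustration only, not VSAN source):
# an object with Number of Failures To Tolerate = n has n+1 replicas, and
# enough witnesses that a majority of components survives any n failures.
def ftt_layout(ftt):
    replicas = ftt + 1
    total_components = 2 * ftt + 1           # majority quorum out of 2n+1
    witnesses = total_components - replicas  # witnesses top up the vote count
    min_hosts = total_components             # each component on its own host
    return replicas, witnesses, min_hosts

for ftt in range(4):
    r, w, h = ftt_layout(ftt)
    print(f"FTT={ftt}: {r} replica(s), {w} witness(es), needs >= {h} hosts")
```

For the default Number of Failures To Tolerate = 1, this gives two replicas plus one witness, which is why a VSAN cluster needs at least three hosts.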
Number of Disk Stripes Per Object
This defines the number of physical disks across which each replica of a storage object is striped. To understand the impact of stripe width, let us examine it first in the context of write operations and then in the context of read operations.
Since all writes go to SSD (the write buffer), an increased stripe width may or may not improve performance. This is because there is no guarantee that the new stripe will use a different SSD; the new stripe may be placed on an HDD in the same disk group, in which case it will use the same SSD. The only occasion where an increased stripe width could add value is when a large number of writes must be destaged from SSD to HDD. In this case, striping could improve destage performance.
From a read perspective, an increased stripe width will help when you are experiencing many read cache misses. Take the example of a virtual machine issuing 2,000 read operations per second and experiencing a hit rate of 90%: 200 read operations per second must be serviced from HDD. A single HDD that can provide 150 IOPS is not able to service all of those read operations, so an increased stripe width would help on this occasion to meet the virtual machine’s I/O requirements.
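The arithmetic from that example can be written out as a short Python calculation; the 150 IOPS per HDD figure is simply the assumption used above.

```python
import math

# How many HDD stripes are needed to absorb a VM's read cache misses,
# using the figures from the example above.
def stripes_needed(read_iops, hit_rate, hdd_iops=150):
    misses = read_iops * (1 - hit_rate)  # reads that fall through to HDD
    return misses, math.ceil(misses / hdd_iops)

misses, width = stripes_needed(read_iops=2000, hit_rate=0.90)
print(f"{misses:.0f} IOPS hit the HDDs -> stripe width of at least {width}")
# 200 IOPS hit the HDDs -> stripe width of at least 2
```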
In general, the default stripe width of 1 should meet most, if not all, virtual machine workloads. Stripe width is a capability that should only be changed when write destaging or read cache misses are identified as a performance constraint.
Flash Read Cache Reservation
This is the amount of flash capacity reserved on the SSD as read cache for the storage object. It is specified as a percentage of the logical size of the storage object (i.e. the VMDK), with up to 4 decimal places. This fine granularity is needed so that administrators can express sub-1% units. Take the example of a 1TB disk: if the read cache reservation were limited to 1% increments, this would mean cache reservations in increments of 10GB, which in most cases is far too much for a single virtual machine.
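A quick calculation shows why the extra decimal places matter; this is plain arithmetic, not a VSAN API.

```python
# Flash Read Cache Reservation sizes for a 1TB (1,000GB) VMDK at various
# percentage granularities.
vmdk_gb = 1000

for pct in (1.0, 0.1, 0.01, 0.0001):
    print(f"{pct:>7}% of {vmdk_gb}GB = {vmdk_gb * pct / 100:g} GB")
# 1% reserves 10GB, while the finest granularity, 0.0001%, reserves ~1MB.
```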
Note that you do not have to set a reservation in order to get cache; all virtual machines equally share the read cache of an SSD. The reservation should be left at 0 (the default) unless you are trying to solve a real performance problem and you believe dedicating read cache is the solution. In the initial version of VSAN, there is no proportional share mechanism for this resource.
Object Space Reservation
All objects deployed on VSAN are thinly provisioned by default. This capability defines the amount of space to reserve during initialization, specified as a percentage of the logical size of the storage object. It is the property used to specify a thick-provisioned storage object: if Object Space Reservation is set to 100%, all of the storage capacity required by the VM is reserved up front (thick). This will be lazy zeroed thick (LZT) format, not eager zeroed thick (EZT).
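The reservation itself is simple arithmetic, sketched below with illustrative names of my own choosing.

```python
# Space reserved up front for an object, given its logical size and the
# Object Space Reservation percentage (function name is illustrative).
def reserved_gb(logical_size_gb, osr_pct):
    return logical_size_gb * osr_pct / 100

print(reserved_gb(100, 0))    # 0.0   -> fully thin (the default)
print(reserved_gb(100, 50))   # 50.0  -> half the capacity reserved
print(reserved_gb(100, 100))  # 100.0 -> thick (lazy zeroed)
```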
Force Provisioning
If this parameter is set to a non-zero value, the object will be provisioned even if the policy specified in the VM Storage Policy cannot be satisfied by the datastore. The virtual machine will be shown as non-compliant in the VM Summary tab and in the relevant VM Storage Policy views in the UI. However, if there is not enough space in the cluster to satisfy the reservation requirements of at least one replica, the provisioning will fail even if Force Provisioning is turned on. When additional resources become available in the cluster, VSAN will bring this object to a compliant state.
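The decision flow described above can be summarised in a few lines of pseudo-logic. This is a hypothetical sketch of the behaviour as described, not VSAN’s actual placement code.

```python
# Hypothetical sketch of the force-provisioning decision described above.
def try_provision(policy_satisfiable, force_provisioning, space_for_one_replica):
    if policy_satisfiable:
        return "provisioned, compliant"
    if not force_provisioning:
        return "provisioning fails: policy not satisfiable"
    if not space_for_one_replica:
        # Even with force provisioning, the reservation requirements of at
        # least one replica must be met.
        return "provisioning fails: no space for a single replica"
    # Deployed anyway; VSAN brings the object to compliance when
    # resources become available.
    return "provisioned, non-compliant (remediated later)"
```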
Check out all my VSAN posts here.
I have a question around the maximum VMDK size per VSAN datastore. Can I have a VMDK that spans multiple disks, just from a space perspective? For example, if all of the HDDs in each member of the VSAN cluster are 1TB drives, could I create a VM with a 2TB VMDK that spans multiple physical drives, or would I be limited to VMDKs of 1TB since that is the hard size of my member disks?
Thanks,
Ian
Yes. In this case we would stripe the VMDK across multiple disks.