I had a query recently from a partner who was deploying VMware Horizon View 6.1 on top of an all-flash VSAN 6.0. They had done all the due diligence in configuring the AF-VSAN appropriately, such as marking certain flash devices as capacity devices. The configuration looked something like this:
They then went ahead and deployed Horizon View 6.1, which they had done many times before on hybrid configurations. They were able to deploy full clone pools on the AF-VSAN successfully, but hit a strange issue when deploying linked clone pools (floating/dedicated): the virtual machine clone operation would fail with an “Insufficient disk space on datastore” error, similar to the following:
One of the really nice new features of VSAN 6.0 is fault domains. Previously, there was very little control over where VSAN placed virtual machine components. To protect against something like a rack failure, you may have had to use a very high NumberOfFailuresToTolerate value, resulting in multiple copies of the VM data dispersed around the cluster. With VSAN 6.0, this is no longer a concern, as hosts participating in the VSAN cluster can be placed in different fault domains. Components are then placed across fault domains, not just across hosts. Let’s look at this in action.
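To illustrate the idea, here is a minimal sketch of fault-domain-aware placement. It is a simplified model, not VSAN’s actual placement algorithm: it assumes RAID-1 mirroring with FTT+1 replicas plus FTT witnesses, and the rule that such an object needs at least 2*FTT+1 fault domains. The `Host` class and `place_components` function are hypothetical; real placement also weighs capacity and load, and witness counts can differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    fault_domain: str  # e.g. the rack the host lives in (hypothetical labels)


def place_components(hosts, ftt):
    """Place FTT+1 replicas and FTT witnesses, each in a distinct fault domain.

    Simplified model: returns a dict of component name -> host name and
    raises ValueError when fewer than 2*FTT+1 fault domains exist.
    """
    # Group hosts by fault domain, preserving the input order.
    domains = {}
    for host in hosts:
        domains.setdefault(host.fault_domain, []).append(host)

    needed = 2 * ftt + 1
    if len(domains) < needed:
        raise ValueError(
            f"FTT={ftt} needs {needed} fault domains, found {len(domains)}"
        )

    components = [f"replica-{i}" for i in range(ftt + 1)]
    components += [f"witness-{i}" for i in range(ftt)]

    # One component per fault domain: take the first host of each domain.
    placement = {}
    for component, members in zip(components, domains.values()):
        placement[component] = members[0].name
    return placement
```

With three racks of two hosts each and FTT=1, the two replicas and the witness land in three different racks, so losing an entire rack still leaves a replica and quorum intact. With only two racks, the sketch refuses to place the object rather than putting two components in the same rack.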
One of the most common issues I got questions about in VSAN 5.5 was “why is VSAN deploying thick disks, when all of the documentation stated that VSAN deploys thin disks?”
The answer was quite straightforward: the VMs were being deployed without a VM Storage Policy. This meant that they went through the standard VM deployment wizard, which offered administrators the option of thin, lazy-zeroed thick (LZT) and eager-zeroed thick (EZT). The default option is LZT, so if you just click-click-click through the wizard (just like I do) when deploying a VM, you end up with an LZT format VM, even on the VSAN datastore. I described this issue in this older blog post. It’s only when you select an actual VM Storage Policy when deploying a VM that VSAN uses the Object Space Reservation capability, which by default is 0%, meaning that the VM is effectively thinly provisioned. We realized that this was causing some issues for customers, so we improved this whole deployment mechanism in 6.0 with the introduction of Datastore Default policies.
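The decision logic described above can be summarised in a small sketch. This is a simplified model under stated assumptions, not VSAN code: the function `reserved_pct`, the `osr_pct` policy key and the `datastore_default` parameter are hypothetical, and a thick (LZT or EZT) wizard choice is treated here as reserving the full VMDK.

```python
def reserved_pct(policy=None, wizard_format="LZT", datastore_default=None):
    """Percentage of a VMDK reserved up front on the VSAN datastore (simplified).

    policy            -- hypothetical dict for a chosen VM Storage Policy, or None
    wizard_format     -- disk format picked in the deployment wizard ("thin", "LZT", "EZT")
    datastore_default -- hypothetical datastore default policy (VSAN 6.0 behaviour)
    """
    # VSAN 6.0 (modelled): with no policy chosen, the datastore default policy applies.
    if policy is None:
        policy = datastore_default
    if policy is not None:
        # A policy is in effect: Object Space Reservation governs, default 0% (thin).
        return policy.get("osr_pct", 0)
    # VSAN 5.5 with no policy: the wizard's format wins; only "thin" reserves nothing.
    return 0 if wizard_format == "thin" else 100
```

Calling `reserved_pct()` with no arguments models the 5.5 click-through case and returns 100. Passing an empty policy or an empty datastore default returns 0, which is the thin-like behaviour the Datastore Default policies are meant to guarantee.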
I learnt something interesting about Virtual Volumes (VVols) last week. It relates to the way in which snapshots have been implemented in VVols. Historically, VM snapshots have left a lot to be desired. So much so that the GSS best practices for VM snapshots in KB article 1025279 recommend having only 2-3 snapshots in a chain (even though the maximum is 32) and using no single snapshot for more than 24-72 hours. VVols mitigate these restrictions significantly, not just because snapshots can be offloaded to the array, but also because of the way consolidate and revert operations are implemented.
There are a few key concepts to understanding Virtual Volumes (or VVols for short). VVols is one of the key new storage features in vSphere 6.0. You can get an overview of VVols from this post. The first key concept is VASA – vSphere APIs for Storage Awareness. I wrote about the initial release of VASA way back at the vSphere 5.0 launch. VASA has changed significantly to support VVols, with the introduction of version 2.0 in vSphere 6.0, but that is a topic for another day. Another key concept is the Protocol Endpoint (PE), a logical I/O proxy presented to a host so that it can communicate with Virtual Volumes. My good pal Duncan writes about some considerations with PEs and queue depths here. This again is a topic for a deeper conversation, but not today. Today, I want to talk about a third major concept, the Storage Container.
In a previous post, I discussed the difference between a component that is marked as ABSENT and a component that is marked as DEGRADED. In this post, I’m going to take this up a level and talk about objects, and how failures in the cluster can change the status of objects. In VSAN, an object is made up of one or more components. For instance, if you want a VM to tolerate a number of failures, or you want to stripe a VMDK across multiple disks, then the VSAN object will certainly be made up of multiple components. Read this article for a better understanding of objects and components. Object compliance status and object operation status are two distinct states that an object may have. Let’s look at them in more detail next.
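As a rough illustration of why the two states are distinct, here is a minimal sketch for a mirrored object. It is a simplified model, not VSAN’s implementation: it treats an object as accessible when more than 50% of its components are ACTIVE and at least one replica is intact (VSAN 6.0 actually counts votes rather than raw components), and it treats any non-ACTIVE component as making the object non-compliant. The `Component` class, `object_status` function and the status strings are hypothetical labels.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    kind: str   # "replica" or "witness"
    state: str  # "ACTIVE", "ABSENT" or "DEGRADED"


def object_status(components: List[Component]):
    """Return (compliance, operation) for a mirrored object (simplified model)."""
    active = [c for c in components if c.state == "ACTIVE"]

    # Compliance: does the object still meet its policy (every component healthy)?
    compliance = "compliant" if len(active) == len(components) else "not compliant"

    # Operation: quorum (> 50% of components) plus at least one intact replica.
    has_quorum = 2 * len(active) > len(components)
    has_replica = any(c.kind == "replica" for c in active)
    if not (has_quorum and has_replica):
        operation = "inaccessible"
    elif compliance == "compliant":
        operation = "healthy"
    else:
        operation = "reduced availability"
    return compliance, operation
```

For an FTT=1 object (two replicas and a witness), losing one replica leaves the object non-compliant yet still accessible; losing a replica and the witness makes it inaccessible, since only one of three components remains.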
In this post, we talk about a particular behaviour when using the default (or None) policy with VSAN. I have stated many times in the past that when a VM is deployed on the VSAN datastore, it behaves like it is thinly provisioned unless the capability ‘Object Space Reservation’ (OSR) is specified in the VM Storage Policy. The OSR pre-allocates space on the VSAN datastore for the virtual machine’s storage objects, and is specified as a percentage of the actual VMDK size. However, the behaviour is slightly different when the default policy is used. Once again, I was in a conversation with a customer who stated that when he used the default policy of “None”, he could see that the space consumed on the VSAN datastore was equal to the size of the VMDK * FTT (Number of Failures To Tolerate). He wondered why this was the case when the default policy clearly did not contain an Object Space Reservation capability.
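For reference, here is a minimal sketch of how an explicit OSR value translates into reserved space when a policy does specify it. It assumes RAID-1 mirroring, where the reservation applies to each of the FTT+1 replicas, and it ignores witness and metadata overhead; the `reserved_gb` function name is hypothetical.

```python
def reserved_gb(vmdk_gb: float, osr_pct: float, ftt: int) -> float:
    """Space pre-allocated on the VSAN datastore for a mirrored VMDK (simplified).

    Assumes RAID-1 mirroring: the OSR percentage is reserved on each of
    the FTT+1 replicas. Witness and metadata overhead are ignored.
    """
    if not 0 <= osr_pct <= 100:
        raise ValueError("osr_pct must be between 0 and 100")
    return vmdk_gb * (osr_pct / 100) * (ftt + 1)
```

Under these assumptions a 100 GB VMDK with OSR=0% reserves nothing up front, while the same VMDK with OSR=100% and FTT=1 reserves 200 GB across its two replicas.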