VSAN 6.0 Part 3 – New Default Datastore Policy

One of the most common questions I got about VSAN 5.5 was “why is VSAN deploying thick disks, when all of the documentation states that VSAN deploys thin disks?” The answer was quite straightforward: the VMs were being deployed without a VM Storage Policy. This meant that deployment went through the standard VM deployment wizard, which offered administrators the option of thin, lazy-zeroed thick (LZT) and eager-zeroed thick (EZT). The default option is LZT, which if you just do click-click-click (just like I do) when deploying a VM, then you…

VSAN 6.0 Part 2 – v2 On-disk Format Upgrade Considerations

I was heavily involved in the documentation effort for VSAN 6.0, but I know that not everyone likes to RTFM, so to speak. What I thought I would do in this post is give an overview of the upgrade process, and highlight some considerations. But if you do plan to upgrade from VSAN 5.5 to 6.0, I really would urge you to read through the VSAN 6.0 Administrators Guide, and perhaps the VSAN Troubleshooting Reference Manual, especially the sections dealing with upgrades. There is a lot of useful information there. There are four steps to the upgrade process: Upgrading vCenter…

VSAN 6.0 Part 1 – New quorum mechanism

vSphere 6.0 was released yesterday. It included the new version of Virtual SAN – 6.0. I now wish to start sharing some of the new features and functionality with you. One of the things we always enforced with version 5.5 was the fact that when you deployed a VM with NumberOfFailuresToTolerate = 1, you always had at least 3 components: 1st copy of the data, 2nd copy of the data, and then a witness component for quorum. In version 5.5, for a VM to remain accessible, “one full copy of the data and more than 50% of components must be available”. We…
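
The 5.5 accessibility rule quoted in this excerpt can be illustrated with a small Python sketch. This is only a toy model, not VSAN code: the `Component` class and both function names are hypothetical, and each copy of the data is assumed to be a single component (in practice a copy may be made up of several components).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    """Hypothetical model of one object component (not a VSAN API)."""
    name: str
    is_replica: bool  # True for a full copy of the data, False for a witness
    available: bool


def is_accessible_v55(components: List[Component]) -> bool:
    """Apply the quoted 5.5 rule: one full copy of the data AND
    more than 50% of all components must be available."""
    available = [c for c in components if c.available]
    has_full_copy = any(c.is_replica for c in available)
    has_majority = len(available) * 2 > len(components)
    return has_full_copy and has_majority


def ftt1_layout(replica1: bool = True, replica2: bool = True,
                witness: bool = True) -> List[Component]:
    """Minimum layout for NumberOfFailuresToTolerate = 1:
    two copies of the data plus one witness (assumed one component each)."""
    return [
        Component("replica-1", True, replica1),
        Component("replica-2", True, replica2),
        Component("witness", False, witness),
    ]


if __name__ == "__main__":
    print(is_accessible_v55(ftt1_layout()))                               # all healthy
    print(is_accessible_v55(ftt1_layout(replica1=False)))                 # one copy lost
    print(is_accessible_v55(ftt1_layout(replica1=False, witness=False)))  # copy + witness lost
```

With this layout, losing any single component leaves the VM accessible, while losing any two does not, which matches the intent of NumberOfFailuresToTolerate = 1.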

Virtual Volumes – A new way of doing snapshots

I learnt something interesting about Virtual Volumes (VVols) last week. It relates to the way in which snapshots have been implemented in VVols. Historically, VM snapshots have left a lot to be desired. So much so, that GSS best practices for VM snapshots as per KB article 1025279 recommend having only 2-3 snapshots in a chain (even though the maximum is 32) and using no single snapshot for more than 24-72 hours. VVol mitigates these restrictions significantly, not just because snapshots can be offloaded to the array, but also in the way consolidate and revert operations are implemented.

Virtual Volumes – A closer look at Storage Containers

There are a couple of key concepts to understanding Virtual Volumes (or VVols for short). VVols is one of the key new storage features in vSphere 6.0. You can get an overview of VVols from this post. The first key concept is VASA – vSphere APIs for Storage Awareness. I wrote about the initial release of VASA way back in the vSphere 5.0 launch. VASA has changed significantly to support VVols, with the introduction of version 2.0 in vSphere 6.0, but that is a topic for another day. Another key feature is the concept of a Protocol Endpoint, a logical I/O…

Migrations and Virtual Volumes – Deep Dive

Recently I published an article on Virtual Volumes (VVols) where I touched on a comparison between how migrations typically worked with VAAI and how they now work with VVols. In the meantime, I managed to have some really interesting discussions with some of our VVol leads, and I thought it worth sharing here as I haven’t seen this level of detail anywhere else. This is rather a long discussion, as there are a lot of different permutations of migrations that can take place. There are also different states that the virtual machine could be in. We’re solely focused on VVols…

vSphere 6.0 Storage Features Part 6: action_OnRetryErrors

In vSphere 6.0, an improvement has been made to how we handle I/O issues, such as flaky drivers, misbehaving firmware, dropped frames, fabric disruption, dodgy array firmware, and so on, which can cause I/O failures. The issue is that, previously, we continually retried these sorts of I/O errors, which can lead to all sorts of additional problems. In this release we are changing our behaviour for marking a path dead.