I was heavily involved in the documentation effort for VSAN 6.0, but I know that not everyone likes to RTFM, so to speak. What I thought I would do in this post is give an overview of the upgrade process, and highlight some considerations. But I really would urge you to read through the VSAN 6.0 Administrators Guide, and perhaps the VSAN Troubleshooting Reference Manual, especially the sections dealing with upgrades, if you do plan to upgrade from VSAN 5.5 to 6.0. There is a lot of useful information there.
There are four steps to the upgrade process:
1. Upgrading vCenter Server to 6.0
2. Upgrading the ESXi hosts to 6.0
3. Upgrading the on-disk filesystem format from v1 to v2 (VMFS-L to VirstoFS)
4. Upgrading the components to v2
Items 1 & 2 are outside the scope of this post. Refer to the generic vSphere 6 documentation on how to do those. Items 3 & 4 are done via a new RVC command that we will discuss in more detail here.
The first thing to consider is how many nodes are in your VSAN cluster. Is it 4 or more, or just 3 nodes? Different options to the command need to be used to upgrade the on-disk format depending on the configuration.
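As a rough illustration of that decision, here is a minimal Python sketch that builds the RVC command line for a given cluster. The command name and the `--allow-reduced-redundancy` option are the ones covered in the two sections that follow; the helper name `upgrade_command`, its parameters, and the example cluster path are hypothetical and only there for illustration.

```python
def upgrade_command(cluster_path, node_count, can_evacuate):
    """Build the RVC on-disk upgrade command for a cluster (illustrative helper).

    cluster_path -- RVC path to the cluster (hypothetical example value below)
    node_count   -- number of hosts in the VSAN cluster
    can_evacuate -- True if there is capacity to fully evacuate each disk group
    """
    base = f"vsan.v2_ondisk_upgrade {cluster_path}"
    # Full evacuation needs 4 or more nodes and enough spare capacity.
    if node_count >= 4 and can_evacuate:
        return base
    # 3-node clusters, or clusters short on capacity, need the extra option.
    return f"{base} --allow-reduced-redundancy"


# The path is a made-up example, not taken from a real environment.
print(upgrade_command("/vcenter/DC/computers/VSAN", 3, True))
```

This only picks the command string. It does not talk to vCenter or check capacity itself.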
4 or more nodes in a cluster
The RVC command used to upgrade the on-disk format of the existing disk groups in a cluster with 4 or more nodes does not take any options. The command is:
vsan.v2_ondisk_upgrade <cluster>
This command implements a rolling upgrade of the on-disk format of all the nodes in the cluster. It also preserves the availability of VMs, e.g. the number of failures to tolerate. Therefore, if there is a hardware failure while you are going through the upgrade process, the VMs are still protected and remain accessible. This is the process it uses:
1. Select first host.
2. Evacuate disk group(s) contents from first host to other hosts in the cluster, which maintains full VM availability.
3. Once the disk group is evacuated, delete disks from disk group(s) which have v1 on-disk format.
4. Add disks back to disk group(s), but now with v2 on-disk format. The entire disk group is destroyed after evacuation and then freshly formatted as v2.
5. Select next host and repeat steps 2, 3 & 4 until all hosts are upgraded.
6. When all hosts are upgraded, upgrade the objects from v1 to v2. Objects are not upgraded to v2 until all disk groups are v2. Upgrading an object changes the protocol to the new version and unlocks the new features in version 6.0 such as 2TB+ VMDK support, new quorum mechanism, etc.
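To make the ordering of this process concrete, here is a toy Python model of it. This is a sketch under simplifying assumptions, not how VSAN implements the upgrade: the `Host` class and `rolling_upgrade` function are hypothetical names, components are plain strings, and evacuated components are spread round-robin over the other hosts with no regard for storage policy placement rules.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    disk_format: int = 1                             # on-disk format version
    components: list = field(default_factory=list)   # components stored on this host


def rolling_upgrade(hosts):
    """Toy model of the rolling on-disk format upgrade with full evacuation."""
    log = []
    for host in hosts:
        others = [h for h in hosts if h is not host]
        # Evacuate every component to the other hosts (simple round-robin).
        for i, comp in enumerate(host.components):
            others[i % len(others)].components.append(comp)
        host.components = []
        # Remove the v1 disk group and re-create it with the v2 format.
        host.disk_format = 2
        log.append(f"{host.name}: evacuated and reformatted to v2")
    # Objects are only upgraded once every disk group is at v2.
    if all(h.disk_format == 2 for h in hosts):
        log.append("all disk groups at v2: upgrading objects to v2")
    return log


hosts = [Host(f"esxi-{i}", components=[f"c{i}a", f"c{i}b"]) for i in range(4)]
for line in rolling_upgrade(hosts):
    print(line)
```

Note that no component is ever dropped in this model: the total number of components is the same before and after, which is the point of the full-evacuation approach.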
The first consideration here is that there must be enough capacity remaining in the cluster to evacuate the disk group contents on each of the hosts. If there is not, please look at the option that can be used in the 3 node cluster below, which is also applicable when there is insufficient capacity to do full evacuations of the disk groups.
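A back-of-the-envelope way to think about this: a host can only be fully evacuated if all the data in the cluster fits on the remaining hosts. The Python sketch below checks exactly that, using raw capacity only. The function name and the input figures are hypothetical, and the check ignores storage policy placement rules and any free-space headroom you would want to keep, so treat it as a rough first filter rather than a guarantee.

```python
def hosts_blocking_evacuation(total_used_gb, capacity_gb):
    """Return the hosts that could not be fully evacuated (raw capacity only).

    total_used_gb -- total data currently stored across the cluster, in GB
    capacity_gb   -- dict mapping host name -> raw capacity of that host, in GB
    """
    total_capacity = sum(capacity_gb.values())
    blocked = []
    for host, capacity in capacity_gb.items():
        # With this host's disk groups removed, the rest must hold everything.
        if total_used_gb > total_capacity - capacity:
            blocked.append(host)
    return blocked


# Hypothetical 4-node cluster with 1000 GB of raw capacity per host.
capacity = {"esxi-1": 1000, "esxi-2": 1000, "esxi-3": 1000, "esxi-4": 1000}
print(hosts_blocking_evacuation(2400, capacity))  # fits on any three hosts
print(hosts_blocking_evacuation(3200, capacity))  # does not fit on three hosts
```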
The second consideration is “maintenance mode”. Administrators should not place any hosts into maintenance mode when doing an on-disk upgrade as this will make the resources on that host unavailable for component evacuation from the host that is currently being upgraded. As part of the upgrade process, the disk groups are evacuated without the need for maintenance mode.
3 node cluster (or insufficient resources)
For 3 node clusters, and for clusters that do not have the available capacity to do an evacuation of each disk group, the on-disk format can still be upgraded to v2. However the VMs will be unprotected for the duration of the upgrade.
If you attempt to upgrade the on-disk format to v2 and there are not enough resources in the cluster to evacuate a disk group, the upgrade will fail. In these situations, run the upgrade command with the following option:
vsan.v2_ondisk_upgrade <cluster> --allow-reduced-redundancy
This method will not evacuate data to the other hosts in the cluster. This option will simply remove disk group(s), and add the disks back to a v2 format disk group. This is the process it uses:
1. Select first host.
2. Delete disks from disk group(s) which have v1 on-disk format. We ensure all objects stay available (but with reduced redundancy).
3. Add disks back to disk group(s), but now with v2 on-disk format.
4. Synchronize components so that they are back to compliance.
5. Select next host and repeat steps 2, 3 & 4 until all disk groups are upgraded.
6. When all disk groups are upgraded, upgrade the objects from v1 to v2.
Objects will have components going into a degraded state as part of this upgrade process. This is normal, and the components of the objects will be resynchronized as soon as the on-disk format is upgraded. This can take a bit of time if there are lots of components to resynchronize.
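The reduced-redundancy process can be sketched with a similar toy Python model. Again, this is an illustration under assumptions rather than the real implementation: `reduced_redundancy_upgrade` is a hypothetical name, each object is modelled as just the set of hosts holding a replica, witness components are ignored, and resynchronization is modelled as putting the replica straight back on the same host.

```python
def reduced_redundancy_upgrade(hosts, placement):
    """Toy model of the upgrade without evacuation.

    hosts     -- list of host names, upgraded in this order
    placement -- dict mapping object name -> set of hosts holding a replica
                 (modified in place while the upgrade runs)
    """
    formats = {h: 1 for h in hosts}
    events = []
    for host in hosts:
        # Removing the v1 disk group drops this host's components.
        affected = [obj for obj, replicas in placement.items() if host in replicas]
        for obj in affected:
            placement[obj].discard(host)
            if not placement[obj]:
                raise RuntimeError(f"{obj} would become inaccessible")
            events.append(f"{obj}: reduced redundancy while {host} is reformatted")
        # Re-create the disk group with the v2 format.
        formats[host] = 2
        # Resynchronize so the affected objects are compliant again.
        for obj in affected:
            placement[obj].add(host)
    return formats, events


hosts = ["esxi-1", "esxi-2", "esxi-3"]
placement = {"vm1": {"esxi-1", "esxi-2"}, "vm2": {"esxi-2", "esxi-3"}}
formats, events = reduced_redundancy_upgrade(hosts, placement)
for line in events:
    print(line)
```

Each object drops to a single replica while one of its hosts is being reformatted, which is the window during which a second failure would hurt.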
There is a detailed step-by-step upgrade path in the VSAN 6.0 Administrators Guide. There are also details on some possible gotchas in the VSAN 6.0 Troubleshooting Reference Manual. I would strongly urge anyone contemplating an upgrade of VSAN to read these documents carefully.