I know that a lot of content has already been written about the latest 7.0U2 release of vSAN. My good pal Duncan has done a considerable amount of work to highlight the new features, and has an excellent set of YouTube videos that you can review at your convenience. However, I thought I would create a bite-sized overview of some of the big-ticket items in the vSAN 7.0U2 release, as many of my readers have asked me about some of these features in the past. I’m not covering all of the features, of which there are many. For a complete overview, head over to the release notes.
HCI Mesh Functionality and Scale Improvements
HCI Mesh, also referred to as Disaggregated vSAN, was introduced in 7.0U1 and is a method of addressing the stranded space issue when there are multiple vSAN clusters in operation. What I mean by stranded space is a situation where one vSAN cluster is heavily utilized from a capacity perspective, and another vSAN cluster has plenty of space available. HCI Mesh addresses this. HCI Mesh allows a local vSAN cluster to mount a vSAN datastore from another (remote) vSAN cluster, and vice-versa. This enables an administrator to provision virtual machine storage on either the local vSAN datastore or remote vSAN datastore. The big news in the 7.0U2 release is that you no longer need a vSAN cluster to mount a remote vSAN datastore. You can now mount a remote vSAN datastore to non-vSAN, vSphere compute-only clusters. That’s pretty cool. On top of that, we have also increased the scale, so that a single vSAN datastore can be mounted by up to 128 remote hosts. And note that there is no license required for these remote hosts; they can simply mount the vSAN datastore as they would any other remote datastore.
vSAN Stretched Cluster Functionality and Scale Improvements
One feature of vSAN Stretched Cluster which was not fully optimized was the recovery process after a failure. On failure, all virtual machines from the failing site are restarted on the remaining site by vSphere HA. When the failing site recovers, DRS affinity rules kick in, and the virtual machines which have affinity to the recovered site get migrated back to their preferred site. However, it may be that the virtual machine disks are still resyncing after the outage when the VMs arrive back on the recovered site. This means that they still have to do their read operations across the inter-site link, reading from the latest copy of the data, which is on the remote site. This could have an adverse effect on performance, obviously, until the resync operation completes and all I/O can be done on the local site. With this 7.0U2 release, DRS now waits for the VM disks to re-synchronize on the failed, but now recovered, site before moving the virtual machines back there. This is a nice enhancement, and something that I have been looking forward to seeing in the product.
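To make the new failback behaviour a little more concrete, here is a minimal Python sketch of the decision as I have described it: a VM only moves back to the recovered site once its disks have nothing left to resync. This is purely illustrative and is not how DRS is implemented. The `VmState` class, its fields and the `vms_ready_for_failback` function are all names I have made up for the example.

```python
from dataclasses import dataclass


@dataclass
class VmState:
    # Hypothetical model of a VM in a stretched cluster (not a vSphere API type).
    name: str
    preferred_site: str   # site the VM has DRS affinity to
    current_site: str     # site the VM is running on right now
    bytes_to_resync: int  # outstanding resync for the VM's objects


def vms_ready_for_failback(vms, recovered_site):
    """Return the names of VMs that can move back to the recovered site."""
    ready = []
    for vm in vms:
        belongs_there = vm.preferred_site == recovered_site
        is_away = vm.current_site != recovered_site
        fully_synced = vm.bytes_to_resync == 0
        # Only migrate once the local copy of the data is up to date,
        # so reads do not have to cross the inter-site link.
        if belongs_there and is_away and fully_synced:
            ready.append(vm.name)
    return ready


vms = [
    VmState("web-01", "site-a", "site-b", 0),
    VmState("db-01", "site-a", "site-b", 5 * 1024**3),  # still resyncing
    VmState("app-01", "site-b", "site-b", 0),           # already at home
]
print(vms_ready_for_failback(vms, "site-a"))
```

In this sample, only `web-01` is eligible to move back. `db-01` stays on the remaining site until its resync completes, and `app-01` never left its preferred site.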
Another common request has been to increase the number of nodes that can be deployed on each data site in a vSAN Stretched Cluster. In 7.0U2, this has been increased from the long-standing 15+15+1 (15 nodes at each data site, plus one witness at the witness site) to 20+20+1, meaning we can now support 40 data nodes in a vSAN Stretched Cluster. I know a lot of our customers have been waiting for this scale increase.
vSAN File Services Interoperability and Scale Improvements
Many of our customers have been asking for the ability to enable vSAN File Services on both vSAN Stretched Cluster and 2-node vSAN configurations. In the 7.0U2 release, we can now support such configurations. I will add one caveat, and this is in relation to Kubernetes and Cloud Native Storage. At this point in time, we still do not have a support statement for running Tanzu Kubernetes (vSphere with Tanzu, TKGS, TKGm) on vSAN Stretched Cluster, which means that, currently, there is also no support for CNS on vSAN Stretched Cluster. Thus, it is not possible for Kubernetes distributions to consume RWX volumes from a vSAN File Services deployment running on vSAN Stretched Cluster at the time of writing this article. This is something that is currently being investigated.
Another very interesting feature in vSAN File Services is snapshot support for file shares. This will allow third-party backup vendors to integrate with vSAN File Services, through APIs, to provide a backup service for the file shares. In the past, backups of vSAN File Services file shares had to be done via the VM or host that was mounting the share remotely.
Finally, on the scale front, vSAN File Services can now create 100 file shares per cluster, up from the 64 file shares limit found in previous releases.
vSphere Proactive HA Support
Proactive HA is interesting from a vSAN perspective. In a nutshell, Proactive HA can integrate with the likes of DELL OpenManage and HP OneView so that hardware errors can be detected, and proactive steps can be taken to avoid downtime, such as doing a host evacuation and moving all VM workloads to another host or hosts in the cluster. Proactive HA introduced a new state for ESXi hosts called “Quarantine mode”. This is different to “Maintenance mode” because the host’s resources can continue to be used when the host is in quarantine mode, but no new workloads will be placed on the host. vSAN was unaware of this mode in the past, so that even when a host was in “Quarantine mode”, vSAN would continue to use the physical storage on the quarantined host for virtual machine objects. In 7.0U2, vSAN now supports Proactive HA, and understands these different host states.
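The difference between the host states can be summed up in a few lines of Python. This is only a sketch of the behaviour described above, not anything from the vSphere API. The `HostState` enum and the two helper functions are hypothetical names for illustration.

```python
from enum import Enum


class HostState(Enum):
    # Hypothetical labels for the host states discussed in the text.
    CONNECTED = "connected"
    QUARANTINE = "quarantine"
    MAINTENANCE = "maintenance"


def can_keep_running_workloads(state):
    """Existing workloads keep using the host unless it is in maintenance mode."""
    return state in (HostState.CONNECTED, HostState.QUARANTINE)


def can_accept_new_workloads(state):
    """New workloads are only placed on hosts that are fully in service."""
    return state is HostState.CONNECTED


for state in HostState:
    print(state.value, can_keep_running_workloads(state), can_accept_new_workloads(state))
```

The interesting row is quarantine: existing workloads stay where they are, but nothing new lands on the host.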
vSphere Native Key Provider
As I am sure you are aware, vSAN supports encryption, and has done for some time. This is achieved using encryption keys, where each disk is encrypted with its own key. These keys are then, in turn, encrypted. In order to implement encryption, a Key Management Server (KMS) from a third party was required, which was an added burden on our customers who wished to use the encryption service. In this 7.0U2 release, to coincide with the release of a Native Key Provider or embedded KMS in vSphere, vSAN can now leverage this Key Provider for encryption rather than requiring customers to purchase and license a third-party KMS.
Health Check History
The health check history is an extremely useful feature as it allows you to go back in time to see if there were any health issues that may have been transient, but have since recovered. For example, maybe there was some unexplained event over the weekend, but when you check on Monday morning, the health is all green. This feature enables you to go back and check if something did indeed occur that may be the root cause of an issue.
Health check history can be enabled very simply under Cluster > Monitor > vSAN > Skyline Health. Once the feature is enabled, you can add a date range to check to see if there were any events in that range. Then simply select the health check that you wish to examine, and select a place on the health check timeline to reveal the issues that occurred at that particular time, as shown below. Periods where the same event is present during the sampling period are collapsed to make the UI easier to navigate. You can click on these items to expand and collapse them.
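If you are wondering what collapsing periods with the same event looks like in practice, the idea can be sketched in Python with `itertools.groupby`, which groups consecutive samples that share the same status. To be clear, this is my own illustration of the concept and not the logic used by the vSAN UI. The sample data, the status labels and the `collapse_samples` function are all invented for the example.

```python
from itertools import groupby


def collapse_samples(samples):
    """Collapse consecutive samples with the same status into one period.

    samples: list of (timestamp, status) tuples, already sorted by time.
    """
    periods = []
    for status, run in groupby(samples, key=lambda s: s[1]):
        run = list(run)
        periods.append({
            "status": status,
            "start": run[0][0],   # first sample in the run
            "end": run[-1][0],    # last sample in the run
            "count": len(run),
        })
    return periods


# Hypothetical weekend of hourly health samples.
samples = [
    ("Sat 22:00", "green"),
    ("Sat 23:00", "red"),
    ("Sun 00:00", "red"),
    ("Sun 01:00", "red"),
    ("Sun 02:00", "green"),
    ("Mon 09:00", "green"),
]

for period in collapse_samples(samples):
    print(period)
```

Here the six samples collapse into three periods, and the transient red period over the weekend is easy to spot even though everything is green on Monday morning.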
Performance Top Contributors
Another really useful addition to the vSAN management and monitoring toolkit is the ability to look at the top contributor from a resource usage perspective. In the Cluster > Monitor > vSAN > Performance view, where it defaults to Cluster level metrics, simply select Top Contributors from the drop-down menu, and then click on the cluster chart to select a time point to view, as shown below. The default view of Top Contributors is VMs, where you can view based on IOPS, Throughput or Latency. This view can also be changed to Disk Group so that you can see which disk groups are more busy, or more constrained, than others.
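Conceptually, a top contributors view is a ranking of objects by one metric at a chosen point in time. The short Python sketch below shows that idea using made-up numbers. The VM names, metric keys and the `top_contributors` function are assumptions for illustration only and have nothing to do with how the vSAN performance service works internally.

```python
def top_contributors(stats, metric, n=3):
    """Return the n entries with the highest value for the given metric."""
    return sorted(stats, key=lambda s: s[metric], reverse=True)[:n]


# Hypothetical per-VM metrics at a single point in time.
vm_stats = [
    {"name": "web-01", "iops": 1200, "throughput_mbps": 45.0, "latency_ms": 1.2},
    {"name": "db-01", "iops": 8500, "throughput_mbps": 310.0, "latency_ms": 3.8},
    {"name": "app-01", "iops": 3100, "throughput_mbps": 90.0, "latency_ms": 0.9},
    {"name": "backup-01", "iops": 600, "throughput_mbps": 400.0, "latency_ms": 6.5},
]

for metric in ("iops", "throughput_mbps", "latency_ms"):
    names = [vm["name"] for vm in top_contributors(vm_stats, metric, n=2)]
    print(metric, names)
```

Note how the ranking changes with the metric. In this sample, `db-01` leads on IOPS, while `backup-01` leads on both throughput and latency, which is why being able to switch between the three views is so handy.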
Another really cool feature from a management and monitoring perspective is the inclusion of additional diagnostic information around networking. By navigating to an ESXi host in a vSAN cluster, then selecting Monitor > vSAN > Performance > Physical Adapters, you will find a bunch of new metrics and counters, such as Port Drop Rate, RX CRC Error, TX Carrier Errors and so on. Since vSAN is a distributed system, the network plays a critical role, and having metrics that continuously capture the state of the network will be extremely beneficial for troubleshooting and diagnosis.
As you can see, there are a considerable number of additional features in this release of vSAN 7.0U2. The above list is not comprehensive. However, I hope the list of items above gives you some appreciation of the enhancements that are available in this latest release of vSAN.