VMware Explore 2022: What’s new in vSphere 8 & vSAN 8

VMware Explore 2022 kicked off this week. There are of course many announcements taking place across the whole suite of VMware products. In this post, I will focus primarily on the announcements related to the products that I work with on a regular basis. Those products are vSphere 8, VMware Tanzu Standard (vSphere with Tanzu), and vSAN 8.

vSphere 8

In the vSphere 8 space, the most significant announcement in my opinion is the fact that we are delivering on Project Monterey. We got our first technical preview of Project Monterey back in 2020 from the VMware CTO, Kit Colbert. There were a considerable number of updates on Project Monterey at VMworld 2021, including the announcement of an early access program. This year, with the release of vSphere 8, VMware are announcing full support for DPUs (Data Processing Units), also known as SmartNICs, in the vSphere platform. vSphere 8 now provides a heterogeneous computing platform with support for CPUs, GPUs and DPUs. The official title for Project Monterey is vSphere Distributed Services Engine, which is basically an instance of ESXi running on ARM in the DPU.

This gives us the ability to offload tasks to the DPU that have historically been handled by the host CPUs, e.g. I/O processing and the processing of network-related infrastructure services. This should avoid CPU contention between application workloads and infrastructure workloads going forward, providing even more CPU resources to the applications. It should also facilitate higher workload consolidation as well as a performance boost, since applications can now consume the CPU cores that are no longer needed for infrastructure tasks. Another key take-away is that these workloads, even though their network processing is offloaded to DPUs, can continue to leverage core vSphere features such as DRS and vMotion for load balancing and availability purposes. This is achieved via support for the Universal Pass-Thru (UPT) feature available in vSphere 8. And the nice thing for a vSphere administrator is that the management and monitoring of the DPUs is built into the vSphere client.

Speaking of DRS, the Distributed Resource Scheduler built into vSphere, it now takes memory bandwidth and latency requirements into account when doing workload placement. I'll leave you with one final cool feature from an operational / lifecycle management perspective: vSphere administrators now have the ability to initiate ESXi upgrades across multiple hosts simultaneously in vSphere 8.

VMware Tanzu Standard (aka vSphere with Tanzu)

There are some significant updates to the VMware Tanzu Standard (aka vSphere with Tanzu) platform and the Tanzu Kubernetes Grid Service for deploying Kubernetes clusters. The first major enhancement is Tanzu Kubernetes Grid 2.0. This release introduces support for highly available, multi-AZ deployments of VMware Tanzu Standard, and of the Kubernetes clusters subsequently deployed from it. This allows Kubernetes clusters, including the Supervisor Cluster, to be deployed across different vSphere clusters in your infrastructure, providing high availability in the event of site failures. The second major enhancement is a new API version for TanzuKubernetesCluster (v1alpha3) to support multi-AZ deployments, as well as a brand new API for Kubernetes clusters to align with ClusterAPI (v1beta1). This is the first step on our journey to unify all Tanzu Kubernetes offerings on the vSphere platform. What this means is that customers who are familiar with using the tanzu CLI to create Kubernetes clusters can now use the same tanzu CLI commands to build Kubernetes clusters via the TKG Service, with the Supervisor cluster as the endpoint. There is also a new Harbor Image Registry Service coming in this release, replacing the earlier embedded Harbor Image Registry. This will allow vSphere with Tanzu customers to use more up-to-date Harbor Registry features. I will be following up with additional posts highlighting these enhancements in more detail over the coming days & weeks.
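
To give a flavour of what that tanzu CLI workflow might look like with the Supervisor cluster as the endpoint, here is a minimal sketch. The endpoint address, cluster name, namespace and spec file below are all hypothetical placeholders rather than anything from the announcement, and the exact commands and options for this release should be verified against the official vSphere with Tanzu documentation:

    # Log in, using the Supervisor cluster as the tanzu CLI endpoint
    tanzu login --endpoint https://<supervisor-vip> --name my-supervisor

    # Create a workload cluster from a cluster spec file
    # (e.g. a ClusterAPI v1beta1 style Cluster definition)
    tanzu cluster create my-tkg-cluster --file my-cluster-spec.yaml

    # List clusters and retrieve a kubeconfig for the new cluster
    tanzu cluster list
    tanzu cluster kubeconfig get my-tkg-cluster --namespace my-namespace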

Before leaving vSphere 8, it is worth mentioning some of the additional announcements around vSphere+, the new infrastructure platform that VMware is building to offer the benefits of cloud to on-premises deployments. I've already published a write-up of vSphere+ and vSAN+ on this site to coincide with the original announcement, but VMware Explore 2022 revealed details about the integration of VMware Aria Operations (formerly vRealize Operations or vROps) as a new service available in the Cloud Console. This is very interesting as you can now have a single management entity overseeing all of the performance, security, capacity and costs associated with your on-premises deployments. I 'borrowed' a screenshot from the VMware Explore 2022 demonstration to give you an idea of how the interface looks from the Cloud Console. Expect to see more services added regularly as we build out ways to simplify all on-premises operations, from inventory overviews to lifecycle management, security hardening, and more. You can read more about VMware Aria Operations here.

vSAN 8 / Express Storage Architecture

Let's switch focus to vSAN 8, where there are some major enhancements to talk about. Not only is there a range of improvements to the original storage architecture (OSA) of vSAN, but there is also a brand new next-gen storage architecture called the Express Storage Architecture (ESA). It is the vSAN ESA that will be the focus of this section.

The most significant enhancement in the ESA is the new, efficient data path. vSAN ESA introduces a new architecture that is optimized for NVMe-based NAND flash devices. This allows vSAN to make much better use of these devices, for example by writing data to them more efficiently. The important thing to highlight is that the user experience is the same as before. If you are already one of the existing 30,000 vSAN customers, then the management, monitoring and lifecycle operations via the vSphere client remain the same for vSAN ESA. However, this new architecture introduces quite a few significant enhancements which I will attempt to highlight here.

  • The vSAN ESA architecture introduces mechanisms to improve how data is processed and stored. In the past, there was always a trade-off when choosing between performance and space efficiency. With the changes to the internal algorithms, customers no longer have to choose between performance (RAID-1) and capacity (RAID-5/RAID-6). The vSAN ESA engineering and performance teams are reporting that this architecture provides space efficiency with little, if any, performance impact. In other words, customers can use RAID-6 with the performance of RAID-1.
  • A new single tier architecture in the vSAN ESA means that each device can now contribute to both cache and capacity. With the vSAN OSA (non-ESA) disk group construct, certain devices were dedicated to cache and other devices were dedicated to capacity. This meant that the failure domain was at the disk group level: a cache device failure, for example, could impact the whole of the disk group. Device failures were an even bigger consideration when space efficiency features such as deduplication and compression were enabled, since a single capacity device failure could then also impact the entire disk group. With the new single tier architecture, the failure domain is much smaller, and vSAN ESA offers improved availability in the event of a single device failure.
  • Data services such as compression and encryption have also improved dramatically due to the new data path layout and the new single tier architecture. In the past, encryption had to take place at the cache tier and again at the capacity tier. This is no longer necessary, and in vSAN ESA, data is only encrypted once. Compression on the vSAN ESA also has finer levels of granularity than before. Rather than only compressing a block when a saving of 50% or more could be achieved, which is the traditional vSAN OSA approach, vSAN ESA may now compress blocks even when that 50% threshold cannot be reached, so a block that only achieves, say, a 25% saving may still be stored in its compressed form.
  • These performance improvements also mean that new, scalable snapshots are available. VMs running on vSAN ESA can now have large snapshot chains while continuing to perform optimally. Consolidation time has also improved significantly, whilst snapshot overhead has been dramatically reduced. Snapshot performance has been an issue for the longest time, so it is great to see a solution for it included in the vSAN ESA.
  • [Updated] The new architecture also introduces a new erasure coding feature called Adaptive RAID-5. The first thing to note is that RAID-5 moves from a 3+1 layout (3 data and 1 parity) to a 4+1 layout (4 data and 1 parity). Six hosts are needed to implement the new RAID-5 in vSAN ESA. You might ask why we need 6 hosts to implement 4+1; the reason is that rebuild capacity (N+1) is factored in. If there is a host failure in the vSAN cluster, there is a spare host available for recovery to maintain 4+1. However, when the failure persists for an extended period of time, the adaptive feature behaves as follows to maintain rebuild capability in the event of further failures. Assume a customer has implemented RAID-5 in a 6-host vSAN cluster. Now assume that the customer either removes a host from the cluster for maintenance or updating, or that there is a host failure. If the cluster remains in this state (missing host) after a 24-hour period, vSAN ESA will automatically switch the RAID level from a 4+1 to a 2+1 configuration. What this means is that once in this state with fewer than 6 nodes, there is still rebuild capacity (N+1) available. So even if two more hosts are removed from the cluster (or indeed fail) after the original host was removed, leaving only 3 hosts in the cluster, there continues to be RAID-5 protection. When these missing or failed nodes are added back to the cluster, the original 4+1 erasure coding layout is automatically reinstated. What is also interesting about this approach is that the 2+1 RAID-5 scheme can also be implemented from the outset on smaller clusters, meaning you can now implement RAID-5 on a 3-node vSAN ESA configuration. A rough worked example of the capacity overheads involved is included just after this list.
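
To put some rough numbers on the capacity side of those RAID layouts (this is just the simple stripe-width arithmetic, ignoring metadata and any compression savings): a 100 GB object protected with RAID-1 (FTT=1) keeps two full copies, so it consumes roughly 200 GB of raw capacity. The same object with the new 4+1 RAID-5 layout consumes roughly 125 GB, i.e. four data segments plus one parity segment, or 125% of the usable size. If the cluster drops below 6 hosts for more than 24 hours and vSAN ESA adapts down to the 2+1 layout, that grows to roughly 150 GB (150%). This is why being able to get RAID-1 levels of performance with RAID-5/6 levels of space efficiency is such a significant change.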

That is only a snippet of the features introduced with the new vSAN ESA. Check out Duncan's excellent blog post on vSAN ESA for further details. Overall, I think it is a very compelling story, and a huge amount of engineering effort went into delivering this new architecture for vSAN. There are some requirements around device compatibility, CPU, memory and networking, so please check the vSAN ESA Ready Nodes compatibility list for supported servers.

I want to close this section by also mentioning vSAN+. This provides similar functionality to vSphere+. Check out my previous blog post for more details.

VMware Cloud Foundation+

Finally, we come to VMware Cloud Foundation+ (VCF+), which extends the advantages of vSphere+ and vSAN+ to VCF. VCF+ is available with VCF v4.5, and provides all of the same benefits seen with the other cloud connected offerings. These benefits include a single cloud console to display the global inventory, as well as to manage and monitor all of the on-premises VCF deployments. There are the added benefits of the new, streamlined upgrade mechanisms for vCenter Server. There is also access to cloud connected services such as VMware Cloud Disaster Recovery for DRaaS and Ransomware Recovery. And of course, there is the new subscription model which simplifies license management.

As you can see, there are some very interesting announcements here for those customers with an interest in vSphere management and hyper-converged infrastructure. It should also be of interest to those customers responsible for providing developer infrastructure such as Kubernetes on vSphere. Check out the VMware Explore 2022 site for further details and access to breakout recordings and solution keynotes.

2 Replies to “VMware Explore 2022: What’s new in vSphere 8 & vSAN 8”

  1. Hi Cormac, does Adaptive RAID-5 mean that 4+1 consumes 125% of space and 2+1 consumes 150%? What about Adaptive RAID-6 (if the name is right)? Do we need to consider the space increase/consumption when the cluster remains in this state (missing host) after 24h?

    Regards
    Benja

    1. I guess we feel that RAID-6 can already tolerate 2 failures, so the situation described as the use-case for adaptive R5 is already addressed.
