Cloud Native Storage
If I select one of the PVs from the vSphere client, I can get more detailed information about both the PV and the PVC, the persistent volume claim. If this is all new to you, have a read of this K8s 101 primer I wrote.
Should one need more details about the underlying object, such as the state of its components on the vSAN datastore, an admin can simply click on the volume name from this view, and it will take them to the virtual objects view. At a single glance, you can tell which K8s worker node virtual machine the persistent volume is attached to. In this case, it is k8s-worker2.
From here an admin can see the makeup of the VMDK/PV object on vSAN by clicking on ‘View Placement Details’. This will give me a breakdown of the components that back the object. For example, my PV is a RAID-1 object, so there will be two mirrored data components and a witness component, as shown here.
I can see the hypervisor host and the physical disk devices where each component resides. This will make it extremely easy for the vSphere/vSAN admin to home in on the status and health of individual persistent volumes, should there be a storage query from the K8s developer team.
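To make the RAID-1 layout a little more concrete, here is a minimal Python sketch of why the witness matters. It assumes a simplified model in which each of the three components carries one vote, and the object stays accessible while more than half of the votes and at least one data replica survive. The names (Component, is_accessible) and the host names are purely illustrative and not part of any vSAN API, and real vSAN vote assignment can be more nuanced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    kind: str       # "data" or "witness"
    host: str       # hypothetical host name
    votes: int = 1  # assumption: one vote per component


def is_accessible(components, failed_hosts):
    """Simplified rule: a majority of votes and at least one data replica survive."""
    alive = [c for c in components if c.host not in failed_hosts]
    total_votes = sum(c.votes for c in components)
    alive_votes = sum(c.votes for c in alive)
    has_replica = any(c.kind == "data" for c in alive)
    return has_replica and 2 * alive_votes > total_votes


# RAID-1 object as described above: two mirrored data components and a witness.
pv_object = [
    Component("replica-1", "data", "esxi-a"),
    Component("replica-2", "data", "esxi-b"),
    Component("witness", "witness", "esxi-c"),
]

# Losing any single host leaves the object accessible...
for host in ("esxi-a", "esxi-b", "esxi-c"):
    print(host, is_accessible(pv_object, {host}))

# ...but losing a replica and the witness together does not.
print("esxi-a + esxi-c", is_accessible(pv_object, {"esxi-a", "esxi-c"}))
```

In this toy model the witness holds no data. Its only job is to break the tie when one of the two replicas is unavailable.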
By the way, we are conducting a survey to understand where customers are in their Cloud Native Storage journey. We’d be very grateful if you could complete this survey.
Let’s now turn our attention to the second area of focus in the 6.7U3 release of vSAN, namely Intelligent Operations. I won’t cover all of the improvements in this post, but I will call out the ones that I feel are significant.
Improved Capacity Usage
First, we have improved and simplified how we display capacity usage of the vSAN datastore, including the ability to see how much capacity is being consumed by block container volumes such as K8s persistent volumes. Here is a sample screenshot taken from my system.
The Usable Capacity Analysis is also useful, and is something that a number of customers have been requesting as it allows you to see how much effective free space is available on your vSAN datastore, depending on your choice of storage policy.
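As a rough illustration of why the policy choice changes the effective free space, the sketch below divides raw free capacity by the space multiplier usually quoted for each protection scheme (2x for RAID-1 with one failure to tolerate, 1.33x for RAID-5, 1.5x for RAID-6). The function name and the 10 TB figure are made up for the example. It also ignores slack space, deduplication and compression, so treat it as back-of-the-envelope arithmetic rather than how the UI arrives at its numbers.

```python
# Raw space consumed per unit of usable data for some common vSAN policies.
POLICY_MULTIPLIER = {
    "RAID-1, FTT=1": 2.0,    # two full mirrors
    "RAID-1, FTT=2": 3.0,    # three full mirrors
    "RAID-5, FTT=1": 4 / 3,  # 3 data + 1 parity
    "RAID-6, FTT=2": 1.5,    # 4 data + 2 parity
}


def effective_free_tb(raw_free_tb, policy):
    """Usable free space for new data written with the given policy."""
    return raw_free_tb / POLICY_MULTIPLIER[policy]


raw_free_tb = 10.0  # hypothetical raw free capacity on the datastore
for policy in POLICY_MULTIPLIER:
    print(f"{policy}: {effective_free_tb(raw_free_tb, policy):.2f} TB usable")
```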
Improved Resync Visibility
Another common request was to have more granular insight into resynchronization activity. With vSAN 6.7U3, we can now observe not only the active resyncs, but also those items that might be queued up waiting to commence resyncing. This will give a far better idea about how long current resync activity will take to complete. We can also see the reason for the resync. In the example shown here, it looks like there was a policy change on the objects, meaning that the reason for the resync is compliance.
Maintenance Mode / Data Migration Pre-Check
This is a feature that I am really happy to see included, and something that we have been requesting for some time. In the past, customers have been unaware of some underlying activities taking place on the vSAN cluster. These activities could be something as simple as an already existing resync activity. Thus, some objects may already be unprotected due to earlier activity or failures in the cluster. However, a customer could still unknowingly place a host in the cluster into maintenance mode, and this may have been the host with the last good component of an object. Not good, especially if ‘No data migration’ was selected, as you would impact the availability of the object and thus the virtual machine.
Another common complaint was that customers could choose full data evacuation, and move every component off of a host, even if there was not enough space left on the remaining nodes of the cluster to accommodate these components. The end result is that you could fill the vSAN datastore.
What we now have in 6.7U3 is the maintenance mode pre-check which will check both object compliance and accessibility, as well as cluster capacity, in the event of a host being placed into maintenance mode. In the following example, I requested a pre-check on host esxi-dell-e.rainpole.com before I placed it into maintenance mode. This is what it reported in the areas of Object Compliance, Cluster Capacity and Predicted Health.
Object Compliance and Accessibility
What we can deduce from this is that quite a number of objects will become non-compliant if we place this host into maintenance mode with the option ‘Ensure accessibility’. However, there is enough capacity for us to do this if we wish to proceed. A very nice addition to vSAN day 2 operations.
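The idea behind the pre-check can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not the actual vSAN logic: each object is a list of (host, healthy) components with one vote each, an object counts as inaccessible when half or fewer of its components would remain healthy, and as non-compliant when it loses a component but keeps a majority. Apart from esxi-dell-e.rainpole.com, the host names, object names and capacity figures are invented for the example.

```python
def precheck(objects, host, used_gb, free_gb):
    """Predict the effect of placing `host` into maintenance mode.

    objects: {object name: [(host, healthy), ...]} one entry per component
    used_gb / free_gb: {host: GB used / GB free}
    """
    inaccessible, non_compliant = [], []
    for name, components in objects.items():
        on_host = [c for c in components if c[0] == host and c[1]]
        if not on_host:
            continue  # no healthy component on this host, nothing changes
        alive = [c for c in components if c[1] and c[0] != host]
        if 2 * len(alive) > len(components):
            non_compliant.append(name)  # still a majority, reduced protection
        else:
            inaccessible.append(name)   # majority lost
    # A full data evacuation only fits if the other hosts have room for it.
    free_elsewhere = sum(gb for h, gb in free_gb.items() if h != host)
    return {
        "inaccessible": inaccessible,
        "non_compliant": non_compliant,
        "full_evacuation_fits": used_gb[host] <= free_elsewhere,
    }


TARGET = "esxi-dell-e.rainpole.com"
objects = {
    "pv-1": [(TARGET, True), ("host-f", True), ("host-g", True)],
    "pv-2": [("host-f", True), ("host-g", True), ("host-h", True)],
    # Already degraded by an earlier failure on host-f.
    "pv-3": [(TARGET, True), ("host-f", False), ("host-g", True)],
}
used_gb = {TARGET: 800, "host-f": 700, "host-g": 600, "host-h": 500}
free_gb = {TARGET: 200, "host-f": 300, "host-g": 400, "host-h": 500}

report = precheck(objects, TARGET, used_gb, free_gb)
print(report)
```

Here pv-3 is the dangerous case described earlier: it was already missing a component, so taking one more host away would leave it without a majority.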
Native Support for WSFC
While we have been able to support WSFC (Windows Server Failover Clusters) on vSAN for some time through the use of iSCSI, this is a significant improvement over that implementation. vSAN 6.7U3 can now support SCSI-3 persistent group reservations (PGR) natively, which in turn means that a VMDK residing on vSAN can now be used as the shared quorum disk for WSFC implementations. I know of a number of customers who were looking for this functionality, and I’m delighted that we can now officially announce it. Do note however that this is for virtualized WSFC environments. If you are using physical servers, you should continue to use the vSAN iSCSI service method to share a quorum disk. There are a number of other caveats to take into account as well when a VMDK is shared via this SCSI-3 PGR mechanism. Please refer to the official documentation for full details.
iSCSI LUN Resize Support
I know that this has been a pet peeve for a number of customers. Prior to 6.7U3, you could not resize a LUN that was presented via the vSAN iSCSI service. With 6.7U3, we now support the resize/growing of an iSCSI LUN without the need to take it offline. vSAN will take care of any quiescing of the I/O while the resize operation is in progress.
Enhanced Performance and Availability
In this final section, I want to touch on a few performance improvements that we have made in this release. Our engineering teams have made significant changes to the destaging mechanism between cache and capacity layers as well as to adaptive throttling. These improvements are in the Log Structured Object Manager (LSOM), a key component of vSAN architecture. In the past, destaging during high ingestion of writes could impact the latency of running workloads under heavy I/O, especially when deduplication and compression are turned on. This enhancement means that workloads running on vSAN should have improved performance in terms of predictable I/O latencies and increased sequential I/O throughput.
Also of note is vSAN’s new ability in 6.7U3 to run dozens of resync streams per component in parallel. This feature will automatically adapt depending on the current state and available resources of the system. This new mechanism will provide vSAN the opportunity to do faster rebuilds of components.
vsantop is a cool new feature that can be run at the command line of an ESXi host to examine performance. I took a quick screenshot of the different entities that can be viewed using vsantop (use capital E to get the list). I’ve not done much with it yet, but it seems like it could be useful for troubleshooting performance-related issues.
vSAN Performance Monitor fling
I’ll close with a feature that is not directly related to the 6.7U3 release, but you might find it useful all the same. The vSAN Performance Monitor is a fling released by our engineering team that periodically gathers vSAN performance metrics via a telegraf agent, and then uses Grafana to visualize them. I just deployed the fling earlier today, and must say that it is really easy to get started. If you were someone who utilized the older vSAN Observer graphs via RVC (Ruby vSphere Console) in the past, then I think you will like what you see here. Here’s a quick view of the vSAN Summary tab from my vSAN 6.7U3 environment.
As you can see, there are a lot of compelling new enhancements in vSAN 6.7U3. Check out the release notes for a full list of enhancements.