Persistent Volume Placement in HCI-Mesh deployments

One of the new features introduced in vSphere 7.0U1 is HCI-Mesh, the ability to remotely mount vSAN datastores between vSAN clusters managed by the same vCenter Server. My buddy and colleague Duncan has done a great write-up on this topic on his yellow-bricks blog. In this post, I am going to look at how to address the situation of selecting the correct vSAN datastore when provisioning Kubernetes Persistent Volumes in an environment which uses HCI-Mesh. This will address the support statement in the vSAN HCI-Mesh Tech-Note that states that the following use case is not supported: Remote provisioning workflows for File Services, iSCSI, or CNS based block volume workloads (they can exist locally, but not be served remotely).

Let’s start with why this situation needs additional consideration. Let’s assume that there is a vSphere cluster that has vSAN enabled, and thus this cluster now has a ‘local’ vSAN datastore available. Now let’s assume there is a second vSphere cluster in the data center, which also has vSAN enabled. We can now go ahead and mount this ‘remote’ vSAN datastore to our cluster, so that our cluster now has a local vSAN datastore and a remote vSAN datastore available. These screenshots may help to visualize it. In our lab, the cluster “Cormac” has a local vSAN datastore as well as access to the remote “Duncan” vSAN datastore.

Similarly, the cluster “Duncan” has a local vSAN datastore as well as access to the remote “Cormac” vSAN datastore.

Now, on cluster “Cormac”, I have also enabled vSphere with Tanzu using the HA-Proxy Load Balancer. I have proceeded to build a namespace to allocate some resources, and within that namespace, I have gone ahead and created a TKG Service “guest” cluster with one control plane VM and 2 worker VMs. This is all visible in the vSphere UI.
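For reference, the guest cluster itself is created by applying a TanzuKubernetesCluster manifest in the namespace context. The sketch below is only indicative of what mine looked like; the virtual machine class and storage class shown here are assumptions for illustration, not necessarily the values I used.

apiVersion: run.tanzu.vmware.com/v1alpha1
kind: TanzuKubernetesCluster
metadata:
  name: tkg-cluster-1-18-5
  namespace: cormac-ns
spec:
  distribution:
    version: v1.18.5            # Kubernetes release for the guest cluster
  topology:
    controlPlane:
      count: 1                  # one control plane VM
      class: best-effort-small  # VM class - an assumption for this sketch
      storageClass: vsan-default-storage-policy
    workers:
      count: 2                  # two worker VMs
      class: best-effort-small
      storageClass: vsan-default-storage-policy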

Now I come to the crux of the issue. In my namespace, I select which Storage Policies to make available as Storage Classes for my Kubernetes Persistent Volumes. These become available in both the Supervisor cluster (not relevant here, as there is no PodVM support with the HA-Proxy) and in the TKG “guest” cluster. However, if I select a normal vSAN Storage Policy, such as the vSAN Default Storage Policy, the requirements in this policy will be satisfied by both the “local” vSAN datastore and the “remote” vSAN datastore mounted to the cluster “Cormac”. The result is that even though my TKG “guest” cluster Pods are provisioned on the TKG cluster worker nodes on cluster “Cormac”, the PVs could be provisioned on either the “local” vSAN datastore or the “remote” vSAN datastore, as can be seen here. My PVs have been provisioned on the remotely mounted vSAN datastore from the cluster “Duncan”.
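To make the issue concrete, consider a claim like the hypothetical one below, which requests storage purely through the default vSAN policy’s Storage Class. Nothing in it tells CNS which of the two compatible vSAN datastores to use, so either could be chosen:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-pvc                # illustrative name only
spec:
  accessModes: [ "ReadWriteOnce" ]
  storageClassName: vsan-default-storage-policy   # maps to the vSAN Default Storage Policy
  resources:
    requests:
      storage: 1Gi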

So how can we control this, avoid this situation, and guarantee that Persistent Volumes created using vSAN policies are provisioned on the “local” vSAN datastore in HCI-Mesh? Quite simply, we build a new vSAN policy that includes a “tag” matching the desired (local) vSAN datastore. We will do that next.

Please note, as per the HCI Mesh Tech Note, remote provisioning workflows for CNS based block volume workloads are not supported (they can exist locally, but not be served remotely). The procedure described here addresses that unsupported scenario by ensuring the volumes are provisioned on the local vSAN datastore.

Creating a vSAN Policy to include a Tag

To begin, we will create a new tag category which I have called HCI-Mesh.

In this Tag category, I have created 2 tags. One will be used to tag the vSAN datastore on the “Cormac” cluster, and the other will be used to tag the vSAN datastore in the “Duncan” cluster.

The next step is to assign the tag to the appropriate datastore. Select the datastore from the vSphere UI, and under Actions, choose Tags & Custom Attributes followed by Assign Tag. Select the appropriate tag from the HCI-Mesh category created earlier. The tag should now be visible in the Summary view of the datastore:
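As an aside, the same tag category, tags and tag assignment could also be created from the command line. This is just a rough sketch using the govc CLI, with made-up tag names and a made-up datastore path:

% govc tags.category.create HCI-Mesh
% govc tags.create -c HCI-Mesh cormac-local-vsan
% govc tags.create -c HCI-Mesh duncan-local-vsan
% govc tags.attach cormac-local-vsan /My-DC/datastore/vsanDatastore-cormac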

We are now ready to build the policy. There is nothing new here, except the fact that we are also selecting a Tag to be included in the vSAN rules. Let’s create the policy so that it becomes even clearer how we are consuming the Tags within the policy. I am going to build a rule that ensures that, when the policy is selected for provisioning, the object ends up on the local vSAN datastore in the cluster “Duncan”:

In the Policy structure step, you may select rules for vSAN storage, as well as tag based placement rules. However, we will also see that tag rules are available for selection in the vSAN section.

We first handle the vSAN rules. I am simply going to leave all of these at the default RAID-1 protection level. As mentioned, note that there is also a Tags section here, alongside Availability and Advanced Policy Rules.

From the Tags section, we can choose a category and tag to include in the policy. This is the Category and Tag that we created earlier. The Usage option is set to “use storage tagged with”, followed by our desired tag.

Alternatively, if you do not choose to add the tag directly in the vSAN section, you can do the same step in the Tag based placement section that appears next.

Whichever way you decide to do it, it really doesn’t matter, as long as the vSAN datastore that you want this rule to select shows up as the only compatible datastore in the Storage compatibility view:

Looks good. Now we can review and finish the creation of the policy.

Mapping Storage Policy to Storage Class

I have created an identical policy for the “local” vSAN datastore on the cluster “Cormac”. My next step is to deploy an application that uses this new Storage Policy via a Kubernetes Storage Class. To make this policy available as a Storage Class, I need to associate the policy with my vSphere with Tanzu namespace. To do that, I simply edit my storage policies and select the new Tag+vSAN policy created previously. Now I have 2 policies associated with my namespace, the default vSAN one and the new tag-based one.

This policy should now appear as a Storage Class when I log into my namespace context.

% kubectl config get-contexts
CURRENT   NAME                  CLUSTER          AUTHINFO                                         NAMESPACE
          10.202.112.152        10.202.112.152   wcp:10.202.112.152:administrator@vsphere.local
*         cormac-ns             10.202.112.152   wcp:10.202.112.152:administrator@vsphere.local   cormac-ns


% kubectl get sc
NAME                          PROVISIONER              RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
default-local-to-cormac       csi.vsphere.vmware.com   Delete          Immediate           true                   16h
vsan-default-storage-policy   csi.vsphere.vmware.com   Delete          Immediate           true                   17h

And now if I login to my TKG “guest” cluster context, I should also see the Storage Class available there too.

% kubectl config get-contexts
CURRENT   NAME                  CLUSTER          AUTHINFO                                         NAMESPACE
          10.202.112.152        10.202.112.152   wcp:10.202.112.152:administrator@vsphere.local
          cormac-ns             10.202.112.152   wcp:10.202.112.152:administrator@vsphere.local   cormac-ns
*         tkg-cluster-1-18-5    10.202.112.153   wcp:10.202.112.153:administrator@vsphere.local

% kubectl get sc
NAME                          PROVISIONER              RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
default-local-to-cormac       csi.vsphere.vmware.com   Delete          Immediate           true                   16h
vsan-default-storage-policy   csi.vsphere.vmware.com   Delete          Immediate           true                   17h

Looks good. Now if I modify my applications to use this StorageClass, any persistent volumes that are created should end up on the vSAN datastore local to the cluster “Cormac”. Here is a snippet from my Cassandra statefulset which deploys 3 replicas, and thus should create 3 Pods, each with its own PV.

  volumeClaimTemplates:
  - metadata:
      name: cassandra-data
      annotations:
        volume.beta.kubernetes.io/storage-class: default-local-to-cormac
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi

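Note that the volume.beta.kubernetes.io/storage-class annotation shown above is the older way of requesting a Storage Class. It still works, but the storageClassName field in the claim spec is the preferred approach in recent Kubernetes releases. The equivalent volumeClaimTemplates entry would look something like this:

  volumeClaimTemplates:
  - metadata:
      name: cassandra-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: default-local-to-cormac   # tag-based policy, local vSAN only
      resources:
        requests:
          storage: 1Gi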
And after deployment, I can now see that the 3 PVs have indeed been placed on the vSAN datastore that is local to the cluster “Cormac”.
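If you want to check this from the command line rather than the vSphere Client, the bound claims and volumes can be listed from the TKG cluster context (output omitted here), and the datastore placement itself can then be cross-checked in the Cloud Native Storage > Container Volumes view in the vSphere Client:

% kubectl get pvc
% kubectl get pv
% kubectl describe pv <pv-name>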

Looks good. Using a combination of Tag + vSAN policies, we are able to create a storage class which correctly chooses a local vSAN datastore in an HCI-Mesh configuration for the deployment of Persistent Volumes when vSphere with Tanzu and TKG “guest” clusters are deployed in one or more of the vSphere clusters in the same environment.