Failed to deploy PV to local volume – “No compatible datastore found for storagePolicy”

This is something that I “spun my wheels” on a little bit last week, so I decided I’d write a short article to explain the issue in a bit more detail. This is related to the provisioning of a Persistent Volume on the Supervisor cluster of a vSphere with Kubernetes deployment. I had a local VMFS volume on one of my hosts, so I went ahead and tagged the volume using vSphere Tagging. I then built a tag-based storage policy so that when that policy is selected for provisioning, the objects that get provisioned would be placed on that local, tagged VMFS volume. I tested it with a virtual machine, and it worked just fine. So then I moved to my vSphere with Kubernetes environment, selected my namespace, added the policy to the namespace, and observed that when I logged in at the command line, the policy had been instantiated as a StorageClass as expected. I then went ahead and built a simple PersistentVolumeClaim (PVC) manifest to request the creation of a PV using this StorageClass. This request failed, and a describe of the PVC showed the following error:
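
The manifest itself was very simple, along the lines of the following sketch. The ReadWriteOnce access mode (and the local-pvc.yaml filename) are just the obvious choices here rather than a verbatim copy of the original file; the name, namespace, label, StorageClass and 1Gi (1024MB) request all match what the PVC reports in the describe output further down.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: local-pvc
  namespace: cormac-ns
  labels:
    supervisor: "true"
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: local-vmfs

$ kubectl apply -f local-pvc.yaml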

$ kubectl describe pvc local-pvc
Name: local-pvc
Namespace: cormac-ns
StorageClass: local-vmfs
Status: Pending
Volume:
Labels: supervisor=true
Annotations: kubectl.kubernetes.io/last-applied-configuration:
                 {"apiVersion":"v1","kind":"PersistentVolumeClaim","metadata":{"annotations":{},"labels":{"supervisor":"true"},"name":"local-pvc","namespac...
               volume.beta.kubernetes.io/storage-provisioner: csi.vsphere.vmware.com
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode: Filesystem
Mounted By: <none>
Events:
  Type Reason Age From Message
  ---- ------ ---- ---- -------
  Warning ProvisioningFailed 7m31s csi.vsphere.vmware.com_4216cd4619c10fbf96e48deaa2ed8228_34dfa5fa-bf6e-11ea-ba57-005056966307 \
failed to provision volume with StorageClass "local-vmfs": rpc error: code = Internal desc = \
Failed to create volume. Error: failed to create cns volume. createSpec: "(*types.CnsVolumeCreateSpec)\
(0xc0004f61c0)({\n DynamicData: (types.DynamicData) {\n },\n Name: (string) (len=40) \
\"pvc-864e37ab-9bd6-4e8e-bf09-e97f45e8dec4\",\n VolumeType: (string) (len=5) \"BLOCK\",\n Datastores: \
([]types.ManagedObjectReference) (len=2 cap=2) {\n (types.ManagedObjectReference) Datastore:datastore-18,\n \
(types.ManagedObjectReference) Datastore:datastore-38\n },\n Metadata: (types.CnsVolumeMetadata) {\n DynamicData:\ 
(types.DynamicData) {\n },\n ContainerCluster: (types.CnsContainerCluster) {\n DynamicData: (types.DynamicData)\
 {\n },\n ClusterType: (string) (len=10) \"KUBERNETES\",\n ClusterId: (string) (len=9) \"domain-c8\",\n \
VSphereUser: (string) (len=78) \"VSPHERE.LOCAL\\\\workload_storage_management-c1b292b5-c77a-40f1-b90d-8763c991151a\",\n\
 ClusterFlavor: (string) (len=8) \"WORKLOAD\"\n },\n EntityMetadata: ([]types.BaseCnsEntityMetadata) <nil>,\n\
 ContainerClusterArray: ([]types.CnsContainerCluster) (len=1 cap=1) {\n (types.CnsContainerCluster) {\n DynamicData:\
 (types.DynamicData) {\n },\n ClusterType: (string) (len=10) \"KUBERNETES\",\n ClusterId: (string) (len=9) \
\"domain-c8\",\n VSphereUser: (string) (len=78) \"workload_storage_management-c1b292b5-c77a-40f1-b90d-8763c991151a@vsphere.local\",\n\
 ClusterFlavor: (string) (len=8) \"WORKLOAD\"\n }\n }\n },\n BackingObjectDetails: (*types.CnsBlockBackingDetails)(0xc000a098e0)\
({\n CnsBackingObjectDetails: (types.CnsBackingObjectDetails) {\n DynamicData: (types.DynamicData) {\n },\n CapacityInMb: (int64)\
 1024\n },\n BackingDiskId: (string) \"\"\n }),\n Profile: ([]types.BaseVirtualMachineProfileSpec) (len=1 cap=1) \
{\n (*types.VirtualMachineDefinedProfileSpec)(0xc000491700)({\n VirtualMachineProfileSpec: (types.VirtualMachineProfileSpec)\
 {\n DynamicData: (types.DynamicData) {\n }\n },\n ProfileId: (string) (len=36) \"fe19e988-5319-4528-9931-1498c6017f20\",\n\
 ReplicationSpec: (*types.ReplicationSpec)(<nil>),\n ProfileData: (*types.VirtualMachineProfileRawData)(<nil>),\n \
ProfileParams: ([]types.KeyValue) <nil>\n })\n },\n CreateSpec: (types.BaseCnsBaseCreateSpec) <nil>\n})\n", \
fault: "(*types.LocalizedMethodFault)(0xc00085fd60)({\n DynamicData: (types.DynamicData) {\n },\n Fault: \
(types.CnsFault) {\n BaseMethodFault: (types.BaseMethodFault) <nil>,\n Reason: (string) (len=85) \
\"No compatible datastore found for storagePolicy: fe19e988-5319-4528-9931-1498c6017f20\"\n },\n \
LocalizedMessage: (string) (len=101) \"CnsFault error: No compatible datastore found for storagePolicy: \
fe19e988-5319-4528-9931-1498c6017f20\"\n})\n", opId: "c1919862"
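
As an aside, you don't have to wade through that entire wall of text in the describe output. Something like the following should pull back just the events for the claim (same namespace and PVC name as above), and the piece that matters is the reason at the end of the fault.

$ kubectl get events -n cormac-ns --field-selector involvedObject.name=local-pvc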

So why is the local, tagged datastore incompatible? I had certainly been able to provision Persistent Volumes to local datastores with upstream Kubernetes, and the policy I created had also worked for virtual machine provisioning. So why was it failing for PV creation on vSphere with Kubernetes? Eventually, someone pointed me to the following prerequisite in the Storage Policies section of the vSphere with Kubernetes documentation:

Make sure that the datastore you reference in the storage policy is shared between all ESXi hosts in the cluster.

When I made some further inquiries into why we have this requirement in vSphere with Kubernetes, one of our CSI-CNS engineers highlighted the fact that TKG (guest) clusters deployed in vSphere with Kubernetes are supported with vSphere DRS. This means that the control plane VMs and worker node VMs that make up a TKG cluster can be load balanced across the ESXi hosts in the cluster. For that reason, we do not want a PV attached to a worker node to be “pinned” to the local storage of any one host, since that would prevent the TKG node from being migrated to another host. This makes perfect sense, and it is the reason we need shared storage in vSphere with Kubernetes.

Hopefully that explains the reasoning sufficiently. I’m going to see if we can improve the documentation and make this requirement more visible. We might also be able to improve the error message itself; I’m working on that too.