Kubernetes Storage on vSphere 101 – ReadWriteMany NFS
Over the last few posts, we have spent a lot of time looking at persistent volumes (PVs) instantiated on some vSphere back-end block storage. These PVs were always ReadWriteOnce, meaning they could only be accessed by a single Pod at any one time. In this post, we will take a look at how to create a ReadWriteMany volume, based on an NFS share, which can be accessed by multiple Pods. To begin, we will use an NFS server image running in a Pod, and show how to mount the exported file share to another Pod, simply to get the concepts across. After that, we will show how to consume an NFS file share/export from an external NAS (Network Attached Storage) device. So let’s begin with the NFS Server implementation.
NFS Server
In this example, I assigned a persistent volume (PV) to my NFS server Pod, so that the volume could be exported as an NFS file share. I described in detail how to do that in the first 101 blog post, so I won’t go through the process again here. To do this, I have individual StorageClass, StatefulSet and Service YAML manifests. I could put them all in one manifest file if I wished, but here they are shown individually. I based this on the NFS server configuration found here on GitHub. At this point, most of this should be familiar to you.
    $ cat nfs-server-sts.yaml
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: nfs-server
      namespace: nfs
      labels:
        app: nfs-server
    spec:
      serviceName: nfs-service
      replicas: 1
      selector:
        matchLabels:
          app: nfs-server
      template:
        metadata:
          labels:
            app: nfs-server
        spec:
          containers:
          - name: nfs-server
            image: gcr.io/google_containers/volume-nfs:0.8
            ports:
            - name: nfs
              containerPort: 2049
            - name: mountd
              containerPort: 20048
            - name: rpcbind
              containerPort: 111
            securityContext:
              privileged: true
            volumeMounts:
            - name: nfs-export
              mountPath: /exports
      volumeClaimTemplates:
      - metadata:
          name: nfs-export
          annotations:
            volume.beta.kubernetes.io/storage-class: nfs-sc
        spec:
          accessModes: [ "ReadWriteOnce" ]
          resources:
            requests:
              storage: 5Gi

    $ cat nfs-server-sc.yaml
    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    metadata:
      name: nfs-sc
    provisioner: kubernetes.io/vsphere-volume
    parameters:
      diskformat: thin
      storagePolicyName: raid-1
      datastore: vsanDatastore

    $ cat nfs-server-svc.yaml
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app: nfs-server
      name: nfs-server
      namespace: nfs
    spec:
      clusterIP:
      ports:
      - name: nfs
        port: 2049
      - name: mountd
        port: 20048
      - name: rpcbind
        port: 111
      selector:
        app: nfs-server
The first question is why we are using a StatefulSet. Using a StatefulSet means that if the Pod fails, it will get restarted and, because its volume comes from a volumeClaimTemplate, reattached to the same PV. However, if the Pod does restart, it will get a new IP address. This is why we need a service: a service gives us a single point of reference that any clients wishing to mount the share can use to communicate with the NFS server.
How this works is that the NFS server Pod will have a 5Gi PV mounted. This PV is a dynamically provisioned VMDK on the vSAN datastore. It will be mounted on /exports, which is the same folder that the NFS server shares (the image is automatically configured to do so). Access to this volume will be via the IP address defined in the service. Since this is using a blank ClusterIP, a cluster-internal IP address is assigned automatically. Pods can reach the NFS server through the service IP rather than communicating directly with the NFS server Pod IP, but there is no EXTERNAL network access (this reminds me, I should probably do a simple 101 ‘service’ post at some point as well). Let’s now go ahead and roll out our server, whilst monitoring PVs, PVCs and Pods as we do so. Then we will log in to the NFS server Pod and verify that the share is indeed being exported.
    $ ls
    nfs-server-sc.yaml  nfs-server-sts.yaml  nfs-server-svc.yaml

    $ kubectl create -f nfs-server-sc.yaml
    storageclass.storage.k8s.io/nfs-sc created

    $ kubectl get sc
    NAME     PROVISIONER                    AGE
    nfs-sc   kubernetes.io/vsphere-volume   3s

    $ kubectl create -f nfs-server-svc.yaml
    service/nfs-server created

    $ kubectl get svc
    NAME         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
    nfs-server   ClusterIP   10.100.200.119   <none>        2049/TCP,20048/TCP,111/TCP   4s

    $ kubectl create -f nfs-server-sts.yaml
    statefulset.apps/nfs-server created

    $ kubectl get pv
    NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                         STORAGECLASS   REASON   AGE
    pvc-c207b810-8e81-11e9-b070-005056a2a261   5Gi        RWO            Delete           Bound    nfs/nfs-export-nfs-server-0   nfs-sc                  14m

    $ kubectl get pvc
    NAME                      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    nfs-export-nfs-server-0   Bound    pvc-c207b810-8e81-11e9-b070-005056a2a261   5Gi        RWO            nfs-sc         14m

    $ kubectl get pod
    NAME           READY   STATUS    RESTARTS   AGE
    nfs-server-0   1/1     Running   0          87s

    $ kubectl exec -it nfs-server-0 -- /bin/bash
    [root@nfs-server-0 /]# exportfs
    /exports        <world>
    /               <world>
    [root@nfs-server-0 /]# mount | grep export
    /dev/sdd on /exports type ext4 (rw,relatime,data=ordered)
    [root@nfs-server-0 /]# df /exports
    Filesystem     1K-blocks  Used Available Use% Mounted on
    /dev/sdd         5029504 10236   5002884   1% /exports
    [root@nfs-server-0 /]# grep nfs /etc/hosts
    10.200.100.13   nfs-server-0.nfs-service.nfs.svc.cluster.local   nfs-server-0
    [root@nfs-server-0 /]# exit
    exit
    $
NFS Client
The server side all looks good. Now that we have a service IP address for the NFS service, we can use that in our client configuration. Let’s look at the client YAML manifests next; the client runs a very simple busybox image. First, we have a PV, then a PVC and finally the client Pod YAML. Note that, once more, the storage class is not a real storage class; it is simply used to match the PV and the PVC. However, the PV access mode is now ReadWriteMany, implying it can be mounted by multiple Pods. In the client PV, we need to add the FQDN or IP address of the NFS server service created earlier. The client Pod should then automatically mount the /exports share from the NFS server onto its local mount point /nfs if everything works as expected.
    $ kubectl get svc
    NAME         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
    nfs-server   ClusterIP   10.100.200.119   <none>        2049/TCP,20048/TCP,111/TCP   12m

    $ cat nfs-client-pv.yaml
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: nfs-client-pv
    spec:
      storageClassName: nfs-client-sc
      capacity:
        storage: 1Gi
      accessModes:
        - ReadWriteMany
      nfs:
        server: "10.100.200.119"
        path: "/exports"

    $ cat nfs-client-pvc.yaml
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: nfs-client-pvc
    spec:
      storageClassName: nfs-client-sc
      accessModes:
        - ReadWriteMany
      resources:
        requests:
          storage: 1Gi

    $ cat nfs-client-pod-1.yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: nfs-client-pod-1
    spec:
      containers:
      - name: busybox
        image: "k8s.gcr.io/busybox"
        volumeMounts:
        - name: nfs-vol
          mountPath: "/nfs"
        command: [ "sleep", "1000000" ]
      volumes:
      - name: nfs-vol
        persistentVolumeClaim:
          claimName: nfs-client-pvc

    $ kubectl create -f nfs-client-pv.yaml
    persistentvolume/nfs-client-pv created

    $ kubectl get pv
    NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                         STORAGECLASS    REASON   AGE
    nfs-client-pv                              1Gi        RWX            Retain           Available                                 nfs-client-sc            3s
    pvc-c207b810-8e81-11e9-b070-005056a2a261   5Gi        RWO            Delete           Bound       nfs/nfs-export-nfs-server-0   nfs-sc                   25m

    $ kubectl create -f nfs-client-pvc.yaml
    persistentvolumeclaim/nfs-client-pvc created

    $ kubectl get pvc
    NAME                      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS    AGE
    nfs-client-pvc            Bound    nfs-client-pv                              1Gi        RWX            nfs-client-sc   6s
    nfs-export-nfs-server-0   Bound    pvc-c207b810-8e81-11e9-b070-005056a2a261   5Gi        RWO            nfs-sc          26m

    $ kubectl create -f nfs-client-pod-1.yaml
    pod/nfs-client-pod-1 created

    $ kubectl get pod
    NAME               READY   STATUS    RESTARTS   AGE
    nfs-client-pod-1   1/1     Running   0          6s
    nfs-server-0       1/1     Running   0          13m
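As a side note, the ClusterIP does not have to be copied into the PV manifest by hand. The following is only a minimal sketch, not something used in this demo: `write_client_pv` is a hypothetical helper name, and the `-n nfs` flag in the comment assumes the service lives in the `nfs` namespace as declared in its manifest.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original demo): write the client PV
# manifest for whatever NFS server address is passed as the first argument.
write_client_pv() {
  server_ip="$1"
  cat > nfs-client-pv.yaml <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-client-pv
spec:
  storageClassName: nfs-client-sc
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: "${server_ip}"
    path: "/exports"
EOF
}

# On a live cluster, the address could be looked up rather than hard-coded, e.g.:
#   write_client_pv "$(kubectl get svc nfs-server -n nfs -o jsonpath='{.spec.clusterIP}')"
# Here we simply reuse the ClusterIP shown in the output above.
write_client_pv "10.100.200.119"
```

The generated nfs-client-pv.yaml has the same content as the manifest shown above, so it can be created with `kubectl create -f` in the same way.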
OK, everything looks good at this point. The NFS client Pod, which we have configured to automatically mount the NFS server /exports onto local mount point /nfs, has successfully started. Let’s log in and see if that is the case.
    $ kubectl exec -it nfs-client-pod-1 /bin/sh
    / # mount | grep nfs
    10.100.200.119:/exports on /nfs type nfs (rw,relatime,vers=3,rsize=524288,wsize=524288,namlen=255,hard,\
    proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.100.200.119,mountvers=3,mountport=20048,\
    mountproto=tcp,local_lock=none,addr=10.100.200.119)
    / # cd /nfs
    /nfs # ls
    index.html  lost+found
    /nfs # touch file-created-from-client-1
    /nfs # ls
    file-created-from-client-1  index.html  lost+found
    /nfs #
Since this volume is a ReadWriteMany share, allowing access to multiple clients, we should also be able to launch another NFS client, using the same PVC, and access the same share.
    $ cat nfs-client-pod-2.yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: nfs-client-pod-2
    spec:
      containers:
      - name: busybox
        image: "k8s.gcr.io/busybox"
        volumeMounts:
        - name: nfs-vol
          mountPath: "/nfs"
        command: [ "sleep", "1000000" ]
      volumes:
      - name: nfs-vol
        persistentVolumeClaim:
          claimName: nfs-client-pvc

    $ kubectl create -f nfs-client-pod-2.yaml
    pod/nfs-client-pod-2 created

    $ kubectl get pods
    NAME               READY   STATUS    RESTARTS   AGE
    nfs-client-pod-1   1/1     Running   0          17m
    nfs-client-pod-2   1/1     Running   0          23s
    nfs-server-0       1/1     Running   0          31m

    $ kubectl exec -it nfs-client-pod-2 /bin/sh
    / # mount | grep nfs
    10.100.200.119:/exports on /nfs type nfs (rw,relatime,vers=3,rsize=524288,wsize=524288,namlen=255,hard,\
    proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.100.200.119,mountvers=3,mountport=20048,\
    mountproto=tcp,local_lock=none,addr=10.100.200.119)
    / # df -h /nfs
    Filesystem                Size      Used Available Use% Mounted on
    10.100.200.119:/exports   4.8G     10.0M      4.8G   0% /nfs
    / # cd /nfs
    /nfs # ls
    file-created-from-client-1  index.html  lost+found
    /nfs # touch file-created-from-client-2
    /nfs # ls
    file-created-from-client-1  file-created-from-client-2  index.html  lost+found
    /nfs #
This looks like it is working successfully.
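The two hand-written client Pods could also be expressed as a single Deployment whose replicas all mount the same ReadWriteMany claim. This is only a sketch and was not part of the test above: the Deployment name `nfs-client-deploy` and the `app: nfs-client` label are assumptions made for illustration, while the image, command and `nfs-client-pvc` claim are reused from the client Pod manifests.

```shell
#!/bin/sh
# Sketch only: write a Deployment manifest whose two replicas share the
# ReadWriteMany PVC. The name and label below are hypothetical.
cat > nfs-client-deploy.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-client-deploy        # hypothetical name
spec:
  replicas: 2                    # both replicas mount the same share
  selector:
    matchLabels:
      app: nfs-client            # hypothetical label
  template:
    metadata:
      labels:
        app: nfs-client
    spec:
      containers:
      - name: busybox
        image: "k8s.gcr.io/busybox"
        command: [ "sleep", "1000000" ]
        volumeMounts:
        - name: nfs-vol
          mountPath: "/nfs"
      volumes:
      - name: nfs-vol
        persistentVolumeClaim:
          claimName: nfs-client-pvc
EOF
```

Creating it with `kubectl create -f nfs-client-deploy.yaml` should give two Pods that each see /nfs. With a ReadWriteOnce claim this pattern would generally only work if both replicas happened to land on the same node.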
External NFS Access
Update: When I first did this test with flannel, I was under the mistaken impression that I needed to create a Service to allow the client Pods to communicate with the external NFS server. After further testing with client Pods deployed on PKS using NSX-T, it seems that there is no need for an external Load Balancer service. The NFS client Pods can route to the NFS server without a service. Thanks to reddit user dmnt3d for highlighting this.
Therefore, to access an external NFS share from your Pods, it is simply a matter of creating new client PV/PVC YAML files which contain the new export information, such as the following. My NAS filer is exporting file shares via the IP address 10.27.51.71 and the name of the export is /share1.
    $ cat vsan-nfs-client-pv.yaml
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: vsan-nfs-client-pvc
    spec:
      storageClassName: vsan-nfs-client-sc
      capacity:
        storage: 1Gi
      accessModes:
        - ReadWriteMany
      nfs:
        server: "10.27.51.71"
        path: "/share1"

    $ cat vsan-nfs-client-pvc.yaml
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: vsan-nfs-client-pvc
    spec:
      storageClassName: vsan-nfs-client-sc
      accessModes:
        - ReadWriteMany
      resources:
        requests:
          storage: 1Gi

    $ cat vsan-nfs-client-pod-3.yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: nfs-client-pod-3
    spec:
      containers:
      - name: busybox
        image: "k8s.gcr.io/busybox"
        volumeMounts:
        - name: nfs-vol
          mountPath: "/nfs"
        command: [ "sleep", "1000000" ]
      volumes:
      - name: nfs-vol
        persistentVolumeClaim:
          claimName: vsan-nfs-client-pvc
Once the PV/PVC and Pod have been successfully created, we can log in to the Pod and check if the export has indeed been mounted on the client.
    $ kubectl exec -it nfs-client-pod-3 /bin/sh
    / # mount | grep nfs
    10.27.51.71:/share1 on /nfs type nfs4 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,\
    retrans=2,sec=sys,clientaddr=10.27.51.76,local_lock=none,addr=10.27.51.71)
    / # df -h /nfs
    Filesystem                Size      Used Available Use% Mounted on
    10.27.51.71:/share1       9.5T    349.0M      9.5T   0% /nfs
    / # cd /nfs
    /nfs # ls
    VMware-VMvisor-Installer-6.8.9-13958501.x86_64.iso
    /nfs #
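Note that the mount output above shows the client negotiated NFS v4.1 with the filer, whereas the in-cluster server earlier was mounted with v3. If you would rather pin the protocol version than rely on negotiation, a PV can carry a `mountOptions` list. The sketch below is an assumption-laden variant of the PV above, not something I tested here: the PV name `vsan-nfs-client-pv-v41` is hypothetical, and it is meant as a replacement for the original PV rather than an addition to it.

```shell
#!/bin/sh
# Sketch only: a variant of the external-NAS PV that pins the NFS mount
# options. The PV name is hypothetical; server, path and storage class
# name are taken from the manifest above.
cat > vsan-nfs-client-pv-v41.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: vsan-nfs-client-pv-v41   # hypothetical name
spec:
  storageClassName: vsan-nfs-client-sc
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  mountOptions:                  # passed through to the mount on the node
    - nfsvers=4.1
    - hard
  nfs:
    server: "10.27.51.71"
    path: "/share1"
EOF
```

Whether a given version is accepted depends on what the NAS exports, so treat the `nfsvers` value as something to verify against your own filer.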
That appears to be working well. Now one interesting issue that I did come across was the following. I did my initial testing on PKS, the Pivotal Container Service. This worked seamlessly. However, when I deployed my NFS manifests on a new distribution of K8s 1.14.3 on Ubuntu 18.04, I got the following errors on trying to mount NFS shares on the client:
    Events:
      Type     Reason       Age   From                  Message
      ----     ------       ----  ----                  -------
      Normal   Scheduled    12s   default-scheduler     Successfully assigned default/nfs-client-pod-1 to cor-k8s-w01
      Warning  FailedMount  12s   kubelet, cor-k8s-w01  MountVolume.SetUp failed for volume "nfs-client-pvc" : mount failed: exit status 32
    Mounting command: systemd-run
    Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/3a338d7f-8dce-11e9-920c-005056b66b16/volumes/kubernetes.io~nfs/nfs-client-pvc\
     --scope -- mount -t nfs 10.111.227.6:/exports /var/lib/kubelet/pods/3a338d7f-8dce-11e9-920c-005056b66b16/volumes/kubernetes.io~nfs/nfs-client-pvc
    Output: Running scope as unit: run-r51ccde917c344fdf9478ab35f0be14b2.scope
    mount: /var/lib/kubelet/pods/3a338d7f-8dce-11e9-920c-005056b66b16/volumes/kubernetes.io~nfs/nfs-client-pvc: bad option; for several filesystems (e.g. nfs, cifs)\
     you might need a /sbin/mount.<type> helper program.
      Warning  FailedMount  12s   kubelet, cor-k8s-w01  MountVolume.SetUp failed for volume "nfs-client-pvc" : mount failed: exit status 32
It turned out that my newly deployed Ubuntu node VMs did not have the nfs-common package installed, whereas the Ubuntu 16.04 node VMs on PKS did. After installing the package on the nodes (using apt-get install nfs-common), further NFS mounts then worked seamlessly.
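A quick way to spot this before deploying anything is to check each node for the NFS mount helper that the error message refers to. The following is a minimal sketch to run on a worker node; `has_nfs_mount_helper` is a hypothetical function name, and the /sbin/mount.nfs path is taken from the hint in the error output above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if an NFS mount helper is present
# on this node, either on the PATH or at /sbin/mount.nfs.
has_nfs_mount_helper() {
  command -v mount.nfs >/dev/null 2>&1 || [ -x /sbin/mount.nfs ]
}

if has_nfs_mount_helper; then
  echo "mount.nfs helper found on this node"
else
  echo "mount.nfs helper missing: install the NFS client package (nfs-common on Ubuntu)"
fi
```

If the helper is reported missing on an Ubuntu node, installing nfs-common as described above is the fix that worked for me.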
Manifests used in this demo can be found on my vsphere-storage-101 GitHub repo.