Kubernetes Storage on vSphere 101 – ReadWriteMany NFS

Over the last number of posts, we have spent a lot of time looking at persistent volumes (PVs) instantiated on some vSphere back-end block storage. These PVs were always ReadWriteOnce, meaning they could only be mounted read-write by a single node at a time, and in those examples by a single Pod. In this post, we will take a look at how to create a ReadWriteMany volume, based on an NFS share, which can be accessed by multiple Pods. To begin, we will use an NFS server image running in a Pod, and show how to mount the exported file share to another Pod, simply to get the concepts across. After that, we will show how to consume an external NFS file share/export from an external NAS (Network Attached Storage) device. So let’s begin with the NFS Server implementation.

NFS Server

In this example, I assigned a persistent volume (PV) to my NFS server Pod, so that the volume could be exported as an NFS file share. In the first 101 blog post, I described in detail how to do that, so I won’t go through the process again here. To do this, I have individual StorageClass, StatefulSet and Service YAML manifests. I could put them all in one manifest file if I wished, but here they are individually. I based this on the NFS server configuration found here on GitHub. At this point, most of this should be familiar to you.

$ cat nfs-server-sts.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nfs-server
  namespace: nfs
  labels:
    app: nfs-server
spec:
  serviceName: nfs-service
  replicas: 1
  selector:
    matchLabels:
      app: nfs-server
  template:
    metadata:
      labels:
        app: nfs-server
    spec:
      containers:
      - name: nfs-server
        image: gcr.io/google_containers/volume-nfs:0.8
        ports:
          - name: nfs
            containerPort: 2049
          - name: mountd
            containerPort: 20048
          - name: rpcbind
            containerPort: 111
        securityContext:
          privileged: true
        volumeMounts:
        - name: nfs-export
          mountPath: /exports
  volumeClaimTemplates:
  - metadata:
      name: nfs-export
      annotations:
        volume.beta.kubernetes.io/storage-class: nfs-sc
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 5Gi

$ cat nfs-server-sc.yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: nfs-sc
provisioner: kubernetes.io/vsphere-volume
parameters:
    diskformat: thin
    storagePolicyName: raid-1
    datastore: vsanDatastore

$ cat nfs-server-svc.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: nfs-server
  name: nfs-server
  namespace: nfs
spec:
  clusterIP:
  ports:
    - name: nfs
      port: 2049
    - name: mountd
      port: 20048
    - name: rpcbind
      port: 111
  selector:
    app: nfs-server

The first question is why we are using a StatefulSet. A StatefulSet means that if the Pod fails, it will be restarted and reattached to the same PV. However, if the Pod does restart, it will get a new IP address. This is why we need a service: a service gives us a single point of reference for any clients that wish to mount the share from the NFS server.
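
To make that single point of reference concrete: besides its cluster IP, a Service also gets an in-cluster DNS name of the form <service>.<namespace>.svc.<cluster-domain>. The following is a minimal sketch that builds such a name. The helper name svc_fqdn is hypothetical, and cluster.local is assumed as the cluster domain. Whether a node can resolve this name when it performs an NFS mount depends on the node's DNS configuration, so treat the name as something to verify in your own cluster; the client PV in this post uses the service IP.

```shell
# Hypothetical helper: build the in-cluster DNS name of a Service.
# Arguments: service name, namespace, optional cluster domain
# (cluster.local is assumed here as the default).
svc_fqdn() {
  svc="$1"
  ns="$2"
  domain="${3:-cluster.local}"
  printf '%s.%s.svc.%s\n' "$svc" "$ns" "$domain"
}

# DNS name of the nfs-server Service in the nfs namespace
svc_fqdn nfs-server nfs
```

The third argument is only needed if the cluster was installed with a non-default cluster domain.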

How this works is that the NFS server Pod will have a 5Gi PV mounted. This PV is a dynamically provisioned VMDK on the vSAN datastore. It will be mounted on /exports, which is the same folder that the NFS server image is automatically configured to share. Access to this volume will be via the IP address of the service. Since the clusterIP field is left blank, the service is assigned a cluster-internal IP: Pods can communicate with the NFS server using the service IP rather than the NFS server Pod IP, but there is no EXTERNAL network access (this reminds me, I should probably do a simple 101 ‘service’ post at some point as well). Let’s now go ahead and roll out our server, whilst monitoring PVs, PVCs and Pods as we do so. Then we will log in to the NFS server Pod and verify that the share is indeed being exported.

$ ls
nfs-server-sc.yaml  nfs-server-sts.yaml  nfs-server-svc.yaml

$ kubectl create -f nfs-server-sc.yaml
storageclass.storage.k8s.io/nfs-sc created

$ kubectl get sc
NAME     PROVISIONER                    AGE
nfs-sc   kubernetes.io/vsphere-volume   3s

$ kubectl create -f nfs-server-svc.yaml
service/nfs-server created

$ kubectl get svc
NAME         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
nfs-server   ClusterIP   10.100.200.119   <none>        2049/TCP,20048/TCP,111/TCP   4s

$ kubectl create -f nfs-server-sts.yaml
statefulset.apps/nfs-server created

$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                         STORAGECLASS   REASON   AGE
pvc-c207b810-8e81-11e9-b070-005056a2a261   5Gi        RWO            Delete           Bound    nfs/nfs-export-nfs-server-0   nfs-sc                  14m

$ kubectl get pvc
NAME                      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
nfs-export-nfs-server-0   Bound    pvc-c207b810-8e81-11e9-b070-005056a2a261   5Gi        RWO            nfs-sc         14m

$ kubectl get pod
NAME           READY   STATUS    RESTARTS   AGE
nfs-server-0   1/1     Running   0          87s

$ kubectl exec -it nfs-server-0 -- /bin/bash
[root@nfs-server-0 /]# exportfs
/exports        <world>
/               <world>

[root@nfs-server-0 /]# mount | grep export
/dev/sdd on /exports type ext4 (rw,relatime,data=ordered)

[root@nfs-server-0 /]# df /exports
Filesystem     1K-blocks  Used Available Use% Mounted on
/dev/sdd         5029504 10236   5002884   1% /exports

[root@nfs-server-0 /]# grep nfs /etc/hosts
10.200.100.13   nfs-server-0.nfs-service.nfs.svc.cluster.local  nfs-server-0

[root@nfs-server-0 /]# exit
exit
$
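
The manual export check above could also be scripted. The following is a minimal sketch and not part of the original setup: is_exported is a hypothetical helper that reads exportfs output on standard input and succeeds only if the given path is listed in the first column.

```shell
# Hypothetical helper: succeed if the path given as $1 appears in the
# first column of `exportfs` output read from stdin.
is_exported() {
  awk -v path="$1" '$1 == path { found = 1 } END { exit (found ? 0 : 1) }'
}

# Example using the output shown above. On a live cluster the input would
# instead come from: kubectl exec nfs-server-0 -- exportfs
printf '/exports        <world>\n/               <world>\n' | is_exported /exports \
  && echo "/exports is exported"
```

Matching on the whole first column, rather than using a plain grep, avoids a false positive from a longer path that merely contains /exports.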

NFS Client

The server side all looks good. Now that we have a service IP address for the NFS service, we can use that in our client configuration. Let’s look at the client YAML manifests next; the client runs a very simple busybox image. First, we have a PV, then a PVC and finally the client Pod YAML. Note that, once more, the storage class is not a real StorageClass; the name is simply used to match the PV and the PVC. However, the PV access mode is now ReadWriteMany, meaning it can be mounted by multiple Pods. In the client PV, we need to add the FQDN or IP address of the NFS server service created earlier. The client Pod should then automatically mount the /exports share from the NFS server onto its local mount point /nfs if everything works as expected.

$ kubectl get svc
NAME         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
nfs-server   ClusterIP   10.100.200.119   <none>        2049/TCP,20048/TCP,111/TCP   12m

$ cat nfs-client-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-client-pv
spec:
  storageClassName: nfs-client-sc
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: "10.100.200.119"
    path: "/exports"

$ cat nfs-client-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-client-pvc
spec:
  storageClassName: nfs-client-sc
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi

$ cat nfs-client-pod-1.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nfs-client-pod-1
spec:
  containers:
  - name: busybox
    image: "k8s.gcr.io/busybox"
    volumeMounts:
    - name: nfs-vol
      mountPath: "/nfs"
    command: [ "sleep", "1000000" ]
  volumes:
    - name: nfs-vol
      persistentVolumeClaim:
        claimName: nfs-client-pvc

$ kubectl create -f nfs-client-pv.yaml
persistentvolume/nfs-client-pv created

$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                         STORAGECLASS    REASON   AGE
nfs-client-pv                              1Gi        RWX            Retain           Available                                 nfs-client-sc            3s
pvc-c207b810-8e81-11e9-b070-005056a2a261   5Gi        RWO            Delete           Bound       nfs/nfs-export-nfs-server-0   nfs-sc                   25m

$ kubectl create -f nfs-client-pvc.yaml
persistentvolumeclaim/nfs-client-pvc created

$ kubectl get pvc
NAME                      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS    AGE
nfs-client-pvc            Bound    nfs-client-pv                              1Gi        RWX            nfs-client-sc   6s
nfs-export-nfs-server-0   Bound    pvc-c207b810-8e81-11e9-b070-005056a2a261   5Gi        RWO            nfs-sc          26m

$ kubectl create -f nfs-client-pod-1.yaml
pod/nfs-client-pod-1 created

$ kubectl get pod
NAME               READY   STATUS    RESTARTS   AGE
nfs-client-pod-1   1/1     Running   0          6s
nfs-server-0       1/1     Running   0          13m

Ok – everything looks good at this point. The NFS client Pod, which we have configured to automatically mount the NFS server /exports onto local mount point /nfs, has successfully started. Let’s log in and see if that is the case.

$ kubectl exec -it nfs-client-pod-1 -- /bin/sh
/ # mount | grep nfs
10.100.200.119:/exports on /nfs type nfs (rw,relatime,vers=3,rsize=524288,wsize=524288,namlen=255,hard,\
proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.100.200.119,mountvers=3,mountport=20048,\
mountproto=tcp,local_lock=none,addr=10.100.200.119)
/ # cd /nfs
/nfs # ls
index.html  lost+found
/nfs # touch file-created-from-client-1
/nfs # ls
file-created-from-client-1  index.html                  lost+found
/nfs #
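
The same check can be done without reading the mount line by eye. The following is a minimal sketch under stated assumptions: nfs_source_for is a hypothetical helper that reads mount output on standard input and prints the server:/path mounted on a given mount point, but only if the filesystem type starts with nfs.

```shell
# Hypothetical helper: print the NFS source (server:/path) mounted on the
# mount point given as $1, reading `mount` output from stdin.
# Prints nothing if that mount point is not an NFS mount.
nfs_source_for() {
  awk -v mp="$1" '$2 == "on" && $3 == mp && $5 ~ /^nfs/ { print $1 }'
}

# Example with a shortened version of the mount line shown above
echo '10.100.200.119:/exports on /nfs type nfs (rw,relatime,vers=3)' | nfs_source_for /nfs
```

From outside the Pod, the input could come from kubectl exec nfs-client-pod-1 -- mount, and the printed value compared against the server and path set in the client PV.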

Since this volume is a ReadWriteMany share, allowing access to multiple clients, we should also be able to launch another NFS client, using the same PVC, and access the same share.

$ cat nfs-client-pod-2.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nfs-client-pod-2
spec:
  containers:
  - name: busybox
    image: "k8s.gcr.io/busybox"
    volumeMounts:
    - name: nfs-vol
      mountPath: "/nfs"
    command: [ "sleep", "1000000" ]
  volumes:
    - name: nfs-vol
      persistentVolumeClaim:
        claimName: nfs-client-pvc

$ kubectl create -f nfs-client-pod-2.yaml
pod/nfs-client-pod-2 created

$ kubectl get pods
NAME               READY   STATUS    RESTARTS   AGE
nfs-client-pod-1   1/1     Running   0          17m
nfs-client-pod-2   1/1     Running   0          23s
nfs-server-0       1/1     Running   0          31m

$ kubectl exec -it nfs-client-pod-2 -- /bin/sh
/ # mount | grep nfs
10.100.200.119:/exports on /nfs type nfs (rw,relatime,vers=3,rsize=524288,wsize=524288,namlen=255,hard,\
proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.100.200.119,mountvers=3,mountport=20048,\
mountproto=tcp,local_lock=none,addr=10.100.200.119)
/ # df -h /nfs
Filesystem                Size      Used Available Use% Mounted on
10.100.200.119:/exports
                          4.8G     10.0M      4.8G   0% /nfs
/ # cd /nfs
/nfs # ls
file-created-from-client-1  index.html                  lost+found
/nfs # touch file-created-from-client-2
/nfs # ls
file-created-from-client-1  file-created-from-client-2  index.html                  lost+found
/nfs #

This looks like it is working successfully.
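
The client PV and PVC above are tied together only by their matching storageClassName and access mode, and the only values specific to the NFS server are the server address and the export path. As a minimal sketch of that relationship, the pair can be rendered from a few parameters. The helper name render_nfs_pv_pvc and the -pv, -pvc and -sc naming scheme are assumptions made for illustration.

```shell
# Hypothetical helper: print a PV and a matching PVC for an NFS export.
# Arguments: base name, NFS server (IP or name), export path, optional size.
render_nfs_pv_pvc() {
  name="$1"; server="$2"; path="$3"; size="${4:-1Gi}"
  cat <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ${name}-pv
spec:
  storageClassName: ${name}-sc
  capacity:
    storage: ${size}
  accessModes:
    - ReadWriteMany
  nfs:
    server: "${server}"
    path: "${path}"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${name}-pvc
spec:
  storageClassName: ${name}-sc
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: ${size}
EOF
}

# Reproduces the nfs-client PV/PVC pair used above
render_nfs_pv_pvc nfs-client 10.100.200.119 /exports
```

The output could be saved to a file or piped to kubectl create -f - to create both objects in one step.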

External NFS Access

Update: When I first ran this test with flannel, I was under the mistaken impression that I needed to create a Service to allow the client Pods to communicate with the external NFS server. After further testing with client Pods deployed on PKS using NSX-T, it seems that there is no need for an external Load Balancer service. The NFS client Pods can route to the NFS server without a service. Thanks to reddit user dmnt3d for highlighting this.

Therefore, to access an external NFS share from your Pods, it is simply a matter of creating new client PV/PVC YAML files with the new export information, such as the following. My NAS filer is exporting file shares via the IP address 10.27.51.71 and the name of the export is /share1.

$ cat vsan-nfs-client-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: vsan-nfs-client-pvc
spec:
  storageClassName: vsan-nfs-client-sc
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: "10.27.51.71"
    path: "/share1"

$ cat vsan-nfs-client-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vsan-nfs-client-pvc
spec:
  storageClassName: vsan-nfs-client-sc
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi

$ cat vsan-nfs-client-pod-3.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nfs-client-pod-3
spec:
  containers:
  - name: busybox
    image: "k8s.gcr.io/busybox"
    volumeMounts:
    - name: nfs-vol
      mountPath: "/nfs"
    command: [ "sleep", "1000000" ]
  volumes:
    - name: nfs-vol
      persistentVolumeClaim:
        claimName: vsan-nfs-client-pvc

Once the PV/PVC and Pod have been successfully created, we can log in to the Pod and check if the export has indeed been mounted on the client.

$ kubectl exec -it nfs-client-pod-3 -- /bin/sh
/ # mount | grep nfs
10.27.51.71:/share1 on /nfs type nfs4 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,\
retrans=2,sec=sys,clientaddr=10.27.51.76,local_lock=none,addr=10.27.51.71)
/ # df -h /nfs
Filesystem                Size      Used Available Use% Mounted on
10.27.51.71:/share1       9.5T    349.0M      9.5T   0% /nfs
/ # cd /nfs
/nfs # ls
VMware-VMvisor-Installer-6.8.9-13958501.x86_64.iso
/nfs #

That appears to be working well. One interesting issue that I did come across was the following. I did my initial testing on PKS, the Pivotal Container Service, where this worked seamlessly. However, when I deployed my NFS manifests on a new distribution of K8s 1.14.3 on Ubuntu 18.04, I got the following errors when trying to mount NFS shares on the client:

Events:
  Type     Reason       Age   From                  Message
  ----     ------       ----  ----                  -------
  Normal   Scheduled    12s   default-scheduler     Successfully assigned default/nfs-client-pod-1 to cor-k8s-w01
  Warning  FailedMount  12s   kubelet, cor-k8s-w01  MountVolume.SetUp failed for volume "nfs-client-pvc" : mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/3a338d7f-8dce-11e9-920c-005056b66b16/volumes/kubernetes.io~nfs/nfs-client-pvc\
 --scope -- mount -t nfs 10.111.227.6:/exports /var/lib/kubelet/pods/3a338d7f-8dce-11e9-920c-005056b66b16/volumes/kubernetes.io~nfs/nfs-client-pvc
Output: Running scope as unit: run-r51ccde917c344fdf9478ab35f0be14b2.scope
mount: /var/lib/kubelet/pods/3a338d7f-8dce-11e9-920c-005056b66b16/volumes/kubernetes.io~nfs/nfs-client-pvc: bad option; for several filesystems (e.g. nfs, cifs)\
 you might need a /sbin/mount.<type> helper program.
  Warning  FailedMount  12s  kubelet, cor-k8s-w01  MountVolume.SetUp failed for volume "nfs-client-pvc" : mount failed: exit status 32

It turned out that my newly deployed Ubuntu node VMs did not have the nfs-common package installed, whereas the Ubuntu 16.04 node VMs on PKS did. After installing the package on the nodes (using apt-get install nfs-common), further NFS mounts then worked seamlessly.
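
The error message above points at a missing /sbin/mount.<type> helper, so one way to catch this before deploying any NFS clients is to check each node for that helper. The following is a minimal sketch, assuming the helper lives at /sbin/mount.nfs as the error message suggests. The function name has_mount_helper and its optional root-directory argument are hypothetical.

```shell
# Hypothetical helper: succeed if a mount helper for the given filesystem
# type exists and is executable. $2 is an optional root directory prefix
# (empty by default, meaning the node's own /sbin).
has_mount_helper() {
  fstype="$1"
  root="${2:-}"
  [ -x "${root}/sbin/mount.${fstype}" ]
}

# Run on a worker node
if has_mount_helper nfs; then
  echo "mount.nfs found"
else
  echo "mount.nfs missing; NFS mounts from this node are likely to fail"
fi
```

If the helper is reported missing on an Ubuntu node, installing nfs-common as described above should provide it.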

Manifests used in this demo can be found on my vsphere-storage-101 GitHub repo.