Using Velero to backup and restore applications that use vSAN File Service RWX file shares

It has been a while since I looked at Velero, our backup and restore product for Kubernetes cluster resources. This morning I noticed that the Velero team just published version 1.4. This article uses the previous version, v1.3.2, but the version should not make any difference to the steps. In this post, I want to see Velero backing up and restoring applications that use read-write-many (RWX) volumes that are dynamically provisioned as file shares from vSAN 7.0 File Services. To demonstrate, I’ll create two simple busybox Pods in their own namespace. Using the vSphere CSI driver, Kubernetes will dynamically mount the same NFS file share to both Pods from vSAN File Services. I’ll then start a simple script on both Pods which writes a timestamp to a shared file. Next I’ll back up the namespace, then delete the namespace, and finally restore it from the backup. For this exercise, I will be relying on the restic plugin. I will not go through the steps of deploying the evaluation Minio S3 store that comes with Velero, as I’ve done that many times in previous posts.

OK – let’s begin.

Step 1 – Deploy Velero v1.3.2

As mentioned, a Minio S3 store has been provisioned. I have also set the Minio service to use NodePort.

$ kubectl get pod -n velero | grep minio
minio-d787f4bf7-hbll6              1/1     Running     0          4d20h
minio-setup-kk95w                  0/1     Completed   3          4d20h

$ kubectl get svc -n velero
NAME    TYPE       CLUSTER-IP    EXTERNAL-IP   PORT(S)          AGE
minio   NodePort   10.110.84.6   <none>        9000:32762/TCP   2d22h
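
If the Minio service in your deployment defaulted to ClusterIP, one quick way to switch it over to NodePort is to patch the service. This is just a sketch; it assumes the service is called minio and lives in the velero namespace, as shown above.

$ kubectl patch svc minio -n velero -p '{"spec":{"type":"NodePort"}}'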

With the Kubernetes worker node and port information, I can now proceed to deploy Velero. I have also done this in numerous earlier posts, but as a quick reminder, a deployment that uses restic will be similar to the following. Note that the publicUrl setting matches the Kubernetes worker node IP address and the NodePort for the Minio service.

$ velero install  \
--provider aws \
--bucket velero \
--secret-file ./credentials-velero \
--use-volume-snapshots=false \
--plugins velero/velero-plugin-for-aws:v1.0.0 \
--use-restic \
--backup-location-config \
region=minio,\
s3ForcePathStyle="true",\
s3Url=http://minio.velero.svc:9000,\
publicUrl=http://10.27.51.49:32762

The “secret-file” contains the login and password for Minio. Note also the --use-restic option. Full installation details for Velero can be found here, and details on how to do a Velero evaluation deployment with the example Minio can be found here.
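
For reference, the credentials-velero file is simply an AWS-style credentials file. Here is a sketch of what mine looks like, assuming the minio/minio123 login and password used by the example Minio deployment; obviously substitute your own values.

$ cat credentials-velero
[default]
aws_access_key_id = minio
aws_secret_access_key = minio123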

When the deployment completes, you should observe the following message:

Velero is installed! ⛵ 
Use 'kubectl logs deployment/velero -n velero' to view the status.
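
Before moving on, it is worth confirming that the velero deployment and, since we passed --use-restic, the restic DaemonSet are up and running. A quick sketch of the checks I would run (the restic DaemonSet should have one Pod per worker node):

$ kubectl get deployment velero -n velero
$ kubectl get daemonset restic -n velero
$ kubectl get pods -n velero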

Step 2 – Set up a sample application

In this step, as mentioned in the introduction, I create a new namespace, a StorageClass, and a Persistent Volume Claim (PVC) to dynamically provision a RWX file share from vSAN File Services. I then deploy 2 x Pods which mount the same read-write-many (RWX) file share. Finally, I run a simple shell script so that both Pods write to the share simultaneously. Here are the manifest files that I am using. The first two are for the Pods, then there is the PVC, and finally there is the StorageClass.

apiVersion: v1
kind: Pod
metadata:
  name: file-pod-a2
  namespace: rwx-backup
spec:
  containers:
  - name: file-pod-a2
    image: "k8s.gcr.io/busybox"
    volumeMounts:
    - name: file-vol-2
      mountPath: "/mnt/volume1"
    command: [ "sleep", "1000000" ]
  volumes:
    - name: file-vol-2
      persistentVolumeClaim:
        claimName: file-pvc


apiVersion: v1
kind: Pod
metadata:
  name: file-pod-a
  namespace: rwx-backup
spec:
  containers:
  - name: file-pod-a
    image: "k8s.gcr.io/busybox"
    volumeMounts:
    - name: file-vol
      mountPath: "/mnt/volume1"
    command: [ "sleep", "1000000" ]
  volumes:
    - name: file-vol
      persistentVolumeClaim:
        claimName: file-pvc


apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: file-pvc
  namespace: rwx-backup
spec:
  storageClassName: vsan-file-sc
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 2Gi


apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: vsan-file-sc
provisioner: csi.vsphere.vmware.com
parameters:
  storagepolicyname: "RAID1"
  csi.storage.k8s.io/fstype: nfs4

Let’s create the application.

The first step is to create the namespace. This is referenced by both the Pod and PVC manifests.

$ kubectl get ns
NAME              STATUS   AGE
cassandra         Active   16d
default           Active   26d
kube-node-lease   Active   26d
kube-public       Active   26d
kube-system       Active   26d
velero            Active   2d23h
vsan-prometheus   Active   17d

$ kubectl create ns rwx-backup
namespace/rwx-backup created

$ kubectl get ns
NAME              STATUS   AGE
cassandra         Active   16d
default           Active   26d
kube-node-lease   Active   26d
kube-public       Active   26d
kube-system       Active   26d
rwx-backup        Active   3s
velero            Active   2d23h
vsan-prometheus   Active   17d

Next we create the StorageClass.

$ kubectl get sc
NAME                    PROVISIONER              AGE
cass-sc-csi (default)   csi.vsphere.vmware.com   25d

$ kubectl apply -f fs-sc.yaml
storageclass.storage.k8s.io/vsan-file-sc created

$ kubectl get sc
NAME                    PROVISIONER              AGE
cass-sc-csi (default)   csi.vsphere.vmware.com   25d
vsan-file-sc            csi.vsphere.vmware.com   2s

Now we can create the PVC and the Pods that will share the RWX PV.

$ kubectl get pods -n rwx-backup
No resources found.

$ kubectl get pvc -n rwx-backup
No resources found.

$ kubectl apply -f fs-pvc.yaml
persistentvolumeclaim/file-pvc created

$ kubectl get pvc -n rwx-backup
NAME       STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
file-pvc   Pending                                      vsan-file-sc   6s

$ kubectl get pvc -n rwx-backup
NAME       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
file-pvc   Bound    pvc-627cbd4c-fc6a-4c73-91e8-5f864f6a8b30   2Gi        RWX            vsan-file-sc   19s

$ kubectl apply -f fs-pod-a.yaml
pod/file-pod-a created

$ kubectl apply -f fs-pod-a2.yaml
pod/file-pod-a2 created

$ kubectl get pods -n rwx-backup
NAME          READY   STATUS    RESTARTS   AGE
file-pod-a    1/1     Running   0          22s
file-pod-a2   1/1     Running   0          17s

Here is the volume that was dynamically created from vSAN 7.0 File Services. From here, I can use the “Copy URL” option to find out what IP address is used to export the file share.

Here is the file share as seen from the CNS UI in vSphere 7.0, filtered by namespace.
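
If you prefer the CLI to the vSphere client, the PV object itself can also be examined. A sketch, using the PV name from the earlier kubectl get pvc output; the spec.csi section shows the driver and the CNS volume handle, and depending on the CSI driver version the NFS access point may appear among the volume attributes as well.

$ kubectl get pv pvc-627cbd4c-fc6a-4c73-91e8-5f864f6a8b30 -o yaml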

Let’s now open a shell on each of the Pods, check the share/mount, and write some data to it from each Pod. You can verify the mount point by comparing the IP address to the “Copy URL” step in the vSphere client referenced earlier.

$ kubectl exec -it file-pod-a -n rwx-backup -- /bin/sh

/ # mount | grep nfs
10.27.51.214:/52a89bc3-d9db-df72-6418-a67c768eea0a on /mnt/volume1 type nfs4\
 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,\
timeo=600,retrans=2,sec=sys,clientaddr=10.244.2.5,local_lock=none,addr=10.27.51.214)
/ # cd /mnt/volume1/
/mnt/volume1 # while true
> do
> echo "POD1 - `date`" >> timestamp
> date
> sleep 5
> done
Mon May 25 10:40:12 UTC 2020
Mon May 25 10:40:17 UTC 2020
Mon May 25 10:40:22 UTC 2020


$ kubectl exec -it file-pod-a2  -n rwx-backup -- /bin/sh

/ # mount | grep nfs
10.27.51.214:/52a89bc3-d9db-df72-6418-a67c768eea0a on /mnt/volume1 type nfs4\
 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,\
timeo=600,retrans=2,sec=sys,clientaddr=10.244.1.4,local_lock=none,addr=10.27.51.214)

/ # cd /mnt/volume1/
/mnt/volume1 # ls
timestamp
/mnt/volume1 # while true
> do
> echo "POD2 - `date`" >> timestamp
> date
> sleep 5
> done
Mon May 25 10:43:23 UTC 2020
Mon May 25 10:43:28 UTC 2020
Mon May 25 10:43:33 UTC 2020

Step 3 – Take a backup

Now that we are writing some data to the shared volume, let’s proceed with the backup. Because we are using restic, the volumes need to be annotated first. If they are not annotated, the restic plugin will not include them in the backup. Simply annotate the volume name referenced in each Pod as follows:

$ kubectl -n rwx-backup annotate pod file-pod-a backup.velero.io/backup-volumes=file-vol
pod/file-pod-a annotated

$ kubectl -n rwx-backup annotate pod file-pod-a2 backup.velero.io/backup-volumes=file-vol-2
pod/file-pod-a2 annotated
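
A quick sanity check that the annotations took effect, before kicking off the backup (just a sketch):

$ kubectl -n rwx-backup get pod file-pod-a -o jsonpath='{.metadata.annotations}'
$ kubectl -n rwx-backup get pod file-pod-a2 -o jsonpath='{.metadata.annotations}'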

Now we are ready to take our first backup. We will back up everything in the rwx-backup namespace.

$ velero backup create rwx-backup --include-namespaces rwx-backup
Backup request "rwx-backup" submitted successfully.
Run `velero backup describe rwx-backup` or `velero backup logs rwx-backup` for more details.
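
While the backup is running, you can also poll its phase directly on the Backup custom resource rather than re-running the describe command. A sketch, assuming the default velero namespace:

$ kubectl -n velero get backups.velero.io rwx-backup -o jsonpath='{.status.phase}'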

After some time …

$ velero backup describe rwx-backup --details
Name:         rwx-backup
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  <none>

Phase:  Completed

Namespaces:
  Included:  rwx-backup
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  <none>

Storage Location:  default

Snapshot PVs:  auto

TTL:  720h0m0s

Hooks:  <none>

Backup Format Version:  1

Started:    2020-05-25 11:58:23 +0100 IST
Completed:  2020-05-25 11:58:39 +0100 IST

Expiration:  2020-06-24 11:58:23 +0100 IST

Resource List:
  v1/Event:
    - rwx-backup/file-pod-a.16123eb1a6e3d8f2
    - rwx-backup/file-pod-a.16123eb1bffbc348
    - rwx-backup/file-pod-a.16123eb2b638b193
    - rwx-backup/file-pod-a.16123eb2d8d5ca23
    - rwx-backup/file-pod-a.16123eb2e862d699
    - rwx-backup/file-pod-a.16123eb2f59cb019
    - rwx-backup/file-pod-a2.16123eb2cb8a5210
    - rwx-backup/file-pod-a2.16123eb2e4154401
    - rwx-backup/file-pod-a2.16123eb40a1e6093
    - rwx-backup/file-pod-a2.16123eb43136f356
    - rwx-backup/file-pod-a2.16123eb441935de5
    - rwx-backup/file-pod-a2.16123eb44ee81023
    - rwx-backup/file-pvc.16123eaa60a20a83
    - rwx-backup/file-pvc.16123eaa60d1535f
    - rwx-backup/file-pvc.16123ead3da8b139
  v1/Namespace:
    - rwx-backup
  v1/PersistentVolume:
    - pvc-627cbd4c-fc6a-4c73-91e8-5f864f6a8b30
  v1/PersistentVolumeClaim:
    - rwx-backup/file-pvc
  v1/Pod:
    - rwx-backup/file-pod-a
    - rwx-backup/file-pod-a2
  v1/Secret:
    - rwx-backup/default-token-z4nxj
  v1/ServiceAccount:
    - rwx-backup/default

Persistent Volumes: <none included>

Restic Backups:
  Completed:
    rwx-backup/file-pod-a: file-vol

You might notice the “Restic Backups” section towards the end of that output and wonder why only one volume was backed up. This is because Velero knows that the Pod volume ‘file-vol-2’ uses a persistent volume claim which has already been backed up from another Pod, so it skips it. Remember, it is a shared volume.
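
You can also see this reflected in the PodVolumeBackup custom resources that restic creates in the velero namespace; there should be a single entry for this backup, covering file-vol. A sketch:

$ kubectl -n velero get podvolumebackups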

Let’s check the backup …

$ velero backup get
NAME         STATUS      CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
rwx-backup   Completed   2020-05-25 11:58:23 +0100 IST   29d       default            <none>

Looks good. Now let’s have something catastrophic happen and see if our restore works.

Step 4 – Destroy the application

I am now going to destroy the namespace where the application is running, and then restore it using Velero. At the same time, let’s observe what happens with the file share / persistent volume from a vSAN File Services perspective.

$ kubectl get ns
NAME              STATUS   AGE
cassandra         Active   17d
default           Active   26d
kube-node-lease   Active   26d
kube-public       Active   26d
kube-system       Active   26d
rwx-backup        Active   60m
velero            Active   3d
vsan-prometheus   Active   17d

$ kubectl delete ns rwx-backup
namespace "rwx-backup" deleted

$ kubectl get ns
NAME              STATUS   AGE
cassandra         Active   17d
default           Active   26d
kube-node-lease   Active   26d
kube-public       Active   26d
kube-system       Active   26d
velero            Active   3d
vsan-prometheus   Active   17d

We can clearly see that the namespace is gone. Let’s check on vSphere to see what has happened. The PV has been removed from CNS and the file share has also been removed from vSAN File Services.
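
You can confirm the same thing from kubectl; querying the old PV by name should now return a NotFound error. A sketch, using the PV name from earlier:

$ kubectl get pv pvc-627cbd4c-fc6a-4c73-91e8-5f864f6a8b30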

Step 5 – Restore the application using Velero

To check that the backup has indeed worked, we will now attempt a restore of the backup.

$ velero backup get
NAME         STATUS      CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
rwx-backup   Completed   2020-05-25 11:58:23 +0100 IST   29d       default            <none>

$ velero restore create timestamp-restore --from-backup rwx-backup
Restore request "timestamp-restore" submitted successfully.
Run `velero restore describe timestamp-restore` or `velero restore logs timestamp-restore` for more details.
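
As with the backup, you can keep an eye on the restore from the CLI while it runs. A sketch; the PodVolumeRestore custom resources track the restic data movement, and the Restore resource carries the overall phase:

$ kubectl -n velero get podvolumerestores
$ kubectl -n velero get restores.velero.io timestamp-restore -o jsonpath='{.status.phase}'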

After some time …

$ velero restore describe timestamp-restore
Name:         timestamp-restore
Namespace:    velero
Labels:       <none>
Annotations:  <none>

Phase:  Completed

Backup:  rwx-backup

Namespaces:
  Included:  *
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io
  Cluster-scoped:  auto

Namespace mappings:  <none>

Label selector:  <none>

Restore PVs:  auto

Restic Restores (specify --details for more information):
  Completed:  1

Note that there was also a “Restic Restore”. This should be our PV. Let’s see what was restored.

$ kubectl get ns
NAME              STATUS   AGE
cassandra         Active   17d
default           Active   26d
kube-node-lease   Active   26d
kube-public       Active   26d
kube-system       Active   26d
rwx-backup        Active   2m25s
velero            Active   3d
vsan-prometheus   Active   17d

$ kubectl get pvc -n rwx-backup
NAME       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
file-pvc   Bound    pvc-3c357006-9e59-49de-b06d-6fe9647fafbe   2Gi        RWX            vsan-file-sc   2m35s

$ kubectl get pods -n rwx-backup
NAME          READY   STATUS    RESTARTS   AGE
file-pod-a    1/1     Running   0          2m41s
file-pod-a2   1/1     Running   0          2m41s

This looks good from a kubectl perspective. What about back on vSphere? Looks like we have a restored vSAN File Service file share and a corresponding PV.

The only difference is that the PV now has a couple of new labels to indicate that it was restored via Velero. Otherwise it is identical. As a final step, let’s open a shell to one of the Pods and make sure that the contents of the file share were also restored.
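
Before jumping into the Pod, those Velero labels can also be checked from kubectl. A quick sketch, using the new PV name from the output above; I would expect to see labels along the lines of velero.io/backup-name and velero.io/restore-name:

$ kubectl get pv pvc-3c357006-9e59-49de-b06d-6fe9647fafbe --show-labels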

$ kubectl exec -it file-pod-a2 -n rwx-backup -- /bin/sh
/ # cd /mnt/volume1/
/mnt/volume1 # ls
timestamp
/mnt/volume1 # tail timestamp
POD2 - Mon May 25 10:58:10 UTC 2020
POD1 - Mon May 25 10:58:14 UTC 2020
POD2 - Mon May 25 10:58:15 UTC 2020
POD1 - Mon May 25 10:58:19 UTC 2020
POD2 - Mon May 25 10:58:20 UTC 2020
POD1 - Mon May 25 10:58:24 UTC 2020
POD2 - Mon May 25 10:58:25 UTC 2020
POD1 - Mon May 25 10:58:29 UTC 2020
POD2 - Mon May 25 10:58:30 UTC 2020
POD1 - Mon May 25 10:58:34 UTC 2020
/mnt/volume1 #

Success! The contents written by the application have been preserved and successfully restored. Thus, Velero with the restic plugin can be used to back up and restore applications leveraging RWX file shares from vSAN File Services.