Setting up Velero 1.0.0 to backup K8s on vSphere/PKS

I have written about Velero a few times on this blog, but I haven’t actually looked at how you would deploy the 1.0.0 version, even though it has been available since May 2019. Someone recently reached out to me for guidance on how to deploy it, as there are a few subtle differences from previous versions. Therefore I decided to document it step by step, focusing on the case where your Kubernetes cluster is running on vSphere. I also highlight a gotcha when using Velero to back up applications running on Kubernetes deployed via Enterprise PKS (Pivotal Container Service).

To recap, these are the steps that I will cover in detail:

  1. Download and extract Velero 1.0.0
  2. Pull any required images to a local repo if the K8s nodes cannot access the internet
  3. Deploy and Configure local Minio S3 Object Store
  4. Ensure that the PKS tile in Pivotal Ops Manager has the ‘allow privileged containers’ checkbox selected
  5. Install Velero via velero install – the command should include restic support and the Minio publicUrl
  6. Modify hostPath setting in restic DaemonSet for Enterprise PKS
  7. [New] Create a ConfigMap for the velero-restic-restore-helper
  8. Run a test Velero backup/restore

Let’s look at each of these steps now.

1. Download and extract Velero 1.0.0

The 1.0.0 release can be found here – https://github.com/heptio/velero/releases/tag/v1.0.0. Download and extract it, then copy or move the velero binary to somewhere in your PATH.
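
For completeness, this is roughly what the download looks like on a Linux amd64 workstation. I am assuming the standard tarball name from the GitHub releases page, so adjust for your own platform:

$ wget https://github.com/heptio/velero/releases/download/v1.0.0/velero-v1.0.0-linux-amd64.tar.gz
$ tar -xzf velero-v1.0.0-linux-amd64.tar.gz
$ sudo mv velero-v1.0.0-linux-amd64/velero /usr/local/bin/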

2. Pull any required images and push them to local repo (e.g. Harbor)

As mentioned in the introduction, this step is only necessary if your Kubernetes nodes do not have access to the internet. This is the case in my lab, so I do a docker pull, docker tag, docker push to my Harbor repo. For Velero, there are 3 images that need to be handled. Two are Minio images, which also require a modification to the 00-minio-deployment manifest. Below are the before and after versions of the manifest file.

$ grep image examples/minio/00-minio-deployment.yaml
        image: minio/minio:latest
        imagePullPolicy: IfNotPresent
        image: minio/mc:latest
        imagePullPolicy: IfNotPresent

$ grep image examples/minio/00-minio-deployment.yaml
        image: harbor.rainpole.com/library/minio:latest
        imagePullPolicy: IfNotPresent
        image: harbor.rainpole.com/library/mc:latest
        imagePullPolicy: IfNotPresent

The third image is referenced during the install. By default, the image used for the Velero and restic server pods comes from “gcr.io/heptio-images/velero:v1.0.0”. We also need to pull this image and push it to Harbor, and then add an --image argument to the velero install command to point to the image in my local Harbor repo, which you will see shortly.
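
For reference, the pull/tag/push cycle for the three images looks something like this. harbor.rainpole.com/library is my local Harbor project, so substitute your own registry and make sure you have done a docker login to it first:

$ docker pull minio/minio:latest
$ docker tag minio/minio:latest harbor.rainpole.com/library/minio:latest
$ docker push harbor.rainpole.com/library/minio:latest

$ docker pull minio/mc:latest
$ docker tag minio/mc:latest harbor.rainpole.com/library/mc:latest
$ docker push harbor.rainpole.com/library/mc:latest

$ docker pull gcr.io/heptio-images/velero:v1.0.0
$ docker tag gcr.io/heptio-images/velero:v1.0.0 harbor.rainpole.com/library/velero:v1.0.0
$ docker push harbor.rainpole.com/library/velero:v1.0.0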

3. Deploy and Configure local Minio Object Store

There are a few different steps required here. We have already modified the deployment YAML in step 2, but that is only needed if your images live in a local repo because the K8s nodes have no access to the internet. If they do have internet access, then no modification is needed.

3.1 Create a Minio credentials file

A simple credentials file containing the login/password (id/key) for the local on-premises Minio S3 Object Store must be created.

$ cat credentials-velero
[default]
aws_access_key_id = minio
aws_secret_access_key = minio123

3.2 Expose Minio Service on a NodePort

This step is a good idea for two reasons. The first is that it gives you a way to access the Minio portal and examine the contents of any backups. The second is that it enables you to specify a publicUrl for Minio, which in turn means that you can access backup and restore logs from the Minio S3 Object Store. Doing this requires a modification to the 00-minio-deployment manifest:

spec:
  # ClusterIP is recommended for production environments.
  # Change to NodePort if needed per documentation,
  # but only if you run Minio in a test/trial environment, for example with Minikube.
  type: NodePort

3.3 Deploy Minio

 $ kubectl apply -f examples/minio/00-minio-deployment.yaml

3.4 Verify Minio is available on the public URL

If we now go ahead and retrieve the node on which the Minio server is running, as well as the port that it has been exposed on with the changes made in step 3.2, we should be able to verify that Minio is working.

$ kubectl get pods -n velero
NAME                     READY   STATUS      RESTARTS   AGE
minio-66dc75bb8d-95xpp   1/1     Running     0          25s
minio-setup-zpnfl        0/1     Completed   0          25s

$ kubectl describe pod minio-66dc75bb8d-bczf8 -n velero | grep -i Node:
Node:               140ab5aa-0159-4612-b68c-df39dbea2245/192.168.192.5

$ kubectl get svc -n velero
NAME    TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
minio   NodePort   10.100.200.82   <none>        9000:32109/TCP   5s

Now if we point a browser to the Node:Port combination, hopefully Minio is available. You can also log in using the credentials that we placed in the credentials file in step 3.1.
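
If you would rather check from the command line than a browser, a simple curl against that Node:Port combination should return an HTTP status code from the Minio server. The exact code depends on the Minio version and the path requested, but any response confirms that the service is reachable:

$ curl -s -o /dev/null -w "%{http_code}\n" http://192.168.192.5:32109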

OK. Everything is now in place to allow us to do our velero install.

4. Enable Privileged Containers

To successfully create the restic pods when deploying Velero, you need to enable the checkbox for ‘Allow Privileged’ (which was previously called ‘Enable Privileged Containers’ in earlier versions of PKS). The checkbox for ‘DenyEscalatingExec’ should also be selected on the PKS plan in Pivotal Ops Manager. You will then need to re-apply the PKS configuration after selecting the checkboxes. For further details on how this setting appeared in previous versions of PKS, and the behaviour when it was not enabled, have a look at part 3 of my earlier blog on installing Velero v0.11 on PKS. It should look something like this on the PKS plan in Pivotal Ops Manager in current versions of PKS.

5. Install Velero

A big difference in Velero 1.0 is the new velero install command. No more messing around with the multiple manifest files that we had in previous versions. There are a few things to include in the velero install command. Since there is no vSphere plugin at this time, we rely on restic, a third-party backup tool that Velero integrates with, to back up volume data, so the command line must include the option to use restic (--use-restic). As mentioned, we have set up a publicUrl for Minio, so we should also include this in the command line. Finally, because my K8s nodes do not have access to the internet and cannot pull down external images, I have already pushed the velero image (normally pulled from gcr.io/heptio-images/velero:v1.0.0) to my local Harbor repo, and I need to reference it in the install command. With all those modifications, this is what my install command looks like:

$ velero install  --provider aws --bucket velero \
--secret-file ./credentials-velero \
--use-volume-snapshots=false \
--image harbor.rainpole.com/library/velero:v1.0.0 \
--use-restic \
--backup-location-config \
region=minio,s3ForcePathStyle="true",s3Url=http://minio.velero.svc:9000,publicUrl=http://192.168.192.5:32109

After running the command, the following output is displayed:

CustomResourceDefinition/backupstoragelocations.velero.io: attempting to create resource
CustomResourceDefinition/backupstoragelocations.velero.io: created
CustomResourceDefinition/serverstatusrequests.velero.io: attempting to create resource
CustomResourceDefinition/serverstatusrequests.velero.io: created
CustomResourceDefinition/restores.velero.io: attempting to create resource
CustomResourceDefinition/restores.velero.io: created
CustomResourceDefinition/podvolumebackups.velero.io: attempting to create resource
CustomResourceDefinition/podvolumebackups.velero.io: created
CustomResourceDefinition/resticrepositories.velero.io: attempting to create resource
CustomResourceDefinition/resticrepositories.velero.io: created
CustomResourceDefinition/deletebackuprequests.velero.io: attempting to create resource
CustomResourceDefinition/deletebackuprequests.velero.io: created
CustomResourceDefinition/podvolumerestores.velero.io: attempting to create resource
CustomResourceDefinition/podvolumerestores.velero.io: created
CustomResourceDefinition/volumesnapshotlocations.velero.io: attempting to create resource
CustomResourceDefinition/volumesnapshotlocations.velero.io: created
CustomResourceDefinition/backups.velero.io: attempting to create resource
CustomResourceDefinition/backups.velero.io: created
CustomResourceDefinition/schedules.velero.io: attempting to create resource
CustomResourceDefinition/schedules.velero.io: created
CustomResourceDefinition/downloadrequests.velero.io: attempting to create resource
CustomResourceDefinition/downloadrequests.velero.io: created
Waiting for resources to be ready in cluster...
Namespace/velero: attempting to create resource
Namespace/velero: already exists, proceeding
Namespace/velero: created
ClusterRoleBinding/velero: attempting to create resource
ClusterRoleBinding/velero: created
ServiceAccount/velero: attempting to create resource
ServiceAccount/velero: created
Secret/cloud-credentials: attempting to create resource
Secret/cloud-credentials: created
BackupStorageLocation/default: attempting to create resource
BackupStorageLocation/default: created
Deployment/velero: attempting to create resource
Deployment/velero: created
DaemonSet/restic: attempting to create resource
DaemonSet/restic: created
Velero is installed! ⛵ Use 'kubectl logs deployment/velero -n velero' to view the status.

LGTM. I also like the little sailboat in the output (Velero is Spanish for sailboat I believe). Let’s take a look at the logs and make sure everything deployed successfully.
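
The log excerpt below was gathered with the command suggested in the install output:

$ kubectl logs deployment/velero -n velero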

time="2019-08-07T15:02:46Z" level=info msg="setting log-level to INFO"
time="2019-08-07T15:02:46Z" level=info msg="Starting Velero server v1.0.0 (72f5cadc3a865019ab9dc043d4952c9bfd5f2ecb)" logSource="pkg/cmd/server/server.go:165"
time="2019-08-07T15:02:46Z" level=info msg="registering plugin" command=/velero kind=BackupItemAction logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/pod
time="2019-08-07T15:02:46Z" level=info msg="registering plugin" command=/velero kind=BackupItemAction logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/pv
time="2019-08-07T15:02:46Z" level=info msg="registering plugin" command=/velero kind=BackupItemAction logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/serviceaccount
time="2019-08-07T15:02:46Z" level=info msg="registering plugin" command=/velero kind=VolumeSnapshotter logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/aws
time="2019-08-07T15:02:46Z" level=info msg="registering plugin" command=/velero kind=VolumeSnapshotter logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/azure
time="2019-08-07T15:02:46Z" level=info msg="registering plugin" command=/velero kind=VolumeSnapshotter logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/gcp
time="2019-08-07T15:02:46Z" level=info msg="registering plugin" command=/velero kind=ObjectStore logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/aws
time="2019-08-07T15:02:46Z" level=info msg="registering plugin" command=/velero kind=ObjectStore logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/azure
time="2019-08-07T15:02:46Z" level=info msg="registering plugin" command=/velero kind=ObjectStore logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/gcp
time="2019-08-07T15:02:46Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/addPVCFromPod
time="2019-08-07T15:02:46Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/addPVFromPVC
time="2019-08-07T15:02:46Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/job
time="2019-08-07T15:02:46Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/pod
time="2019-08-07T15:02:46Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/restic
time="2019-08-07T15:02:46Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/service
time="2019-08-07T15:02:46Z" level=info msg="registering plugin" command=/velero kind=RestoreItemAction logSource="pkg/plugin/clientmgmt/registry.go:100" name=velero.io/serviceaccount
time="2019-08-07T15:02:46Z" level=info msg="Checking existence of namespace" logSource="pkg/cmd/server/server.go:355" namespace=velero
time="2019-08-07T15:02:46Z" level=info msg="Namespace exists" logSource="pkg/cmd/server/server.go:361" namespace=velero
time="2019-08-07T15:02:48Z" level=info msg="Checking existence of Velero custom resource definitions" logSource="pkg/cmd/server/server.go:390"
time="2019-08-07T15:02:48Z" level=info msg="All Velero custom resource definitions exist" logSource="pkg/cmd/server/server.go:424"
time="2019-08-07T15:02:48Z" level=info msg="Checking that all backup storage locations are valid" logSource="pkg/cmd/server/server.go:431"
time="2019-08-07T15:02:48Z" level=info msg="Starting controllers" logSource="pkg/cmd/server/server.go:535"
time="2019-08-07T15:02:48Z" level=info msg="Starting metric server at address [:8085]" logSource="pkg/cmd/server/server.go:543"
time="2019-08-07T15:02:48Z" level=info msg="Server started successfully" logSource="pkg/cmd/server/server.go:788"
time="2019-08-07T15:02:48Z" level=info msg="Starting controller" controller=gc-controller logSource="pkg/controller/generic_controller.go:76"
time="2019-08-07T15:02:48Z" level=info msg="Waiting for caches to sync" controller=gc-controller logSource="pkg/controller/generic_controller.go:79"
time="2019-08-07T15:02:48Z" level=info msg="Starting controller" controller=backup-deletion logSource="pkg/controller/generic_controller.go:76"
time="2019-08-07T15:02:48Z" level=info msg="Waiting for caches to sync" controller=backup-deletion logSource="pkg/controller/generic_controller.go:79"
time="2019-08-07T15:02:48Z" level=info msg="Starting controller" controller=downloadrequest logSource="pkg/controller/generic_controller.go:76"
time="2019-08-07T15:02:48Z" level=info msg="Waiting for caches to sync" controller=downloadrequest logSource="pkg/controller/generic_controller.go:79"
time="2019-08-07T15:02:48Z" level=info msg="Starting controller" controller=serverstatusrequest logSource="pkg/controller/generic_controller.go:76"
time="2019-08-07T15:02:48Z" level=info msg="Waiting for caches to sync" controller=serverstatusrequest logSource="pkg/controller/generic_controller.go:79"
time="2019-08-07T15:02:48Z" level=info msg="Starting controller" controller=backup-sync logSource="pkg/controller/generic_controller.go:76"
time="2019-08-07T15:02:48Z" level=info msg="Waiting for caches to sync" controller=backup-sync logSource="pkg/controller/generic_controller.go:79"
time="2019-08-07T15:02:48Z" level=info msg="Starting controller" controller=schedule logSource="pkg/controller/generic_controller.go:76"
time="2019-08-07T15:02:48Z" level=info msg="Waiting for caches to sync" controller=schedule logSource="pkg/controller/generic_controller.go:79"
time="2019-08-07T15:02:48Z" level=info msg="Starting controller" controller=restore logSource="pkg/controller/generic_controller.go:76"
time="2019-08-07T15:02:48Z" level=info msg="Waiting for caches to sync" controller=restore logSource="pkg/controller/generic_controller.go:79"
time="2019-08-07T15:02:48Z" level=info msg="Starting controller" controller=restic-repository logSource="pkg/controller/generic_controller.go:76"
time="2019-08-07T15:02:48Z" level=info msg="Waiting for caches to sync" controller=restic-repository logSource="pkg/controller/generic_controller.go:79"
time="2019-08-07T15:02:48Z" level=info msg="Starting controller" controller=backup logSource="pkg/controller/generic_controller.go:76"
time="2019-08-07T15:02:48Z" level=info msg="Waiting for caches to sync" controller=backup logSource="pkg/controller/generic_controller.go:79"
time="2019-08-07T15:02:48Z" level=info msg="Caches are synced" controller=schedule logSource="pkg/controller/generic_controller.go:83"
time="2019-08-07T15:02:49Z" level=info msg="Caches are synced" controller=serverstatusrequest logSource="pkg/controller/generic_controller.go:83"
time="2019-08-07T15:02:49Z" level=info msg="Caches are synced" controller=gc-controller logSource="pkg/controller/generic_controller.go:83"
time="2019-08-07T15:02:49Z" level=info msg="Caches are synced" controller=downloadrequest logSource="pkg/controller/generic_controller.go:83"
time="2019-08-07T15:02:49Z" level=info msg="Caches are synced" controller=backup-sync logSource="pkg/controller/generic_controller.go:83"
time="2019-08-07T15:02:49Z" level=info msg="Caches are synced" controller=backup logSource="pkg/controller/generic_controller.go:83"
time="2019-08-07T15:02:49Z" level=info msg="Caches are synced" controller=restore logSource="pkg/controller/generic_controller.go:83"
time="2019-08-07T15:02:49Z" level=info msg="Caches are synced" controller=restic-repository logSource="pkg/controller/generic_controller.go:83"
time="2019-08-07T15:02:49Z" level=info msg="Syncing contents of backup store into cluster" backupLocation=default controller=backup-sync logSource="pkg/controller/backup_sync_controller.go:170"
time="2019-08-07T15:02:49Z" level=info msg="Got backups from backup store" backupCount=0 backupLocation=default controller=backup-sync logSource="pkg/controller/backup_sync_controller.go:178"
time="2019-08-07T15:02:49Z" level=info msg="Caches are synced" controller=backup-deletion logSource="pkg/controller/generic_controller.go:83"
time="2019-08-07T15:02:49Z" level=info msg="Checking for expired DeleteBackupRequests" controller=backup-deletion logSource="pkg/controller/backup_deletion_controller.go:441"
time="2019-08-07T15:02:49Z" level=info msg="Done checking for expired DeleteBackupRequests" controller=backup-deletion logSource="pkg/controller/backup_deletion_controller.go:469"
time="2019-08-07T15:03:49Z" level=info msg="Syncing contents of backup store into cluster" backupLocation=default controller=backup-sync logSource="pkg/controller/backup_sync_controller.go:170"
time="2019-08-07T15:03:49Z" level=info msg="Got backups from backup store" backupCount=0 backupLocation=default controller=backup-sync logSource="pkg/controller/backup_sync_controller.go:178"

Again, this LGTM. There are no errors in the logs. Looks like we are almost ready to take a backup.

6. Modify hostPath in restic DaemonSet for Enterprise PKS

This step is only necessary for Enterprise PKS (Pivotal Container Service) deployments. This is because the path to the Pods on the Nodes in a PKS deployment is different from what we have in native Kubernetes deployments. If you have deployed this on PKS and you query the status of the Pods in the Velero namespace, you will notice that the restic Pods have a RunContainerError/CrashLoopBackOff error. Typically the path to Pods on native K8s is /var/lib/kubelet/pods, but on PKS they are located in /var/vcap/data/kubelet/pods. So this step points restic at the correct location of the Pods for backup purposes when K8s is deployed by PKS. First, identify the restic DaemonSet.

$ kubectl get ds --all-namespaces
NAMESPACE     NAME             DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
kube-system   vrops-cadvisor   3         3         3       3            3           <none>          5d3h
pks-system    fluent-bit       3         3         3       3            3           <none>          5d3h
pks-system    telegraf         3         3         3       3            3           <none>          5d3h
velero        restic           3         3         0       3            0           <none>          2m21s

Next, edit the DaemonSet and change the hostPath. The before and after edits are shown below.

$ kubectl edit ds restic -n velero

      volumes:
      - hostPath:
          path: /var/lib/kubelet/pods
          type: ""
        name: host-pods

      volumes:
      - hostPath:
          path: /var/vcap/data/kubelet/pods
          type: ""
        name: host-pods

daemonset.extensions/restic edited

This will terminate and restart the restic Pods. At this point, the velero and restic Pods should all be running. One more step is needed before we run a test backup/restore.
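
As an aside, if you prefer a non-interactive change, the same edit can be scripted with kubectl patch. This is just a sketch; it assumes the host-pods hostPath is the first entry in the volumes list, so verify the index in your own DaemonSet before running it:

$ kubectl patch ds restic -n velero --type json \
    -p '[{"op":"replace","path":"/spec/template/spec/volumes/0/hostPath/path","value":"/var/vcap/data/kubelet/pods"}]'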

7. Create a ConfigMap for the velero-restic-restore-helper

This was a step that I missed in the first version of this post. During a restore of Pods with Persistent Volumes that have been backed up with restic, a temporary pod is instantiated to assist with the restore. This image is pulled from “gcr.io/heptio-images/velero-restic-restore-helper:v1.0.0” by default. Since my nodes do not have access to the internet, I need to tell Velero to get this image from my local repo. This is achieved by creating a ConfigMap with the image location, as per the Customize Restore Helper Image instructions found here. After the usual docker pull/tag/push to get the image into my local Harbor repo, I created and applied the following ConfigMap with the image location at the end:

$ cat restic-config-map.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  # any name can be used; Velero uses the labels (below)
  # to identify it rather than the name
  name: restic-restore-action-config
  # must be in the velero namespace
  namespace: velero
  # the below labels should be used verbatim in your
  # ConfigMap.
  labels:
    # this value-less label identifies the ConfigMap as
    # config for a plugin (i.e. the built-in restic restore
    # item action plugin)
    velero.io/plugin-config: ""
    # this label identifies the name and kind of plugin
    # that this ConfigMap is for.
    velero.io/restic: RestoreItemAction
data:
  # "image" is the only configurable key. The value can either
  # include a tag or not; if the tag is *not* included, the
  # tag from the main Velero image will automatically be used.
  image: harbor.rainpole.com/library/velero-restic-restore-helper:v1.0.0

$ kubectl apply -f restic-config-map.yaml
configmap/restic-restore-action-config created
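
If you want to confirm that the ConfigMap is in place before moving on, a quick get in the velero namespace is enough:

$ kubectl get configmap restic-restore-action-config -n velero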

This means that for any restore involving restic volumes, the helper image can now be successfully pulled. You can now go ahead and check the Velero client and server versions using the velero version command.

8. Run a test Velero backup/restore

Velero provides a sample nginx application for backup testing. However, this once again relies on pulling an nginx image from the internet. If, like me, you are using a local repo, then you will have to do another pull, tag and push, and update the sample manifest so that the nginx app gets its image from the local repo, e.g.

$ grep image examples/nginx-app/base.yaml
- image: harbor.rainpole.com/library/nginx:1.15-alpine

Again, this is only necessary if your nodes do not have internet access. With that modification in place, you can go ahead and deploy the sample nginx app so we can try to back it up and restore it with Velero.

8.1 Deploy sample nginx app

$ kubectl apply -f examples/nginx-app/base.yaml
namespace/nginx-example created
deployment.apps/nginx-deployment created
service/my-nginx created

$ kubectl get pods --all-namespaces
NAMESPACE             NAME                                     READY   STATUS      RESTARTS   AGE
cassandra             cassandra-0                              1/1     Running     0          23h
cassandra             cassandra-1                              1/1     Running     0          23h
cassandra             cassandra-2                              1/1     Running     0          23h
default               wavefront-proxy-79568456c6-z82rh         1/1     Running     0          24h
kube-system           coredns-54586579f6-f7knj                 1/1     Running     0          5d3h
kube-system           coredns-54586579f6-t5r5h                 1/1     Running     0          5d3h
kube-system           coredns-54586579f6-v2cjt                 1/1     Running     0          5d3h
kube-system           kube-state-metrics-86977fd78d-6tb5m      2/2     Running     0          24h
kube-system           kubernetes-dashboard-6c68548bc9-km8dd    1/1     Running     0          5d3h
kube-system           metrics-server-5475446b7f-m2fgx          1/1     Running     0          5d3h
kube-system           vrops-cadvisor-488p8                     1/1     Running     0          5d3h
kube-system           vrops-cadvisor-cdx5w                     1/1     Running     0          5d3h
kube-system           vrops-cadvisor-wgkkl                     1/1     Running     0          5d3h
nginx-example         nginx-deployment-5f8798768c-5jdkn        1/1     Running     0          8s
nginx-example         nginx-deployment-5f8798768c-lrsw6        1/1     Running     0          8s
pks-system            cert-generator-v0.19.4-qh6kg             0/1     Completed   0          5d3h
pks-system            event-controller-5dbd8f48cc-vwpc4        2/2     Running     546        5d3h
pks-system            fluent-bit-7cx69                         3/3     Running     0          5d3h
pks-system            fluent-bit-fpbl6                         3/3     Running     0          5d3h
pks-system            fluent-bit-j674j                         3/3     Running     0          5d3h
pks-system            metric-controller-5bf6cb67c6-bbh6q       1/1     Running     0          5d3h
pks-system            observability-manager-5578bbb84f-w87bj   1/1     Running     0          5d3h
pks-system            sink-controller-54947f5bd9-42spw         1/1     Running     0          5d3h
pks-system            telegraf-4gv8b                           1/1     Running     0          5d3h
pks-system            telegraf-dtcjc                           1/1     Running     0          5d3h
pks-system            telegraf-m2pjd                           1/1     Running     0          5d3h
pks-system            telemetry-agent-776d45f8d8-c2xhg         1/1     Running     0          5d3h
pks-system            validator-76fff49f5d-m5t4h               1/1     Running     0          5d3h
velero                minio-66dc75bb8d-95xpp                   1/1     Running     0          11m
velero                minio-setup-zpnfl                        0/1     Completed   0          11m
velero                restic-7mztz                             1/1     Running     0          3m28s
velero                restic-cxfpt                             1/1     Running     0          3m28s
velero                restic-qx98s                             1/1     Running     0          3m28s
velero                velero-7d97d7ff65-drl5c                  1/1     Running     0          9m35s
wavefront-collector   wavefront-collector-76f7c9fb86-d9pw8     1/1     Running     0          24h


$ kubectl get ns
NAME                  STATUS   AGE
cassandra             Active   23h
default               Active   5d3h
kube-public           Active   5d3h
kube-system           Active   5d3h
nginx-example         Active   4s
pks-system            Active   5d3h
velero                Active   9m40s
wavefront-collector   Active   24h


$ kubectl get deployments --namespace=nginx-example
NAME               READY   UP-TO-DATE   AVAILABLE   AGE
nginx-deployment   2/2     2            2           20s


$ kubectl get svc --namespace=nginx-example
NAME       TYPE           CLUSTER-IP       EXTERNAL-IP                 PORT(S)        AGE
my-nginx   LoadBalancer   10.100.200.147   100.64.0.1,192.168.191.70   80:30942/TCP   32s

This nginx deployment assumes the presence of a LoadBalancer for its Service. Fortunately I do have NSX-T deployed, which provides IP addresses for LoadBalancer services. In the output above, the external IP allocated for the nginx service is 192.168.191.70. If I point a browser to that IP address, I get an nginx landing page.
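
If you prefer to verify from a shell rather than a browser, a quick curl of that external IP should return the nginx response headers (assuming curl is available on your workstation):

$ curl -I http://192.168.191.70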

8.2 First backup

$ velero backup create nginx-backup --selector app=nginx
Backup request "nginx-backup" submitted successfully.
Run `velero backup describe nginx-backup` or `velero backup logs nginx-backup` for more details.

$ velero backup get
NAME           STATUS      CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
nginx-backup   Completed   2019-08-07 16:13:44 +0100 IST   29d       default            app=nginx

This backup should also be visible in the Minio browser.
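
If you do not want to use the Minio web UI, you can also list the backup objects with any S3-compatible client. This is a sketch using the AWS CLI, assuming it is installed and configured with the Minio credentials from step 3.1, and pointed at the publicUrl we exposed earlier:

$ aws --endpoint-url http://192.168.192.5:32109 s3 ls s3://velero/backups/nginx-backup/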

8.3 Destroy nginx deployment

Let’s now go ahead and remove the nginx namespace, and then do a restore of our backup. Hopefully our web server will come back afterwards.

$ kubectl get ns
NAME                  STATUS   AGE
cassandra             Active   40h
default               Active   5d20h
kube-public           Active   5d20h
kube-system           Active   5d20h
nginx-example         Active   17h
pks-system            Active   5d20h
velero                Active   17h
wavefront-collector   Active   41h

$ kubectl delete ns nginx-example
namespace "nginx-example" deleted

$ kubectl get ns
NAME                  STATUS   AGE
cassandra             Active   40h
default               Active   5d20h
kube-public           Active   5d20h
kube-system           Active   5d20h
pks-system            Active   5d20h
velero                Active   17h
wavefront-collector   Active   41h

$ kubectl get svc --all-namespaces
NAMESPACE     NAME                   TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
cassandra     cassandra              ClusterIP   None             <none>        9042/TCP            40h
default       kubernetes             ClusterIP   10.100.200.1     <none>        443/TCP             5d20h
default       wavefront-proxy        ClusterIP   10.100.200.56    <none>        2878/TCP            46h
kube-system   kube-dns               ClusterIP   10.100.200.2     <none>        53/UDP,53/TCP       5d20h
kube-system   kube-state-metrics     ClusterIP   10.100.200.187   <none>        8080/TCP,8081/TCP   41h
kube-system   kubernetes-dashboard   NodePort    10.100.200.160   <none>        443:32485/TCP       5d20h
kube-system   metrics-server         ClusterIP   10.100.200.52    <none>        443/TCP             5d20h
pks-system    fluent-bit             ClusterIP   10.100.200.175   <none>        24224/TCP           5d20h
pks-system    validator              ClusterIP   10.100.200.149   <none>        443/TCP             5d20h
velero        minio                  NodePort    10.100.200.82    <none>        9000:32109/TCP      17h

8.4 First restore

Let’s try to restore our backup.

$ velero backup get
NAME           STATUS      CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
nginx-backup   Completed   2019-08-07 16:13:44 +0100 IST   29d       default            app=nginx

$ velero restore create nginx-restore --from-backup nginx-backup
Restore request "nginx-restore" submitted successfully.
Run `velero restore describe nginx-restore` or `velero restore logs nginx-restore` for more details.

$ velero restore describe nginx-restore
Name:         nginx-restore
Namespace:    velero
Labels:       <none>
Annotations:  <none>

Phase:  Completed

Backup:  nginx-backup

Namespaces:
  Included:  *
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io
  Cluster-scoped:  auto

Namespace mappings:  <none>

Label selector:  <none>

Restore PVs:  auto

8.5 Verify restore succeeded

Now we need to see if the namespace, deployment and service have been restored.

$ kubectl get ns
NAME                  STATUS   AGE
cassandra             Active   40h
default               Active   5d20h
kube-public           Active   5d20h
kube-system           Active   5d20h
nginx-example         Active   17s
pks-system            Active   5d20h
velero                Active   17h
wavefront-collector   Active   41h

$ kubectl get svc --all-namespaces
NAMESPACE       NAME                   TYPE           CLUSTER-IP       EXTERNAL-IP                 PORT(S)             AGE
cassandra       cassandra              ClusterIP      None             <none>                      9042/TCP            40h
default         kubernetes             ClusterIP      10.100.200.1     <none>                      443/TCP             5d20h
default         wavefront-proxy        ClusterIP      10.100.200.56    <none>                      2878/TCP            46h
kube-system     kube-dns               ClusterIP      10.100.200.2     <none>                      53/UDP,53/TCP       5d20h
kube-system     kube-state-metrics     ClusterIP      10.100.200.187   <none>                      8080/TCP,8081/TCP   41h
kube-system     kubernetes-dashboard   NodePort       10.100.200.160   <none>                      443:32485/TCP       5d20h
kube-system     metrics-server         ClusterIP      10.100.200.52    <none>                      443/TCP             5d20h
nginx-example   my-nginx               LoadBalancer   10.100.200.225   100.64.0.1,192.168.191.67   80:32350/TCP        23s
pks-system      fluent-bit             ClusterIP      10.100.200.175   <none>                      24224/TCP           5d20h
pks-system      validator              ClusterIP      10.100.200.149   <none>                      443/TCP             5d20h
velero          minio                  NodePort       10.100.200.82    <none>                      9000:32109/TCP      17h

Note that the nginx service has been restored but it has been assigned a new IP address by the LoadBalancer. This is normal. Now let’s see if we can successfully reach our nginx web service on that IP address. Yes I can! Looks like the restore was successful.

Cool. Backups and restores are now working on Kubernetes deployed on vSphere with Enterprise PKS using Velero 1.0. If you want to see the steps involved in backing up persistent volumes as well, check back on some of my earlier Velero posts. Also check out the official Velero 1.0 docs. You may also be interested in listening to a recent podcast we had on Velero.