Velero and Portworx – Container Volume Backup and Restores
If you’ve been following my posts for the last week or so, you’ll have noticed my write-ups on Velero backups and restores using the new release candidate (RC). I also did a recent write-up on Portworx volumes and snapshots. In this post, I’ll bring them both together, and show you how Velero and Portworx are integrated to allow backups and restores of container applications using Portworx volumes. However, first, let’s take a step back. As was highlighted to me recently, all of this is very new to a lot of people, so let’s spend a little time setting the context.
Near the end of last year, VMware acquired a company called Heptio. These guys are some of the leading lights in the Kubernetes community, and bring a wealth of expertise around Kubernetes and Cloud Native Applications to VMware. One of the open source products in their portfolio was a Kubernetes backup/restore/mobility product called Ark. After the acquisition, the product was rebranded to Velero (the Ark name was already in use elsewhere). So, in a nutshell, Velero allows you to take backups and do restores (and also migrations) of applications running in containers on top of Kubernetes. So why am I looking at it? Well, as part of VMware’s Storage and Availability BU, one of the things we are looking at closely is how to make vSphere/vSAN the best platform for running cloud native applications (including K8s). This also involves how we implement day 2 type operations for these (newer) applications. Backup and restore obviously fit squarely into this segment.
And finally, just by way of closing off this brief introduction, Portworx have been a significant player in the cloud native storage space for a while now. They have already worked with Velero (Ark) in the past, and have a snapshot plugin that enables Velero to back up and restore container applications deployed on Portworx-backed volumes. Portworx also kindly provided early access to an RC version of their plugin to work with the RC version of Velero.
OK then – let’s take a look at how these two products work together.
To begin with, we had tried to do this before with v0.11 of Velero, but due to a known issue with additional spaces in the snapshot name, we could never get it working. With the release candidate announcement for Velero, we reached out to Portworx to see if we could get early access to their new plugin. They kindly agreed, and I was finally able to do some test backups and restores of Cassandra using Portworx volumes. Here are the version numbers that I am using for this test.
$ velero version
Client:
        Version: v1.0.0-rc.1
        Git commit: d05f8e53d8ecbdb939d5d3a3d24da7868619ec3d
Server:
        Version: v1.0.0-rc.1

$ /opt/pwx/bin/pxctl -v
pxctl version 2.0.3.4-0c0bbe4
I’m not going to go through the details of deploying Velero here. Suffice it to say that there is a new velero install command in the RC release that should make things easier than before. You can still set it up using the older YAML file method by copying files from the previous v0.11 release to the RC distribution, as per my earlier blog post.
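For reference, a minimal install with Minio as the object store looks something like the following. This is only a sketch based on the standard Velero/Minio example, assuming a bucket named velero and a credentials-velero file containing the Minio access keys; volume snapshots are disabled for the object storage provider here, since the Portworx snapshot location is added separately later in this post.

$ velero install \
    --provider aws \
    --bucket velero \
    --secret-file ./credentials-velero \
    --use-volume-snapshots=false \
    --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://minio.velero.svc:9000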
In this write-up, since the assumption is that Portworx has already been deployed, I will also use a Portworx volume to back my Minio S3 object store. The Portworx team have some great write-ups on how to deploy Portworx on-premises if you need guidance. Here is an example of the StorageClass and PVC for my Minio deployment. Note that I have chosen to have my Minio volume replicated with a factor of 3, as per the repl parameter in the StorageClass.
---
kind: StorageClass
apiVersion: storage.k8s.io/v1beta1
metadata:
  name: minio-sc
provisioner: kubernetes.io/portworx-volume
parameters:
  repl: "3"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: minio-pv-claim-1
  namespace: velero
  annotations:
    volume.beta.kubernetes.io/storage-class: minio-sc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
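Assuming the above is saved to a file (I am calling it minio-sc-pvc.yaml purely for illustration), it can be applied and the PVC checked for a Bound status before moving on:

$ kubectl apply -f minio-sc-pvc.yaml
$ kubectl get pvc minio-pv-claim-1 -n velero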
The next step is to add the Portworx plugin to Velero. For RC, the Portworx plugin is portworx/velero-plugin:1.0.0-rc1. To add the plugin, simply run velero plugin add portworx/velero-plugin:1.0.0-rc1. This will only work with the RC version of Velero. As there is currently no velero plugin list command, the only way to check if the plugin was successfully added is to describe the Velero POD, and examine the Init Containers (see below). This should list all of the plugins that have been added to Velero. I have filed a GitHub feature request to get a velero plugin list command.
Also note that I have aliased my kubectl command to simply ‘k’, as you can see below.
$ velero plugin add portworx/velero-plugin:1.0.0-rc1

$ k get pod -n velero
NAME                     READY   STATUS      RESTARTS   AGE
minio-74995c888c-b9d2m   1/1     Running     0          25h
minio-setup-l5sfl        0/1     Completed   0          25h
velero-c7c95547b-wd867   1/1     Running     0          25h

$ k describe pod velero-c7c95547b-wd867 -n velero
Name:               velero-c7c95547b-wd867
Namespace:          velero
Priority:           0
PriorityClassName:  <none>
Node:               k8s-2/10.27.51.66
Start Time:         Mon, 13 May 2019 09:55:14 +0100
Labels:             component=velero
                    pod-template-hash=c7c95547b
Annotations:        prometheus.io/path: /metrics
                    prometheus.io/port: 8085
                    prometheus.io/scrape: true
Status:             Running
IP:                 10.244.2.49
Controlled By:      ReplicaSet/velero-c7c95547b
Init Containers:
  velero-plugin:
    Container ID:   docker://0ae4789051de7db74745b3f34b289893f4641b3222a147bf82de73482573f7e7
    Image:          portworx/velero-plugin:1.0.0-rc1
    Image ID:       docker-pullable://portworx/velero-plugin@sha256:e11f24cc18396e5a4542ea71e789598f6e3149d178ec6c4f70781e9c3059a8ea
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 13 May 2019 09:55:19 +0100
      Finished:     Mon, 13 May 2019 09:55:19 +0100
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /target from plugins (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from velero-token-lp4gq (ro)
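If you just want a quick way to list the plugin images without reading through the full describe output, a jsonpath query along these lines (my own shortcut, nothing Velero-specific) should do the trick, returning portworx/velero-plugin:1.0.0-rc1 in this case:

$ k get pod -n velero -l component=velero \
    -o jsonpath='{.items[0].spec.initContainers[*].image}'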
Now we need to tell Velero about the snapshot provider and snapshot location. This is done by creating a YAML file for the VolumeSnapshotLocation kind. One other thing to note here, which differs from the current Portworx documentation, is that the provider needs to use a fully qualified name for the plugin. This is new in Velero 1.0.0, and it makes the Portworx provider portworx.io/portworx. The full YAML file looks something like this.
apiVersion: velero.io/v1
kind: VolumeSnapshotLocation
metadata:
  name: portworx-local
  namespace: velero
spec:
  provider: portworx.io/portworx
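For completeness, the corresponding BackupStorageLocation in my setup looks roughly like this (a sketch assuming a Minio bucket named velero and the Minio service reachable in-cluster at minio.velero.svc:9000; velero install creates this object for you):

apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: aws
  objectStorage:
    bucket: velero
  config:
    region: minio
    s3ForcePathStyle: "true"
    s3Url: http://minio.velero.svc:9000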
Now we should have both the BackupStorageLocation and the VolumeSnapshotLocation ready to go. Let’s double check that.
$ k get BackupStorageLocation -n velero
NAME      AGE
default   25h

$ k get VolumeSnapshotLocation -n velero
NAME             AGE
portworx-local   20
At this point, everything is in place to begin our backup and restore test. Once again, I will use my trusty Cassandra instance, which has been pre-populated with some sample data. Let’s first examine the application from a K8s perspective.
$ k get sts -n cassandra
NAME        READY   AGE
cassandra   3/3     73m

$ k get pods -n cassandra
NAME          READY   STATUS    RESTARTS   AGE
cassandra-0   1/1     Running   0          73m
cassandra-1   1/1     Running   3          73m
cassandra-2   1/1     Running   3          73m

$ k get pvc -n cassandra
NAME                         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
cassandra-data-cassandra-0   Bound    pvc-a9e89927-7589-11e9-ac93-005056b82121   1Gi        RWO            cass-sc        74m
cassandra-data-cassandra-1   Bound    pvc-5f7d3306-758b-11e9-ac93-005056b82121   1Gi        RWO            cass-sc        74m
cassandra-data-cassandra-2   Bound    pvc-9d9be9f2-758b-11e9-ac93-005056b82121   1Gi        RWO            cass-sc        74m

$ k get pv | grep cassandra
pvc-5f7d3306-758b-11e9-ac93-005056b82121   1Gi   RWO   Delete   Bound   cassandra/cassandra-data-cassandra-1   cass-sc   74m
pvc-9d9be9f2-758b-11e9-ac93-005056b82121   1Gi   RWO   Delete   Bound   cassandra/cassandra-data-cassandra-2   cass-sc   74m
pvc-a9e89927-7589-11e9-ac93-005056b82121   1Gi   RWO   Delete   Bound   cassandra/cassandra-data-cassandra-0   cass-sc   74m

$ k exec -it cassandra-0 -n cassandra -- nodetool status
Datacenter: DC1-K8Demo
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load        Tokens   Owns (effective)   Host ID                                Rack
UN  10.244.2.53  189.17 KiB  32       100.0%             b7d527a6-d465-472a-82e1-8184a924045e   Rack1-K8Demo
UN  10.244.4.23  234.35 KiB  32       100.0%             4b678dd5-92af-4003-b978-559266e07d65   Rack1-K8Demo
UN  10.244.3.28  143.58 KiB  32       100.0%             74f0a3a6-5b34-4c97-b8ea-f71589b3fbca   Rack1-K8Demo

$ k exec -it cassandra-0 -n cassandra -- cqlsh
Connected to K8Demo at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.
cqlsh> use demodb;
cqlsh:demodb> select * from emp;

 emp_id | emp_city | emp_name | emp_phone | emp_sal
--------+----------+----------+-----------+---------
    100 |     Cork |   Cormac |       999 | 1000000

(1 rows)
cqlsh:demodb> exit
$
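For reference, the sample row shown above was created earlier with some simple CQL. The exact statements are not part of this post, but they were along these lines (the schema and types here are just my approximation):

cqlsh> CREATE KEYSPACE demodb WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
cqlsh> use demodb;
cqlsh:demodb> CREATE TABLE emp (emp_id int PRIMARY KEY, emp_name text, emp_city text, emp_sal varint, emp_phone varint);
cqlsh:demodb> INSERT INTO emp (emp_id, emp_name, emp_city, emp_phone, emp_sal) VALUES (100, 'Cormac', 'Cork', 999, 1000000);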
Let’s now list the current volumes and snapshots from a Portworx perspective. There is a 1GiB volume for each of the 3 Cassandra replicas, and an additional 10GiB volume backing my on-prem Minio S3 store. At this point, there should not be any snapshots, as we have not initiated any backups.
$ /opt/pwx/bin/pxctl volume list
ID                    NAME                                       SIZE    HA  SHARED  ENCRYPTED  IO_PRIORITY  STATUS                         SNAP-ENABLED
1051863986075634800   pvc-5f7d3306-758b-11e9-ac93-005056b82121   1 GiB   3   no      no         LOW          up - attached on 10.27.51.66   no
567192692874972784    pvc-9d9be9f2-758b-11e9-ac93-005056b82121   1 GiB   3   no      no         LOW          up - attached on 10.27.51.64   no
116101267951461079    pvc-a9e89927-7589-11e9-ac93-005056b82121   1 GiB   3   no      no         LOW          up - attached on 10.27.51.27   no
972553017890078199    pvc-e4710be5-755a-11e9-ac93-005056b82121   10 GiB  3   no      no         LOW          up - attached on 10.27.51.66   no

$ /opt/pwx/bin/pxctl volume list --snapshot
ID   NAME   SIZE   HA   SHARED   ENCRYPTED   IO_PRIORITY   STATUS   SNAP-ENABLED
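As an aside, if you want more detail on any individual volume (replication set, attach state, consumers and so on), pxctl volume inspect can be run against a volume name or ID, for example:

$ /opt/pwx/bin/pxctl volume inspect pvc-a9e89927-7589-11e9-ac93-005056b82121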
We are now ready to take our first Velero backup. In the command below, I am only going to back up the Cassandra namespace and its associated objects; all other K8s namespaces are excluded. There are a number of commands available to check the status of the backup. Adding the --details option to the velero backup describe command displays further information about the snapshots. You could also use velero backup logs to show the call-outs to the Portworx snapshot provider as the backup takes a snapshot of each Cassandra PV.
$ velero backup create cassandra --include-namespaces cassandra
Backup request "cassandra" submitted successfully.
Run `velero backup describe cassandra` or `velero backup logs cassandra` for more details.

$ velero backup describe cassandra
Name:         cassandra
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  <none>

Phase:  Completed

Namespaces:
  Included:  cassandra
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  <none>

Storage Location:  default

Snapshot PVs:  auto

TTL:  720h0m0s

Hooks:  <none>

Backup Format Version:  1

Started:    2019-05-14 11:21:50 +0100 IST
Completed:  2019-05-14 11:21:55 +0100 IST

Expiration:  2019-06-13 11:21:50 +0100 IST

Persistent Volumes:  3 of 3 snapshots completed successfully (specify --details for more information)

$ velero backup describe cassandra --details
Name:         cassandra
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  <none>

Phase:  Completed

Namespaces:
  Included:  cassandra
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  <none>

Storage Location:  default

Snapshot PVs:  auto

TTL:  720h0m0s

Hooks:  <none>

Backup Format Version:  1

Started:    2019-05-14 11:21:50 +0100 IST
Completed:  2019-05-14 11:21:55 +0100 IST

Expiration:  2019-06-13 11:21:50 +0100 IST

Persistent Volumes:
  pvc-a9e89927-7589-11e9-ac93-005056b82121:
    Snapshot ID:        849199168767327835
    Type:               portworx-snapshot
    Availability Zone:
    IOPS:               <N/A>
  pvc-5f7d3306-758b-11e9-ac93-005056b82121:
    Snapshot ID:        1019215085859062674
    Type:               portworx-snapshot
    Availability Zone:
    IOPS:               <N/A>
  pvc-9d9be9f2-758b-11e9-ac93-005056b82121:
    Snapshot ID:        841348593482616373
    Type:               portworx-snapshot
    Availability Zone:
    IOPS:               <N/A>
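One thing worth noting: since we created a VolumeSnapshotLocation named portworx-local, the backup can also reference it explicitly, which becomes useful if more than one snapshot location is configured. Something like the following should work, though it was not needed here:

$ velero backup create cassandra --include-namespaces cassandra \
    --volume-snapshot-locations portworx-local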
And to finish off the backup part of this post, let’s run another set of Portworx commands to look at the volumes and snapshots. At this point, we would expect to see a snapshot for each Cassandra PV. Indeed we do, and we also see that the snapshot names include the name of the cassandra application. I’m not sure at this point where this is retrieved from; possibly an application label.
$ /opt/pwx/bin/pxctl volume list
ID                    NAME                                                 SIZE    HA  SHARED  ENCRYPTED  IO_PRIORITY  STATUS                         SNAP-ENABLED
1019215085859062674   cassandra_pvc-5f7d3306-758b-11e9-ac93-005056b82121   1 GiB   3   no      no         LOW          up - detached                  no
841348593482616373    cassandra_pvc-9d9be9f2-758b-11e9-ac93-005056b82121   1 GiB   3   no      no         LOW          up - detached                  no
849199168767327835    cassandra_pvc-a9e89927-7589-11e9-ac93-005056b82121   1 GiB   3   no      no         LOW          up - detached                  no
1051863986075634800   pvc-5f7d3306-758b-11e9-ac93-005056b82121             1 GiB   3   no      no         LOW          up - attached on 10.27.51.66   no
567192692874972784    pvc-9d9be9f2-758b-11e9-ac93-005056b82121             1 GiB   3   no      no         LOW          up - attached on 10.27.51.64   no
116101267951461079    pvc-a9e89927-7589-11e9-ac93-005056b82121             1 GiB   3   no      no         LOW          up - attached on 10.27.51.27   no
972553017890078199    pvc-e4710be5-755a-11e9-ac93-005056b82121             10 GiB  3   no      no         LOW          up - attached on 10.27.51.66   no

$ /opt/pwx/bin/pxctl volume list --snapshot
ID                    NAME                                                 SIZE   HA  SHARED  ENCRYPTED  IO_PRIORITY  STATUS          SNAP-ENABLED
1019215085859062674   cassandra_pvc-5f7d3306-758b-11e9-ac93-005056b82121   1 GiB  3   no      no         LOW          up - detached   no
841348593482616373    cassandra_pvc-9d9be9f2-758b-11e9-ac93-005056b82121   1 GiB  3   no      no         LOW          up - detached   no
849199168767327835    cassandra_pvc-a9e89927-7589-11e9-ac93-005056b82121   1 GiB  3   no      no         LOW          up - detached   no
Let’s now go ahead and do something drastic. Let’s delete the Cassandra namespace (which will remove the StatefulSet, PODs, PVCs, PVs, etc). We will then use Velero to restore it, and hopefully observe that our Cassandra instance comes back with our sample data.
After deleting the Cassandra namespace, we see that the persistent volumes are removed as well, which can also be observed from Portworx. However, the snapshots are still intact. Apart from the Minio volume, these snapshots are in fact the only volumes now listed by Portworx, and note that they are not attached to any K8s worker nodes.
$ k delete ns cassandra
namespace "cassandra" deleted

$ /opt/pwx/bin/pxctl volume list
ID                    NAME                                                 SIZE    HA  SHARED  ENCRYPTED  IO_PRIORITY  STATUS                         SNAP-ENABLED
1019215085859062674   cassandra_pvc-5f7d3306-758b-11e9-ac93-005056b82121   1 GiB   3   no      no         LOW          up - detached                  no
841348593482616373    cassandra_pvc-9d9be9f2-758b-11e9-ac93-005056b82121   1 GiB   3   no      no         LOW          up - detached                  no
849199168767327835    cassandra_pvc-a9e89927-7589-11e9-ac93-005056b82121   1 GiB   3   no      no         LOW          up - detached                  no
972553017890078199    pvc-e4710be5-755a-11e9-ac93-005056b82121             10 GiB  3   no      no         LOW          up - attached on 10.27.51.66   no

$ /opt/pwx/bin/pxctl volume list --snapshot
ID                    NAME                                                 SIZE   HA  SHARED  ENCRYPTED  IO_PRIORITY  STATUS          SNAP-ENABLED
1019215085859062674   cassandra_pvc-5f7d3306-758b-11e9-ac93-005056b82121   1 GiB  3   no      no         LOW          up - detached   no
841348593482616373    cassandra_pvc-9d9be9f2-758b-11e9-ac93-005056b82121   1 GiB  3   no      no         LOW          up - detached   no
849199168767327835    cassandra_pvc-a9e89927-7589-11e9-ac93-005056b82121   1 GiB  3   no      no         LOW          up - detached   no
Now it is time to restore my Cassandra application using Velero. Adding the --details option to the velero restore describe command does not appear to provide any additional information. However, the velero restore logs command can again be used to verify that the PVs were successfully restored from the Portworx snapshots.
$ velero backup get
NAME        STATUS      CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
cassandra   Completed   2019-05-14 11:21:50 +0100 IST   29d       default            <none>

$ velero restore create cassandra-restore --from-backup cassandra
Restore request "cassandra-restore" submitted successfully.
Run `velero restore describe cassandra-restore` or `velero restore logs cassandra-restore` for more details.

$ velero restore describe cassandra-restore
Name:         cassandra-restore
Namespace:    velero
Labels:       <none>
Annotations:  <none>

Phase:  Completed

Backup:  cassandra

Namespaces:
  Included:  *
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io
  Cluster-scoped:  auto

Namespace mappings:  <none>

Label selector:  <none>

Restore PVs:  auto
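As an aside, the restore could also be redirected into a different namespace using the --namespace-mappings option. Something along these lines should work (the restore name and target namespace below are purely illustrative, and this was not done here):

$ velero restore create cassandra-restore-copy --from-backup cassandra \
    --namespace-mappings cassandra:cassandra-copy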
Let’s take a look at this from a Portworx perspective after the restore. The PVs are restored and attached to K8s worker nodes.
$ /opt/pwx/bin/pxctl volume list
ID                    NAME                                                 SIZE    HA  SHARED  ENCRYPTED  IO_PRIORITY  STATUS                         SNAP-ENABLED
1019215085859062674   cassandra_pvc-5f7d3306-758b-11e9-ac93-005056b82121   1 GiB   3   no      no         LOW          up - detached                  no
841348593482616373    cassandra_pvc-9d9be9f2-758b-11e9-ac93-005056b82121   1 GiB   3   no      no         LOW          up - detached                  no
849199168767327835    cassandra_pvc-a9e89927-7589-11e9-ac93-005056b82121   1 GiB   3   no      no         LOW          up - detached                  no
960922544915702517    pvc-5f7d3306-758b-11e9-ac93-005056b82121             1 GiB   3   no      no         LOW          up - attached on 10.27.51.64   no
180406970719671498    pvc-9d9be9f2-758b-11e9-ac93-005056b82121             1 GiB   3   no      no         LOW          up - attached on 10.27.51.27   no
305662590414763324    pvc-a9e89927-7589-11e9-ac93-005056b82121             1 GiB   3   no      no         LOW          up - attached on 10.27.51.66   no
972553017890078199    pvc-e4710be5-755a-11e9-ac93-005056b82121             10 GiB  3   no      no         LOW          up - attached on 10.27.51.66   no

$ /opt/pwx/bin/pxctl volume list --snapshot
ID                    NAME                                                 SIZE   HA  SHARED  ENCRYPTED  IO_PRIORITY  STATUS          SNAP-ENABLED
1019215085859062674   cassandra_pvc-5f7d3306-758b-11e9-ac93-005056b82121   1 GiB  3   no      no         LOW          up - detached   no
841348593482616373    cassandra_pvc-9d9be9f2-758b-11e9-ac93-005056b82121   1 GiB  3   no      no         LOW          up - detached   no
849199168767327835    cassandra_pvc-a9e89927-7589-11e9-ac93-005056b82121   1 GiB  3   no      no         LOW          up - detached   no
And last but not least, let’s verify that the Cassandra application is fully functional, with all 3 nodes up and rejoined, and that the table data has been restored.
$ k get sts -n cassandra
NAME        READY   AGE
cassandra   3/3     5m22s

$ k get pods -n cassandra
NAME          READY   STATUS    RESTARTS   AGE
cassandra-0   1/1     Running   0          5m28s
cassandra-1   1/1     Running   2          5m28s
cassandra-2   1/1     Running   2          5m28s

$ k get pvc -n cassandra
NAME                         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
cassandra-data-cassandra-0   Bound    pvc-a9e89927-7589-11e9-ac93-005056b82121   1Gi        RWO            cass-sc        5m41s
cassandra-data-cassandra-1   Bound    pvc-5f7d3306-758b-11e9-ac93-005056b82121   1Gi        RWO            cass-sc        5m41s
cassandra-data-cassandra-2   Bound    pvc-9d9be9f2-758b-11e9-ac93-005056b82121   1Gi        RWO            cass-sc        5m41s

$ k get svc -n cassandra
NAME        TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
cassandra   ClusterIP   None         <none>        9042/TCP   5m46s

$ k get pv | grep cassandra
pvc-5f7d3306-758b-11e9-ac93-005056b82121   1Gi   RWO   Delete   Bound   cassandra/cassandra-data-cassandra-1   cass-sc   5m58s
pvc-9d9be9f2-758b-11e9-ac93-005056b82121   1Gi   RWO   Delete   Bound   cassandra/cassandra-data-cassandra-2   cass-sc   5m56s
pvc-a9e89927-7589-11e9-ac93-005056b82121   1Gi   RWO   Delete   Bound   cassandra/cassandra-data-cassandra-0   cass-sc   5m55s

$ k exec -it cassandra-0 -n cassandra -- nodetool status
Datacenter: DC1-K8Demo
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load        Tokens   Owns (effective)   Host ID                                Rack
UN  10.244.2.54  142.61 KiB  32       100.0%             74f0a3a6-5b34-4c97-b8ea-f71589b3fbca   Rack1-K8Demo
UN  10.244.3.29  275.59 KiB  32       100.0%             4b678dd5-92af-4003-b978-559266e07d65   Rack1-K8Demo
UN  10.244.4.24  230.94 KiB  32       100.0%             b7d527a6-d465-472a-82e1-8184a924045e   Rack1-K8Demo

$ k exec -it cassandra-0 -n cassandra -- cqlsh
Connected to K8Demo at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.
cqlsh> use demodb;
cqlsh:demodb> select * from emp;

 emp_id | emp_city | emp_name | emp_phone | emp_sal
--------+----------+----------+-----------+---------
    100 |     Cork |   Cormac |       999 | 1000000

(1 rows)
cqlsh:demodb>
Everything looks good. Velero, with the Portworx plugin for snapshots, has been able to backup and restore a Cassandra instance.
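One final housekeeping note: once you are finished testing, the test backup can be removed with velero backup delete cassandra, which should also clean up the associated Portworx snapshots via the plugin.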
Now, since these are both release candidates, they are not intended for production use. However, if you have an opportunity to test these products in your own lab environments, I am sure both the Velero team and the Portworx plugin team would love to get your feedback.