Velero and Portworx – Container Volume Backup and Restores

If you’ve been following my posts for the last week or so, you’ll have noticed my write-ups on Velero backups and restores using the new release candidate (RC). I also did a recent write-up on Portworx volumes and snapshots. In this post, I’ll bring them both together, and show you how Velero and Portworx are integrated to allow backups and restores of container applications using Portworx volumes. However, first, let’s take a step back. As was highlighted to me recently, all of this is very new to a lot of people, so let’s spend a little time setting the context.

Near the end of last year, VMware acquired a company called Heptio. These guys are some of the leading lights in the Kubernetes community, and bring a wealth of expertise around Kubernetes and Cloud Native Applications to VMware. One of the open source products in their portfolio was a Kubernetes backup/restore/mobility product called Ark. After the acquisition, the product was rebranded to Velero (the Ark name was already in use). So in a nutshell, Velero allows you to take backups and do restores (and also migrations) of applications running in containers on top of Kubernetes. So why am I looking at it? Well, as part of VMware’s Storage and Availability BU, one of the things we are closely looking at is how to make vSphere/vSAN the best platform for running cloud native applications (including K8s). This also involves how we implement day 2 type operations for these (newer) applications. Backup and restore obviously fit squarely into this segment.

And finally, just by way of closing off this brief introduction, Portworx has been a significant player in the cloud native storage space for a while now. They have already worked with Velero (Ark) in the past, and have a plugin that enables Velero to take snapshots, and thus back up and restore applications deployed on Portworx-backed container volumes. Portworx has also kindly provided early access to an RC version of their plugin to work with the RC version of Velero.

OK then – let’s take a look at how these two products work together.

To begin with, we had tried to do this before with v0.11 of Velero, but due to a known issue with additional spaces in the snapshot name, we could never get it working. With the release candidate announcement for Velero, we reached out to Portworx to see if we could get early access to their new plugin. They kindly agreed, and I was finally able to do some test backups and restores of Cassandra using Portworx volumes. Here are the version numbers that I am using for this test.

$ velero version
Client:
       Version: v1.0.0-rc.1
       Git commit: d05f8e53d8ecbdb939d5d3a3d24da7868619ec3d
Server:
        Version: v1.0.0-rc.1

$ /opt/pwx/bin/pxctl -v
pxctl version 2.0.3.4-0c0bbe4

I’m not going to go through the details of deploying Velero here. Suffice it to say that there is a new velero install command in the RC release that should make things easier than before. You can still set it up using the older YAML file method by copying files from the previous v0.11 to the RC distro, as per my earlier blog post.
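
For reference, here is a sketch of what a velero install invocation might look like when the backup store is an on-premises Minio S3 bucket. The bucket name, credentials file and the Minio service URL below are assumptions for illustration, not taken from my actual environment.

```shell
# Illustrative only -- bucket name, secret file and s3Url are assumptions.
# "aws" is used as the provider name because Minio speaks the S3 API.
velero install \
    --provider aws \
    --bucket velero \
    --secret-file ./credentials-velero \
    --use-volume-snapshots=true \
    --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://minio.velero.svc:9000
```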

In this write-up, since the assumption is that Portworx has already been deployed, I will also use a Portworx volume to back my Minio S3 object store. The Portworx team have some great write-ups on how to deploy Portworx on-premises if you need guidance. Here is an example of the StorageClass and PVC for my Minio deployment. Note that I have chosen to have my Minio S3 volume replicated by a factor of 3, as per the repl parameter in the StorageClass.

---
kind: StorageClass
apiVersion: storage.k8s.io/v1beta1
metadata:
  name: minio-sc
provisioner: kubernetes.io/portworx-volume
parameters:
   repl: "3"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: minio-pv-claim-1
  namespace: velero
  annotations:
    volume.beta.kubernetes.io/storage-class: minio-sc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
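
Once the PVC is bound, the replication factor can be verified from the Portworx side. A sketch using pxctl volume inspect is shown below; substitute the actual volume name reported by kubectl get pvc.

```shell
# Inspect the dynamically provisioned Portworx volume behind the Minio PVC.
# The pvc-... name is a placeholder; use the VOLUME name from 'kubectl get pvc -n velero'.
/opt/pwx/bin/pxctl volume inspect pvc-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
```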

The next step is to add the Portworx plugin to Velero. For RC, the Portworx plugin is portworx/velero-plugin:1.0.0-rc1. To add the plugin, simply run velero plugin add portworx/velero-plugin:1.0.0-rc1. This will only work with the RC version of Velero. As there is currently no velero plugin list command, the only way to check if the plugin was successfully added is to describe the Velero POD and examine the Init Containers (see below). This should list all of the plugins that have been added to Velero. I have filed a GitHub feature request to get a velero plugin list command.

Also note that I have aliased my kubectl command to simply ‘k’, as you can see below.

$ velero plugin add portworx/velero-plugin:1.0.0-rc1

$ k get pod -n velero
NAME                     READY   STATUS      RESTARTS   AGE
minio-74995c888c-b9d2m   1/1     Running     0          25h
minio-setup-l5sfl        0/1     Completed   0          25h
velero-c7c95547b-wd867   1/1     Running     0          25h

$ k describe pod velero-c7c95547b-wd867 -n velero
Name:               velero-c7c95547b-wd867
Namespace:          velero
Priority:           0
PriorityClassName:  <none>
Node:               k8s-2/10.27.51.66
Start Time:         Mon, 13 May 2019 09:55:14 +0100
Labels:             component=velero
                    pod-template-hash=c7c95547b
Annotations:        prometheus.io/path: /metrics
                    prometheus.io/port: 8085
                    prometheus.io/scrape: true
Status:             Running
IP:                 10.244.2.49
Controlled By:      ReplicaSet/velero-c7c95547b
Init Containers:
  velero-plugin:
    Container ID:   docker://0ae4789051de7db74745b3f34b289893f4641b3222a147bf82de73482573f7e7
    Image:          portworx/velero-plugin:1.0.0-rc1
    Image ID:       docker-pullable://portworx/velero-plugin@sha256:e11f24cc18396e5a4542ea71e789598f6e3149d178ec6c4f70781e9c3059a8ea
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 13 May 2019 09:55:19 +0100
      Finished:     Mon, 13 May 2019 09:55:19 +0100
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /target from plugins (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from velero-token-lp4gq (ro)

Now we need to tell Velero about the snapshot provider and snapshot location. This is done by creating a YAML file for the VolumeSnapshotLocation kind. One other thing to note here, which differs from the current Portworx documentation, is that the provider needs to use a fully qualified name for the plugin. This is new in Velero 1.0.0, and makes the Portworx provider portworx.io/portworx. The full YAML file looks something like this.

apiVersion: velero.io/v1
kind: VolumeSnapshotLocation
metadata:
  name: portworx-local
  namespace: velero
spec:
  provider: portworx.io/portworx
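
For completeness, a BackupStorageLocation also needs to exist before backups can be taken (mine was created as part of the Velero deployment). A sketch of what it might look like for a Minio S3 target is shown below; the bucket name and s3Url are assumptions for illustration.

```yaml
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: default
  namespace: velero
spec:
  # "aws" is used as the provider name because Minio speaks the S3 API
  provider: aws
  objectStorage:
    bucket: velero                          # assumed bucket name
  config:
    region: minio
    s3ForcePathStyle: "true"
    s3Url: http://minio.velero.svc:9000     # assumed in-cluster Minio service URL
```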

Now we should have both the BackupStorageLocation and the VolumeSnapshotLocation ready to go. Let’s double check that.

$ k get BackupStorageLocation -n velero
NAME      AGE
default   25h

$ k get VolumeSnapshotLocation -n velero
NAME             AGE
portworx-local   20

At this point, everything is in place to begin our backup and restore test. Once again, I will use my trusty Cassandra instance, which has been pre-populated with some sample data. Let’s first examine the application from a K8s perspective.

$ k get sts -n cassandra
NAME        READY   AGE
cassandra   3/3     73m

$ k get pods -n cassandra
NAME          READY   STATUS    RESTARTS   AGE
cassandra-0   1/1     Running   0          73m
cassandra-1   1/1     Running   3          73m
cassandra-2   1/1     Running   3          73m

$ k get pvc -n cassandra
NAME                         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
cassandra-data-cassandra-0   Bound    pvc-a9e89927-7589-11e9-ac93-005056b82121   1Gi        RWO            cass-sc        74m
cassandra-data-cassandra-1   Bound    pvc-5f7d3306-758b-11e9-ac93-005056b82121   1Gi        RWO            cass-sc        74m
cassandra-data-cassandra-2   Bound    pvc-9d9be9f2-758b-11e9-ac93-005056b82121   1Gi        RWO            cass-sc        74m

$ k get pv | grep cassandra
pvc-5f7d3306-758b-11e9-ac93-005056b82121   1Gi        RWO            Delete           Bound    cassandra/cassandra-data-cassandra-1   cass-sc                 74m
pvc-9d9be9f2-758b-11e9-ac93-005056b82121   1Gi        RWO            Delete           Bound    cassandra/cassandra-data-cassandra-2   cass-sc                 74m
pvc-a9e89927-7589-11e9-ac93-005056b82121   1Gi        RWO            Delete           Bound    cassandra/cassandra-data-cassandra-0   cass-sc                 74m
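
For reference, the cass-sc StorageClass backing these Cassandra PVCs is a Portworx one, much like the Minio StorageClass shown earlier. This is a sketch; the repl value of 3 matches the HA factor of 3 visible in the pxctl volume listings further down.

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1beta1
metadata:
  name: cass-sc
provisioner: kubernetes.io/portworx-volume
parameters:
  repl: "3"   # matches the HA column (3) in the pxctl output for the Cassandra volumes
```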

$ k exec -it cassandra-0 -n cassandra -- nodetool status
Datacenter: DC1-K8Demo
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.244.2.53  189.17 KiB  32           100.0%            b7d527a6-d465-472a-82e1-8184a924045e  Rack1-K8Demo
UN  10.244.4.23  234.35 KiB  32           100.0%            4b678dd5-92af-4003-b978-559266e07d65  Rack1-K8Demo
UN  10.244.3.28  143.58 KiB  32           100.0%            74f0a3a6-5b34-4c97-b8ea-f71589b3fbca  Rack1-K8Demo

$ k exec -it cassandra-0 -n cassandra -- cqlsh
Connected to K8Demo at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.
cqlsh> use demodb;
cqlsh:demodb> select * from emp;
emp_id | emp_city | emp_name | emp_phone | emp_sal
--------+----------+----------+-----------+---------
    100 |     Cork |   Cormac |       999 | 1000000

(1 rows)
cqlsh:demodb> exit
$

Let’s now list the current volumes and snapshots from a Portworx perspective. There is a volume for each of the 3 Cassandra replicas (1GiB) and there is an additional one for my on-prem Minio volume (10GiB). Currently, there should not be any snapshots as we have not initiated any backups.

$ /opt/pwx/bin/pxctl volume list
ID                      NAME                                            SIZE    HA      SHARED  ENCRYPTED       IO_PRIORITY     STATUS                          SNAP-ENABLED
1051863986075634800     pvc-5f7d3306-758b-11e9-ac93-005056b82121        1 GiB   3       no      no              LOW             up - attached on 10.27.51.66    no
567192692874972784      pvc-9d9be9f2-758b-11e9-ac93-005056b82121        1 GiB   3       no      no              LOW             up - attached on 10.27.51.64    no
116101267951461079      pvc-a9e89927-7589-11e9-ac93-005056b82121        1 GiB   3       no      no              LOW             up - attached on 10.27.51.27    no
972553017890078199      pvc-e4710be5-755a-11e9-ac93-005056b82121        10 GiB  3       no      no              LOW             up - attached on 10.27.51.66    no

$ /opt/pwx/bin/pxctl volume list --snapshot
ID      NAME    SIZE    HA      SHARED  ENCRYPTED       IO_PRIORITY     STATUS  SNAP-ENABLED

We are now ready to take our first Velero backup. In the command below, I am only going to back up the Cassandra namespace and associated objects; all other K8s namespaces are excluded. There are a number of commands available to check the status of the backup. By including the --details option with the velero backup describe command, further details about the snapshots are displayed. You could also use velero backup logs to show the call-outs to the Portworx snapshot provider when the backup initiates a snapshot of the Cassandra PVs.

$ velero backup create cassandra --include-namespaces cassandra
Backup request "cassandra" submitted successfully.
Run `velero backup describe cassandra` or `velero backup logs cassandra` for more details.


$ velero backup describe cassandra
Name: cassandra
Namespace: velero
Labels: velero.io/storage-location=default
Annotations: <none>

Phase: Completed

Namespaces:
Included: cassandra
Excluded: <none>

Resources:
Included: *
Excluded: <none>
Cluster-scoped: auto

Label selector: <none>

Storage Location: default

Snapshot PVs: auto

TTL: 720h0m0s

Hooks: <none>

Backup Format Version: 1

Started: 2019-05-14 11:21:50 +0100 IST
Completed: 2019-05-14 11:21:55 +0100 IST

Expiration: 2019-06-13 11:21:50 +0100 IST

Persistent Volumes: 3 of 3 snapshots completed successfully (specify --details for more information)


$ velero backup describe cassandra --details
Name: cassandra
Namespace: velero
Labels: velero.io/storage-location=default
Annotations: <none>

Phase: Completed

Namespaces:
Included: cassandra
Excluded: <none>

Resources:
Included: *
Excluded: <none>
Cluster-scoped: auto

Label selector: <none>

Storage Location: default

Snapshot PVs: auto

TTL: 720h0m0s

Hooks: <none>

Backup Format Version: 1

Started: 2019-05-14 11:21:50 +0100 IST
Completed: 2019-05-14 11:21:55 +0100 IST

Expiration: 2019-06-13 11:21:50 +0100 IST

Persistent Volumes:
  pvc-a9e89927-7589-11e9-ac93-005056b82121:
    Snapshot ID: 849199168767327835
    Type: portworx-snapshot
    Availability Zone:
    IOPS: <N/A>
  pvc-5f7d3306-758b-11e9-ac93-005056b82121:
    Snapshot ID: 1019215085859062674
    Type: portworx-snapshot
    Availability Zone:
    IOPS: <N/A>
  pvc-9d9be9f2-758b-11e9-ac93-005056b82121:
    Snapshot ID: 841348593482616373
    Type: portworx-snapshot
    Availability Zone:
    IOPS: <N/A>

And to finish off the backup part of this post, let’s run another set of Portworx commands to see some information about the volumes and snapshots. At this point, we would expect to see a snapshot for each Cassandra PV. Indeed we do, and we also see that the snapshot names are prefixed with the name of the Cassandra application. I’m not sure at this point where this is retrieved from; possibly an application label.

$ /opt/pwx/bin/pxctl volume list
ID                      NAME                                                    SIZE    HA      SHARED  ENCRYPTED       IO_PRIORITY     STATUS                          SNAP-ENABLED
1019215085859062674     cassandra_pvc-5f7d3306-758b-11e9-ac93-005056b82121      1 GiB   3       no      no              LOW             up - detached                   no
841348593482616373      cassandra_pvc-9d9be9f2-758b-11e9-ac93-005056b82121      1 GiB   3       no      no              LOW             up - detached                   no
849199168767327835      cassandra_pvc-a9e89927-7589-11e9-ac93-005056b82121      1 GiB   3       no      no              LOW             up - detached                   no
1051863986075634800     pvc-5f7d3306-758b-11e9-ac93-005056b82121                1 GiB   3       no      no              LOW             up - attached on 10.27.51.66    no
567192692874972784      pvc-9d9be9f2-758b-11e9-ac93-005056b82121                1 GiB   3       no      no              LOW             up - attached on 10.27.51.64    no
116101267951461079      pvc-a9e89927-7589-11e9-ac93-005056b82121                1 GiB   3       no      no              LOW             up - attached on 10.27.51.27    no
972553017890078199      pvc-e4710be5-755a-11e9-ac93-005056b82121                10 GiB  3       no      no              LOW             up - attached on 10.27.51.66    no

$ /opt/pwx/bin/pxctl volume list --snapshot
ID                      NAME                                                    SIZE    HA      SHARED  ENCRYPTED       IO_PRIORITY     STATUS          SNAP-ENABLED
1019215085859062674     cassandra_pvc-5f7d3306-758b-11e9-ac93-005056b82121      1 GiB   3       no      no              LOW             up - detached   no
841348593482616373      cassandra_pvc-9d9be9f2-758b-11e9-ac93-005056b82121      1 GiB   3       no      no              LOW             up - detached   no
849199168767327835      cassandra_pvc-a9e89927-7589-11e9-ac93-005056b82121      1 GiB   3       no      no              LOW             up - detached   no
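
As an aside, once an ad-hoc backup like this works, the same backup can be put on a recurring schedule with velero schedule create, which takes a cron expression. A sketch is shown below; the schedule name and timing are my own illustrative choices.

```shell
# Back up the cassandra namespace every day at 01:00.
# Name and cron timing are illustrative, not from my environment.
velero schedule create cassandra-daily \
    --schedule="0 1 * * *" \
    --include-namespaces cassandra
```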

Let’s now go ahead and do something drastic. Let’s delete the Cassandra namespace (which will remove the StatefulSet, PODs, PVCs, PVs, etc). We will then use Velero to restore it, and hopefully observe that our Cassandra instance comes back with our sample data.

After deleting the Cassandra namespace, we see that this also removes the persistent volumes, which can be confirmed from Portworx. However, the snapshots are still intact. In fact, apart from the Minio volume, the snapshots are the only volumes now listed by Portworx, and note that they are not attached to any K8s worker nodes.

$ k delete ns cassandra
namespace "cassandra" deleted

$ /opt/pwx/bin/pxctl volume list
ID                      NAME                                                    SIZE    HA      SHARED  ENCRYPTED       IO_PRIORITY     STATUS                          SNAP-ENABLED
1019215085859062674     cassandra_pvc-5f7d3306-758b-11e9-ac93-005056b82121      1 GiB   3       no      no              LOW             up - detached                   no
841348593482616373      cassandra_pvc-9d9be9f2-758b-11e9-ac93-005056b82121      1 GiB   3       no      no              LOW             up - detached                   no
849199168767327835      cassandra_pvc-a9e89927-7589-11e9-ac93-005056b82121      1 GiB   3       no      no              LOW             up - detached                   no
972553017890078199      pvc-e4710be5-755a-11e9-ac93-005056b82121                10 GiB  3       no      no              LOW             up - attached on 10.27.51.66    no

$ /opt/pwx/bin/pxctl volume list --snapshot
ID                      NAME                                                    SIZE    HA      SHARED  ENCRYPTED       IO_PRIORITY     STATUS          SNAP-ENABLED
1019215085859062674     cassandra_pvc-5f7d3306-758b-11e9-ac93-005056b82121      1 GiB   3       no      no              LOW             up - detached   no
841348593482616373      cassandra_pvc-9d9be9f2-758b-11e9-ac93-005056b82121      1 GiB   3       no      no              LOW             up - detached   no
849199168767327835      cassandra_pvc-a9e89927-7589-11e9-ac93-005056b82121      1 GiB   3       no      no              LOW             up - detached   no

Now it is time to restore my Cassandra application using Velero. Using the --details option with the velero restore describe command does not appear to provide any additional details. However, the logs can again be used to check whether the PVs were successfully restored from Portworx snapshots, using the velero restore logs command.

$ velero backup get
NAME        STATUS      CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
cassandra   Completed   2019-05-14 11:21:50 +0100 IST   29d       default            <none>

$ velero restore create cassandra-restore --from-backup cassandra
Restore request "cassandra-restore" submitted successfully.
Run `velero restore describe cassandra-restore` or `velero restore logs cassandra-restore` for more details.

$ velero restore describe cassandra-restore
Name:         cassandra-restore
Namespace:    velero
Labels:       <none>
Annotations:  <none>

Phase:  Completed

Backup:  cassandra

Namespaces:
  Included:  *
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io
  Cluster-scoped:  auto

Namespace mappings:  <none>

Label selector:  <none>

Restore PVs:  auto
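
Incidentally, a restore does not have to land in the original namespace; the Namespace mappings field shown above can be driven from the CLI. A sketch follows, where the target namespace name is my own choice for illustration.

```shell
# Restore the cassandra backup into a different namespace.
# "cassandra-copy" is an illustrative name, not from my environment.
velero restore create cassandra-copy-restore \
    --from-backup cassandra \
    --namespace-mappings cassandra:cassandra-copy
```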

Let’s take a look at this from a Portworx perspective after the restore. The PVs are restored (note the new volume IDs) and attached to K8s worker nodes.

$ /opt/pwx/bin/pxctl volume list
ID                      NAME                                                    SIZE    HA      SHARED  ENCRYPTED       IO_PRIORITY     STATUS                          SNAP-ENABLED
1019215085859062674     cassandra_pvc-5f7d3306-758b-11e9-ac93-005056b82121      1 GiB   3       no      no              LOW             up - detached                   no
841348593482616373      cassandra_pvc-9d9be9f2-758b-11e9-ac93-005056b82121      1 GiB   3       no      no              LOW             up - detached                   no
849199168767327835      cassandra_pvc-a9e89927-7589-11e9-ac93-005056b82121      1 GiB   3       no      no              LOW             up - detached                   no
960922544915702517      pvc-5f7d3306-758b-11e9-ac93-005056b82121                1 GiB   3       no      no              LOW             up - attached on 10.27.51.64    no
180406970719671498      pvc-9d9be9f2-758b-11e9-ac93-005056b82121                1 GiB   3       no      no              LOW             up - attached on 10.27.51.27    no
305662590414763324      pvc-a9e89927-7589-11e9-ac93-005056b82121                1 GiB   3       no      no              LOW             up - attached on 10.27.51.66    no
972553017890078199      pvc-e4710be5-755a-11e9-ac93-005056b82121                10 GiB  3       no      no              LOW             up - attached on 10.27.51.66    no

$ /opt/pwx/bin/pxctl volume list --snapshot
ID                      NAME                                                    SIZE    HA      SHARED  ENCRYPTED       IO_PRIORITY     STATUS          SNAP-ENABLED
1019215085859062674     cassandra_pvc-5f7d3306-758b-11e9-ac93-005056b82121      1 GiB   3       no      no              LOW             up - detached   no
841348593482616373      cassandra_pvc-9d9be9f2-758b-11e9-ac93-005056b82121      1 GiB   3       no      no              LOW             up - detached   no
849199168767327835      cassandra_pvc-a9e89927-7589-11e9-ac93-005056b82121      1 GiB   3       no      no              LOW             up - detached   no

And last but not least, let’s verify the Cassandra application is fully functional with all 3 nodes rejoined and up, and table data has been restored.

$ k get sts -n cassandra
NAME        READY   AGE
cassandra   3/3     5m22s


$ k get pods -n cassandra
NAME          READY   STATUS    RESTARTS   AGE
cassandra-0   1/1     Running   0          5m28s
cassandra-1   1/1     Running   2          5m28s
cassandra-2   1/1     Running   2          5m28s


$ k get pvc -n cassandra
NAME                         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
cassandra-data-cassandra-0   Bound    pvc-a9e89927-7589-11e9-ac93-005056b82121   1Gi        RWO            cass-sc        5m41s
cassandra-data-cassandra-1   Bound    pvc-5f7d3306-758b-11e9-ac93-005056b82121   1Gi        RWO            cass-sc        5m41s
cassandra-data-cassandra-2   Bound    pvc-9d9be9f2-758b-11e9-ac93-005056b82121   1Gi        RWO            cass-sc        5m41s


$ k get svc -n cassandra
NAME        TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
cassandra   ClusterIP   None         <none>        9042/TCP   5m46s


$ k get pv | grep cassandra
pvc-5f7d3306-758b-11e9-ac93-005056b82121   1Gi        RWO            Delete           Bound    cassandra/cassandra-data-cassandra-1   cass-sc                 5m58s
pvc-9d9be9f2-758b-11e9-ac93-005056b82121   1Gi        RWO            Delete           Bound    cassandra/cassandra-data-cassandra-2   cass-sc                 5m56s
pvc-a9e89927-7589-11e9-ac93-005056b82121   1Gi        RWO            Delete           Bound    cassandra/cassandra-data-cassandra-0   cass-sc                 5m55s


$ k exec -it cassandra-0 -n cassandra -- nodetool status
Datacenter: DC1-K8Demo
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.244.2.54  142.61 KiB  32           100.0%            74f0a3a6-5b34-4c97-b8ea-f71589b3fbca  Rack1-K8Demo
UN  10.244.3.29  275.59 KiB  32           100.0%            4b678dd5-92af-4003-b978-559266e07d65  Rack1-K8Demo
UN  10.244.4.24  230.94 KiB  32           100.0%            b7d527a6-d465-472a-82e1-8184a924045e  Rack1-K8Demo


$ k exec -it cassandra-0 -n cassandra -- cqlsh
Connected to K8Demo at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.
cqlsh> use demodb;
cqlsh:demodb> select * from emp;

emp_id | emp_city | emp_name | emp_phone | emp_sal
--------+----------+----------+-----------+---------
    100 |     Cork |   Cormac |       999 | 1000000

(1 rows)

cqlsh:demodb>

Everything looks good. Velero, with the Portworx plugin for snapshots, has been able to backup and restore a Cassandra instance.

Now, since these are both release candidates, you are not expected to use them for production purposes. However, if you have an opportunity to test these products in your own lab environments, I am sure both the Velero team and the Portworx plugin team would love to get your feedback.
