More Velero – Cassandra backup and restore

In my previous exercise with Heptio Velero, I looked at backing up and restoring a Couchbase deployment. This time I turned my attention to another popular containerized application, Cassandra. Cassandra is a NoSQL database, similar in some respects to Couchbase. Once again, I will be deploying Cassandra as a set of containers and persistent volumes on Kubernetes running on top of PKS, the Pivotal Container Service. And again, just like my last exercise, I will be instantiating the Persistent Volumes as virtual disks on top of vSAN. I’ll show you how to get Cassandra up and running quickly by sharing my YAML files, then we will destroy the namespace where Cassandra is deployed. Of course, this is after we have taken a backup with Heptio Velero (formerly Ark). We will then restore the Cassandra deployment from our Velero backup and verify that our data is still intact.

Since I went through all of the initial setup steps in my previous post, I will get straight to the Cassandra deployment, followed by the backup and restore with Velero, and then verification of the data.

In my deployment, I went with 3 distinct YAML files: the service, the storage class and the statefulset. The first one shown here is the service YAML for my headless Cassandra deployment. There isn’t much to say here, except that because the service is headless and does not forward or load-balance traffic to the pods, it does not need a cluster IP.

apiVersion: v1
kind: Service
metadata:
  labels:
    app: cassandra
  name: cassandra
  namespace: cassandra
spec:
# headless does not need a cluster IP
  clusterIP: None
  ports:
  - port: 9042
  selector:
    app: cassandra

Next up is the storage class. Regular readers will be familiar with this concept by now. In a nutshell, it allows us to dynamically provision volumes for our application. This storage class uses the K8s vSphere Volume Driver, consumes an SPBM policy called gold, and creates virtual disks for persistent volumes on the vSAN datastore of this vSphere cluster.

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: vsan
provisioner: kubernetes.io/vsphere-volume
parameters:
    diskformat: thin
    storagePolicyName: gold
    datastore: vsanDatastore
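
Once the storage class manifest has been applied, a quick check with kubectl should show it (the name vsan is what the statefulset’s volume claim annotation references further down):

kubectl get sc vsan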

Lastly, we come to the statefulset itself, which allows the PODs and PVs to be scaled together. There are a number of things to highlight here. The first is the Cassandra container image version. These images can be retrieved from gcr.io/google-samples. I went all the way back to v11 because this image included the cqlsh tool for working on the database. There are other options available if you choose to use later versions of the image, such as deploying a separate container with cqlsh, but I found it easier to just log onto the Cassandra containers and run my cqlsh commands from there. I’ve actually pulled down the Cassandra image and pushed it up to my own local Harbor registry, which is where I am retrieving it from. One other thing is the DNS name of the Cassandra SEED node. Since I am deploying to a separate namespace called cassandra, I need to ensure that the DNS name reflects that below. This SEED node is what allows the cluster to form. Last but not least is the volumeClaimTemplates section. This references the storage class and allows a dynamic PV to be created for each POD in the Cassandra deployment, scaling in and out as needed.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cassandra
  namespace: cassandra
  labels:
    app: cassandra
spec:
  serviceName: cassandra
  replicas: 3
  selector:
    matchLabels:
      app: cassandra
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      terminationGracePeriodSeconds: 1800
      containers:
      - name: cassandra
# My image is on my harbor registry
        image: harbor.rainpole.com/library/cassandra:v11
        imagePullPolicy: Always
        ports:
        - containerPort: 7000
          name: intra-node
        - containerPort: 7001
          name: tls-intra-node
        - containerPort: 7199
          name: jmx
        - containerPort: 9042
          name: cql
        resources:
          limits:
            cpu: "500m"
            memory: 1Gi
          requests:
            cpu: "500m"
            memory: 1Gi
        securityContext:
          capabilities:
            add:
              - IPC_LOCK
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/sh
              - -c
              - nodetool drain
        env:
          - name: MAX_HEAP_SIZE
            value: 512M
          - name: HEAP_NEWSIZE
            value: 100M
# Make sure the DNS name matches the nameserver
          - name: CASSANDRA_SEEDS
            value: "cassandra-0.cassandra.cassandra.svc.cluster.local"
          - name: CASSANDRA_CLUSTER_NAME
            value: "K8Demo"
          - name: CASSANDRA_DC
            value: "DC1-K8Demo"
          - name: CASSANDRA_RACK
            value: "Rack1-K8Demo"
          - name: POD_IP
            valueFrom:
              fieldRef:
                fieldPath: status.podIP
        readinessProbe:
          exec:
            command:
            - /bin/bash
            - -c
            - /ready-probe.sh
          initialDelaySeconds: 15
          timeoutSeconds: 5
        volumeMounts:
        - name: cassandra-data
          mountPath: /cassandra_data
  volumeClaimTemplates:
  - metadata:
      name: cassandra-data
# Match the annotation to the storage class name defined previously
      annotations:
        volume.beta.kubernetes.io/storage-class: vsan
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi
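
For reference, this is roughly how the three manifests get deployed. The filenames below are just placeholders for whatever you saved the YAML as, and the cassandra namespace needs to exist before the namespaced objects can be created:

kubectl create ns cassandra
kubectl apply -f cassandra-service.yaml
kubectl apply -f cassandra-storageclass.yaml
kubectl apply -f cassandra-statefulset.yaml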

Let’s take a look at the configuration after Cassandra has been deployed. Note that the statefulset requested 3 replicas.

cormac@pks-cli:~/Cassandra$ kubectl get sts -n cassandra
NAME        DESIRED   CURRENT   AGE
cassandra   3         3         54m

cormac@pks-cli:~/Cassandra$ kubectl get po -n cassandra
NAME          READY   STATUS    RESTARTS   AGE
cassandra-0   1/1     Running   0          54m
cassandra-1   1/1     Running   3          54m
cassandra-2   1/1     Running   2          54m

cormac@pks-cli:~/Cassandra$ kubectl get pvc -n cassandra
NAME                         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
cassandra-data-cassandra-0   Bound    pvc-c61a6e97-4be8-11e9-be9b-005056a24d92   1Gi        RWO            vsan           54m
cassandra-data-cassandra-1   Bound    pvc-c61ba5d2-4be8-11e9-be9b-005056a24d92   1Gi        RWO            vsan           54m
cassandra-data-cassandra-2   Bound    pvc-c61cadc6-4be8-11e9-be9b-005056a24d92   1Gi        RWO            vsan           54m

cormac@pks-cli:~/Cassandra$ kubectl get svc -n cassandra
NAME        TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
cassandra   ClusterIP   None         <none>        9042/TCP   55m

It all looks OK from a K8s perspective. We can use the nodetool CLI to check the state of the Cassandra cluster and verify that all 3 nodes have joined.

cormac@pks-cli:~/Cassandra$ kubectl exec -it cassandra-0 -n cassandra -- nodetool status
Datacenter: DC1-K8Demo
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.200.30.203  133.7  KiB  32           54.4%             a0baa626-ac99-45cc-a2f0-d45ac2f9892c  Rack1-K8Demo
UN  10.200.57.61   231.56 KiB  32           67.9%             95b1fdb8-2138-4b5d-901e-82b9b8c4b6c6  Rack1-K8Demo
UN  10.200.99.101  223.25 KiB  32           77.7%             3477bb48-ad60-4716-ac5e-9bf1f7da3f42  Rack1-K8Demo

Now we can use the cqlsh command mentioned earlier to create a dummy table with some content (like most of this setup, I simply picked these commands up from a quick Google search – I’m sure you can be far more elaborate should you wish).

cormac@pks-cli:~/Cassandra$ kubectl exec -it cassandra-0 -n cassandra -- cqlsh
Connected to K8Demo at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.

cqlsh> CREATE KEYSPACE demodb WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 2 };

cqlsh> use demodb;

cqlsh:demodb> CREATE TABLE emp(emp_id int PRIMARY KEY, emp_name text, emp_city text, emp_sal varint,emp_phone varint);

cqlsh:demodb> INSERT INTO emp (emp_id, emp_name, emp_city, emp_phone, emp_sal) VALUES (100, 'Cormac', 'Cork', 999, 100000);

cqlsh:demodb> select * from emp;
emp_id | emp_city | emp_name | emp_phone | emp_sal
--------+----------+----------+-----------+---------
    100 |     Cork |   Cormac |       999 | 100000

(1 rows)
cqlsh:demodb> exit;
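
As an aside, if you would rather not open an interactive session, cqlsh can also run a single statement via its -e option, along these lines:

kubectl exec -it cassandra-0 -n cassandra -- cqlsh -e "select * from demodb.emp;"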

Next, we can start the backup preparations, first of all annotating each pod with the name of its data volume so that Velero knows to back up the persistent volumes.

cormac@pks-cli:~/Cassandra$ kubectl -n cassandra annotate pod/cassandra-2 backup.velero.io/backup-volumes=cassandra-data
pod/cassandra-2 annotated

cormac@pks-cli:~/Cassandra$ kubectl -n cassandra annotate pod/cassandra-1 backup.velero.io/backup-volumes=cassandra-data
pod/cassandra-1 annotated

cormac@pks-cli:~/Cassandra$ kubectl -n cassandra annotate pod/cassandra-0 backup.velero.io/backup-volumes=cassandra-data
pod/cassandra-0 annotated
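
Since the three annotate commands only differ by pod name, a small shell loop does the job just as well; a minimal sketch:

for i in 0 1 2; do
  kubectl -n cassandra annotate pod/cassandra-$i backup.velero.io/backup-volumes=cassandra-data
done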

Finally, we initiate the backup. This time I am going to tell Velero to skip all of the other namespaces so that it only backs up the cassandra namespace. Note that there are various other ways of doing this, with selectors, etc. This isn’t necessarily the most elegant way to achieve it (but it works); I’ll show an alternative below.

cormac@pks-cli:~/Cassandra$ velero backup create cassandra --exclude-namespaces velero,default,kube-public,kube-system,pks-system,couchbase
Backup request "cassandra" submitted successfully.
Run `velero backup describe cassandra` or `velero backup logs cassandra` for more details.
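
As mentioned above, the backup could also be scoped more directly rather than excluding every other namespace, either with an include or a label selector. Something like one of these should work, though with a label selector you would need to make sure that everything you care about (the PVCs, for example) actually carries the app=cassandra label:

velero backup create cassandra --include-namespaces cassandra
velero backup create cassandra --selector app=cassandra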

I typically put a watch -n 5 in front of the ‘velero backup describe’ command so that I can see the progress being updated regularly. When the backup is complete, it can be listed as follows:

cormac@pks-cli:~/Cassandra$ velero backup get
NAME         STATUS      CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
all          Completed   2019-03-21 10:43:43 +0000 GMT   29d       default            <none>
all-and-cb   Completed   2019-03-21 10:51:26 +0000 GMT   29d       default            <none>
all-cb-2     Completed   2019-03-21 11:11:04 +0000 GMT   29d       default            <none>
cassandra    Completed   2019-03-21 14:43:25 +0000 GMT   29d       default            <none>
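
For completeness, the monitoring command mentioned above looks like the line below, and adding the --details flag to the describe command should also list the individual pod volume backups once the backup has finished:

watch -n 5 velero backup describe cassandra
velero backup describe cassandra --details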

Time to see if we can restore it. As before, we can now destroy our current data. In my case, I am just going to remove the namespace where my Cassandra objects reside (PODs, PVs, service, StatefulSet), and then recover it using Velero.

cormac@pks-cli:~/Cassandra$ kubectl delete ns cassandra
namespace "cassandra" deleted

cormac@pks-cli:~/Cassandra$ velero restore create cassandra-restore --from-backup cassandra
Restore request "cassandra-restore" submitted successfully.
Run `velero restore describe cassandra-restore` or `velero restore logs cassandra-restore` for more details. 

You can monitor the restore in the same way as you monitor the backup, using a watch -n 5 on the ‘velero restore describe’ command. You can also watch the new namespace, PVs and PODs being created using kubectl, with something along these lines:
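watch -n 5 velero restore describe cassandra-restore
kubectl get ns cassandra
kubectl get pvc,pods -n cassandra

Once everything has been restored, we can verify that the data still exists using the same commands as before.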

cormac@pks-cli:~/Cassandra$ kubectl exec -it cassandra-0 -n cassandra -- cqlsh
Connected to K8Demo at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.
cqlsh> select * from demodb.emp;
emp_id | emp_city | emp_name | emp_phone | emp_sal
--------+----------+----------+-----------+---------
    100 |     Cork |   Cormac |       999 | 100000
(1 rows)
cqlsh>
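
As one final sanity check, it is worth re-running the earlier nodetool status command to confirm that all three nodes are back up and have rejoined the ring:

kubectl exec -it cassandra-0 -n cassandra -- nodetool status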

So we have had a successful backup and restore, using Heptio Velero, of Cassandra running as a set of containers on top of K8s on PKS, and using Persistent Volumes on vSAN – neat!