
Moving a Stateful App from VCP to CSI based Kubernetes cluster using Velero

Since the release of the vSphere CSI driver in vSphere 6.7U3, I have had a number of requests about how we plan to migrate applications between Kubernetes clusters that are using the original in-tree vSphere Cloud Provider (VCP) and Kubernetes clusters that are built with the new vSphere CSI driver. All I can say at this point in time is that we are looking at ways to seamlessly achieve this at some point in the future, and that the Kubernetes community has a migration design in the works to move from in-tree providers to the new CSI driver as well.

However, I had also seen some demonstrations from the Velero team on how to use Velero for application mobility. I wanted to see if Velero could also provide us with an interim solution to move applications with persistent storage between a K8s cluster running on vSphere using the in-tree VCP and a native K8s cluster that uses the vSphere CSI driver.

Note that this method requires downtime to move the application between clusters, so the application will be offline for part of this exercise.

It should also be noted that the Cassandra application I used for demonstration purposes was idle at the time of backup (no active I/O), so that should also be taken into account.
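If you need to back up a Cassandra cluster that is actively taking writes, one option (not shown in this exercise) is to flush the in-memory data to disk first so that the files restic copies are as complete as possible. A minimal sketch, assuming the same pod names and namespace used later in this post:

# flush Cassandra memtables to disk on each replica before taking the backup
$ for i in 0 1 2; do kubectl exec cassandra-$i -n cassandra -- nodetool flush; done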

tl;dr Yes – we can use Velero for such a scenario, on the understanding that (a) you will need resources to set up the new CSI cluster and (b) there is no seamless migration: the application will need to be shut down on the VCP cluster and restarted on the CSI cluster. Here are the detailed steps.

External S3 object store

The first step is to set up an external S3 object store that can be reached by both clusters. Velero stores both metadata and (in the case of vSphere backups using restic) data in the S3 object store. In my example, I am using MinIO as I have had the most experience with that product. I have a post on how to set this up on vSphere if you want to learn more. In my lab, my VCP K8s cluster is on VLAN 50, and my CSI K8s cluster is on VLAN 51. For the CSI cluster to access the MinIO S3 object store, and therefore the backup taken from the VCP cluster, I will need to re-IP my MinIO VMs to make the backup visible to the CSI cluster. More detail on that later.
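If you are building the object store from scratch, the rough shape of that prerequisite looks like the sketch below; the alias name and the minio/minio123 keys are placeholders for whatever credentials your MinIO deployment actually uses.

# register the MinIO endpoint with the MinIO client (mc); keys are examples only
$ mc alias set velero-minio http://192.50.0.20:9000 minio minio123

# create the bucket that Velero will write its backups into
$ mc mb velero-minio/velero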

VCP StorageClass

Before going any further, it is probably of interest to see how the VCP driver is currently being used. The reference to the provider/driver is placed in the StorageClass. Here is the StorageClass being used by the Cassandra application in the VCP cluster, which we will shortly be backing up and moving to a new cluster.

$ kubectl get sc
NAME      PROVISIONER                    AGE
cass-sc   kubernetes.io/vsphere-volume   64d


$ cat cassandra-sc-vcp.yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: cass-sc
provisioner: kubernetes.io/vsphere-volume
parameters:
    diskformat: thin
    storagePolicyName: raid-1
    datastore: vsanDatastore
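
For context, the Cassandra StatefulSet consumes this StorageClass through its volumeClaimTemplates. A minimal sketch of that part of the manifest is shown below; the storage size is an example value, but the claim name cassandra-data is the one we will annotate for restic later.

  volumeClaimTemplates:
  - metadata:
      name: cassandra-data            # referenced later in the restic annotation
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: cass-sc       # the VCP-backed StorageClass above
      resources:
        requests:
          storage: 1Gi                # example size only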

Deploy Velero on VCP K8s cluster

At this point, the S3 object store is available on IP 192.50.0.20. It is reachable via port 9000. Thus when I deploy Velero, I have to specify this address:port combination in the s3Url and publicUrl as follows:

$ velero install --provider aws \
--bucket velero \
--secret-file ./credentials-velero \
--use-volume-snapshots=false \
--use-restic \
--backup-location-config region=minio,s3ForcePathStyle="true",\
s3Url=http://192.50.0.20:9000,publicUrl=http://192.50.0.20:9000
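
The credentials-velero file referenced by --secret-file is just an AWS-style credentials file holding the MinIO access key and secret key; the values below are placeholders for your own keys.

$ cat credentials-velero
[default]
aws_access_key_id = minio
aws_secret_access_key = minio123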
If you are unsure how to deploy Velero on vSphere, there is a post on the Velero blog on how to achieve this. If the install is successful, you should see something similar to this at the end of the output:

Velero is installed! ⛵ Use 'kubectl logs deployment/velero -n velero' to view the status.

Now you are ready to back up a stateful application.

Prepping the Stateful Application

For the purposes of this test, I deployed a Cassandra stateful set with 3 replicas on my VCP cluster. I also populated it with some data so that we can verify that it gets successfully restored on the CSI cluster.

$ kubectl exec -it cassandra-0 -n cassandra -- nodetool status
Datacenter: DC1-K8Demo
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens       Owns (effective)  Host ID                               Rack
UN  172.16.5.2  104.38 KiB  32           66.6%             8fd5fda2-d236-4f8f-85c4-2c57eab06417  Rack1-K8Demo
UN  172.16.5.3  100.05 KiB  32           65.9%             6ebf73bb-0541-4381-b232-7f277186e2d3  Rack1-K8Demo
UN  172.16.5.4  75.93 KiB  32           67.5%             0f5387f9-149c-416d-b1b6-42b71059c2fa  Rack1-K8Demo


$ kubectl exec -it cassandra-0 -n cassandra -- cqlsh
Connected to K8Demo at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.
cqlsh> CREATE KEYSPACE demodb WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };
cqlsh> use demodb;
cqlsh:demodb> CREATE TABLE emp(emp_id int PRIMARY KEY, emp_name text, emp_city text, emp_sal varint,emp_phone varint);
cqlsh:demodb> INSERT INTO emp (emp_id, emp_name, emp_city, emp_phone, emp_sal) VALUES (100, 'Cormac', 'Cork', 999, 1000000);
cqlsh:demodb> select * from emp;

emp_id | emp_city | emp_name | emp_phone | emp_sal
--------+----------+----------+-----------+---------
    100 |     Cork |   Cormac |       999 | 1000000
(1 rows)

cqlsh:demodb>

Backup Cassandra using Velero

We are now ready to backup Cassandra. The first part is to annotate the volumes so that restic knows that it needs to copy the contents of these volumes as part of the backup process.

$ kubectl -n cassandra annotate pod/cassandra-2 \
backup.velero.io/backup-volumes=cassandra-data
pod/cassandra-2 annotated

$ kubectl -n cassandra annotate pod/cassandra-1 \
backup.velero.io/backup-volumes=cassandra-data
pod/cassandra-1 annotated

$ kubectl -n cassandra annotate pod/cassandra-0 \
backup.velero.io/backup-volumes=cassandra-data
pod/cassandra-0 annotated
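
Since the annotation is identical for each replica, a simple loop achieves the same result:

$ for i in 0 1 2; do kubectl -n cassandra annotate pod/cassandra-$i \
    backup.velero.io/backup-volumes=cassandra-data; done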


$ velero backup create cassandra-pks-1010 --include-namespaces cassandra
Backup request "cassandra-pks-1010" submitted successfully.

Run `velero backup describe cassandra-pks-1010` or `velero backup logs cassandra-pks-1010` for more details.
Commands such as velero backup describe cassandra-pks-1010 and velero backup describe cassandra-pks-1010 --details can be used to monitor the backup. All going well, the backup should complete, and at the end of the output from the --details command you should observe the following.
Restic Backups:
  Completed:
    cassandra/cassandra-0: cassandra-data
    cassandra/cassandra-1: cassandra-data
    cassandra/cassandra-2: cassandra-data
If you log on to the MinIO object store, you should see a new backup called cassandra-pks-1010. You can also run velero backup get to check the full list of backups:
$ velero backup get
NAME                 STATUS      CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
cassandra-pks-1010   Completed   2019-10-10 10:38:32 +0100 IST   29d       default            <none>
nginx-backup         Completed   2019-10-01 15:12:20 +0100 IST   21d       default            app=nginx
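
If you prefer the command line to the MinIO web UI, the same check can be done with the mc client (using the velero-minio alias sketched earlier); Velero lays out backup metadata under a backups/ prefix in the bucket, with the restic data under a separate restic/ prefix.

# list all backups in the bucket, then the objects belonging to this one
$ mc ls velero-minio/velero/backups/
$ mc ls velero-minio/velero/backups/cassandra-pks-1010/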

At this point, you may want to double-check that the application was backed up successfully by attempting to restore it to the same VCP cluster that it was backed up from. I am going to skip such a step here and move straight onto the restore part of the process.

Switch contexts to the Kubernetes CSI cluster

My current kubectl context is set to my VCP cluster. Let’s switch to the CSI cluster.
$ kubectl config get-contexts
CURRENT   NAME                CLUSTER             AUTHINFO                               NAMESPACE
*         cork8s-cluster-01   cork8s-cluster-01   d8ab6b15-f7d7-4d20-aefe-5dfe3ecbf63b
          cork8s-csi-01       kubernetes          kubernetes-admin

$ kubectl get nodes -o wide
NAME                                   STATUS   ROLES    AGE   VERSION   INTERNAL-IP     EXTERNAL-IP     OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
140ab5aa-0159-4612-b68c-df39dbea2245   Ready    <none>   68d   v1.13.5   192.168.192.5   192.168.192.5   Ubuntu 16.04.6 LTS   4.15.0-46-generic   docker://18.6.3
ebbb4c31-375b-4b17-840d-db0586dd948b   Ready    <none>   68d   v1.13.5   192.168.192.4   192.168.192.4   Ubuntu 16.04.6 LTS   4.15.0-46-generic   docker://18.6.3
fd8f9036-189f-447c-bbac-71a9fea519c0   Ready    <none>   68d   v1.13.5   192.168.192.3   192.168.192.3   Ubuntu 16.04.6 LTS   4.15.0-46-generic   docker://18.6.3

$ kubectl config use-context cork8s-csi-01
Switched to context "cork8s-csi-01".

$ kubectl get nodes -o wide
NAME          STATUS   ROLES    AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
k8s-master    Ready    master   51d   v1.14.2   10.27.51.39   10.27.51.39   Ubuntu 18.04.3 LTS   4.15.0-58-generic   docker://18.6.0
k8s-worker1   Ready    <none>   51d   v1.14.2   10.27.51.40   10.27.51.40   Ubuntu 18.04.3 LTS   4.15.0-58-generic   docker://18.6.0
k8s-worker2   Ready    <none>   51d   v1.14.2   10.27.51.41   10.27.51.41   Ubuntu 18.04.3 LTS   4.15.0-58-generic   docker://18.6.0

OK – at this point, I am now working with my CSI cluster. I now need to re-IP my MinIO S3 object store so that it is visible on the same VLAN as my CSI cluster. Once the object store is reachable on the new VLAN, I can install Velero on the CSI cluster and point it to the external S3 object store. The install command is identical to the one used on the VCP cluster, apart from the s3Url and publicUrl entries.

$ velero install  --provider aws \
--bucket velero \
--secret-file ./credentials-velero \
--use-volume-snapshots=false  \
--use-restic \
--backup-location-config region=minio,s3ForcePathStyle="true",\
s3Url=http://10.27.51.49:9000,publicUrl=http://10.27.51.49:9000

Again, as before, a successful install should result in an output as follows:

Velero is installed! ⛵ Use 'kubectl logs deployment/velero -n velero' to view the status.

Assuming the MinIO object store is set up correctly and is accessible to the CSI cluster, velero backup get should show the backups taken on the VCP cluster, including our Cassandra backup.

$ velero backup get
NAME                 STATUS      CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
cassandra-pks-1010   Completed   2019-10-10 10:38:32 +0100 IST   29d       default            <none>
nginx-backup         Completed   2019-10-01 15:12:20 +0100 IST   21d       default            app=nginx

You can also run the velero backup describe and velero backup describe --details commands that we saw earlier to ensure that all the necessary components of the Cassandra application have been captured and are available for restore.

Restore Stateful App on the Kubernetes CSI Cluster

The first step is to make sure that there is a StorageClass with the same name (cass-sc) as the StorageClass used on the VCP cluster. However, in this CSI cluster, the StorageClass needs to reference the CSI driver rather than the VCP driver that we saw earlier.

$ kubectl get sc
NAME         PROVISIONER              AGE


$ cat cassandra-sc-csi.yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: cass-sc
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
provisioner: csi.vsphere.vmware.com
parameters:
  storagepolicyname: "Space-Efficient"


$ kubectl apply -f cassandra-sc-csi.yaml
storageclass.storage.k8s.io/cass-sc created



$ kubectl get sc
NAME         PROVISIONER              AGE
cass-sc      csi.vsphere.vmware.com   3s

A restore command is quite simple – the only notable point is to specify which backup to restore. Here is the command used to restore the cassandra-pks-1010 backup on the Kubernetes CSI Cluster:

$ velero create restore cassandra --from-backup cassandra-pks-1010
Restore request "cassandra" submitted successfully.
Run `velero restore describe cassandra` or `velero restore logs cassandra` for more details.
Just like the backup commands, you can use commands such as those described in the above output to monitor the progress of the restore. Once everything has successfully restored, you should see an output similar to the following. Note the Restic Restores at the bottom of the output:
$ velero restore describe cassandra --details
Name:         cassandra
Namespace:    velero
Labels:       <none>
Annotations:  <none>

Phase:  Completed

Backup:  cassandra-pks-1010

Namespaces:
  Included:  *
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io
  Cluster-scoped:  auto

Namespace mappings:  <none>

Label selector:  <none>

Restore PVs:  auto

Restic Restores:
  Completed:
    cassandra/cassandra-0: cassandra-data
    cassandra/cassandra-1: cassandra-data
    cassandra/cassandra-2: cassandra-data
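
Before looking at the database itself, it does no harm to check that the Kubernetes objects landed in the new cluster and that the PVCs were dynamically provisioned through the cass-sc StorageClass by the CSI driver. The Pods should eventually reach Running (see the next section if they do not), and each PVC should be Bound.

$ kubectl get pods,pvc -n cassandra
$ kubectl get pv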

New cluster cannot access the image repository

This step may not be necessary, but you may have a situation where the new CSI cluster is unable to access the image repository of the original VCP cluster. This might happen if the original K8s cluster’s image repository (e.g. Harbor) is on a network that cannot be accessed by the new CSI cluster. If that is the case, the Cassandra application objects will restore, but the Pods will never come online due to ‘image pull’ errors. To resolve this issue, you can use kubectl to edit the Pods, and change the location of where to find the Cassandra image.

For example, let’s say that my original VCP cluster had access to an internal Harbor repository for the images, but my new CSI cluster does not; it does, however, have access to the outside world. In that case, I may want to change the Pod image location from the internal Harbor repository to an external repo, e.g. from:

 image: harbor.rainpole.com/library/cassandra:v11

to:

 image: gcr.io/google-samples/cassandra:v11

To achieve this, edit each of the Pods as follows and change the image location:

$ kubectl edit pod cassandra-0 -n cassandra
pod/cassandra-0 edited

$ kubectl edit pod cassandra-1 -n cassandra
pod/cassandra-1 edited

$ kubectl edit pod cassandra-2 -n cassandra
pod/cassandra-2 edited
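
If you would rather not open an editor three times, kubectl set image can patch the same field in a loop. The container name cassandra below is an assumption based on the sample StatefulSet; check yours first with the jsonpath query shown. It is also worth making the same change in the StatefulSet spec itself (kubectl edit statefulset cassandra -n cassandra) so that any re-created Pods pull from the reachable repository.

# confirm the container name used in the Pod spec (assumed to be 'cassandra')
$ kubectl get pod cassandra-0 -n cassandra -o jsonpath='{.spec.containers[*].name}'

# point each Pod at the externally reachable image
$ for i in 0 1 2; do kubectl -n cassandra set image pod/cassandra-$i \
    cassandra=gcr.io/google-samples/cassandra:v11; done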

Verify that the restore is successful

Apart from verifying that the Pods, PVCs, PVs, Service and StatefulSet have been restored, we should also check the contents of the Cassandra database once it has been stood up on the CSI cluster. Let’s look at the node status first, and note that the Cassandra nodes have a new range of IP addresses.

$ kubectl exec -it cassandra-0 -n cassandra -- nodetool status
Datacenter: DC1-K8Demo
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.244.2.61  192.73 KiB  32           100.0%            79820874-edec-4254-b048-eaceac0ec6c8  Rack1-K8Demo
UN  10.244.2.62  157.17 KiB  32           100.0%            ea0e8ef2-aad2-47ee-ab68-14a3094da5be  Rack1-K8Demo
UN  10.244.1.58  139.97 KiB  32           100.0%            110d3212-526b-4a58-8005-ecff802d7c20  Rack1-K8Demo
Next, let’s see if the table data that we wrote whilst the application was running on the VCP cluster has persisted across the Kubernetes cluster migration.
$ kubectl exec -it cassandra-0 -n cassandra -- cqlsh
Connected to K8Demo at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.
cqlsh> use demodb;
cqlsh:demodb> select * from emp;

emp_id | emp_city | emp_name | emp_phone | emp_sal
--------+----------+----------+-----------+---------
    100 |     Cork |   Cormac |       999 | 1000000
(1 rows)

cqlsh:demodb>

Nice. It looks like we have successfully moved the stateful application (Cassandra) from a K8s cluster using the original VCP driver to a K8s cluster that is using the new vSphere CSI driver. One last point, in case you were wondering: yes, it also works in the other direction, so you can also move stateful applications from a Kubernetes cluster using CSI on vSphere back to a cluster using the VCP.
