Open Source Velero Plugin for vSphere (using snapshots) in action

I recently published an article around Velero and vSAN File Services, showing how Velero and the restic plugin could be used to back up and restore Kubernetes applications that use vSAN File Services. Today, I want to turn my attention to a very cool new plugin that we announced in mid-April, namely the Velero Plugin for vSphere. This open source plugin enables Velero to take a crash-consistent VADP* snapshot backup of a block Persistent Volume on vSphere storage, and store the backup on S3 compatible storage.

* VADP is short for VMware vSphere Storage APIs – Data Protection.

To utilize the plugin, I deployed the evaluation Minio S3 object store which comes with Velero, then installed Velero as normal, which on vSphere means installing it with the AWS plugin. This is because the vSphere plugin does not provide object storage, so we leverage the AWS plugin and the evaluation Minio S3 object store for that. You can also include the restic plugin to give you some different backup options. Once Velero is installed, you simply need to install the Velero Plugin for vSphere, then create a Volume Snapshot Location (VSL). You are then ready to use the Velero Plugin for vSphere to back up and restore applications using PVs on vSphere storage.
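If you need a starting point, a typical install for this setup looks something like the sketch below. Note that the bucket name, credentials file and Minio URL here are assumptions from my lab environment, so adjust them to suit your own.

$ velero install --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.0.0 \
    --bucket velero \
    --secret-file ./credentials-velero \
    --use-restic \
    --use-volume-snapshots=false \
    --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://minio.velero.svc:9000

The --use-volume-snapshots=false flag simply stops Velero from creating a default Volume Snapshot Location at install time; we will create one explicitly for the vSphere plugin in step 2.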

Let’s look at the steps in more detail. Note that this is Velero version 1.3.2; the latest version of Velero, v1.4, was released in late May 2020. I am not going to detail how to install the evaluation Minio S3 or how to install Velero, as this is well documented in various places. I will start with the installation of the Velero Plugin for vSphere. This is very simple, as you can see in step 1 below.

Step 1 – Install the Velero Plugin for vSphere

$ velero plugin get
NAME                              KIND
velero.io/crd-remap-version       BackupItemAction
velero.io/pod                     BackupItemAction
velero.io/pv                      BackupItemAction
velero.io/service-account         BackupItemAction
velero.io/aws                     ObjectStore
velero.io/add-pv-from-pvc         RestoreItemAction
velero.io/add-pvc-from-pod        RestoreItemAction
velero.io/change-storage-class    RestoreItemAction
velero.io/cluster-role-bindings   RestoreItemAction
velero.io/crd-preserve-fields     RestoreItemAction
velero.io/job                     RestoreItemAction
velero.io/pod                     RestoreItemAction
velero.io/restic                  RestoreItemAction
velero.io/role-bindings           RestoreItemAction
velero.io/service                 RestoreItemAction
velero.io/service-account         RestoreItemAction
velero.io/aws                     VolumeSnapshotter


$ velero plugin add vsphereveleroplugin/velero-plugin-for-vsphere:1.0.0


$ velero plugin get
NAME                              KIND
velero.io/crd-remap-version       BackupItemAction
velero.io/pod                     BackupItemAction
velero.io/pv                      BackupItemAction
velero.io/service-account         BackupItemAction
velero.io/aws                     ObjectStore
velero.io/add-pv-from-pvc         RestoreItemAction
velero.io/add-pvc-from-pod        RestoreItemAction
velero.io/change-storage-class    RestoreItemAction
velero.io/cluster-role-bindings   RestoreItemAction
velero.io/crd-preserve-fields     RestoreItemAction
velero.io/job                     RestoreItemAction
velero.io/pod                     RestoreItemAction
velero.io/restic                  RestoreItemAction
velero.io/role-bindings           RestoreItemAction
velero.io/service                 RestoreItemAction
velero.io/service-account         RestoreItemAction
velero.io/aws                     VolumeSnapshotter
velero.io/vsphere                 VolumeSnapshotter

Step 2 – Create a Volume Snapshot Location

Because Velero is deployed on vSphere, it is configured with the AWS plugin, and thus backups are stored in an S3 bucket. As mentioned, for this demo I also deployed the evaluation Minio S3 object store. Volume snapshot backups taken by the Velero Plugin for vSphere are stored in the same S3 bucket on the Minio S3 object store as the application metadata.

The next step is to create a Volume Snapshot Location (VSL). This VSL is referenced when a backup is taken, and it is what tells Velero to use the Velero Plugin for vSphere for that backup.

$ velero snapshot-location create vsl-vsphere --provider velero.io/vsphere
Snapshot volume location "vsl-vsphere" configured successfully.
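To confirm that the new Volume Snapshot Location is in place, you can list the configured locations:

$ velero snapshot-location get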

Step 3 – Backup using the Plugin and Volume Snapshot Location

Time to take a backup. In this test, I am backing up a complete namespace. This namespace holds a Cassandra StatefulSet which has 3 replicas. A backup will include Kubernetes objects such as Pods, Services, PVCs and PVs. The Cassandra application has a few simple populated tables. Note how the velero backup command references the VSL created in step 2. This tells the backup to use the vSphere plugin.

$ velero backup create cassandra-snap-backup --include-namespaces=cassandra \
--snapshot-volumes --volume-snapshot-locations vsl-vsphere
Backup request "cassandra-snap-backup" submitted successfully.
Run `velero backup describe cassandra-snap-backup` or `velero backup logs cassandra-snap-backup` for more details.


$ velero backup describe cassandra-snap-backup
Name:         cassandra-snap-backup
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  <none>

Phase:  InProgress

Namespaces:
  Included:  cassandra
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  <none>

Storage Location:  default

Snapshot PVs:  true

TTL:  720h0m0s

Hooks:  <none>

Backup Format Version:  1

Started:    2020-05-26 11:10:58 +0100 IST
Completed:  <n/a>

Expiration:  2020-06-25 11:10:58 +0100 IST

Persistent Volumes: <none included>

Some time later …

$ velero backup get
NAME                    STATUS      CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
cassandra-snap-backup   Completed   2020-05-26 11:10:58 +0100 IST   29d       default            <none>


$ velero backup describe cassandra-snap-backup --details
Name:         cassandra-snap-backup
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  <none>

Phase:  Completed

Namespaces:
  Included:  cassandra
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  <none>

Storage Location:  default

Snapshot PVs:  true

TTL:  720h0m0s

Hooks:  <none>

Backup Format Version:  1

Started:    2020-05-26 11:10:58 +0100 IST
Completed:  2020-05-26 11:11:32 +0100 IST

Expiration:  2020-06-25 11:10:58 +0100 IST

Resource List:
  apps/v1/ControllerRevision:
    - cassandra/cassandra-dd67c56d6
  apps/v1/StatefulSet:
    - cassandra/cassandra
  v1/Endpoints:
    - cassandra/cassandra
  v1/Namespace:
    - cassandra
  v1/PersistentVolume:
    - pvc-5802b03d-f875-4cf6-9a8c-7626856cdeef
    - pvc-98914696-db27-49f9-807b-f2d29c25cc65
    - pvc-dce81a7e-4758-4c4b-ba26-eff658e93be0
  v1/PersistentVolumeClaim:
    - cassandra/cassandra-data-cassandra-0
    - cassandra/cassandra-data-cassandra-1
    - cassandra/cassandra-data-cassandra-2
  v1/Pod:
    - cassandra/cassandra-0
    - cassandra/cassandra-1
    - cassandra/cassandra-2
  v1/Secret:
    - cassandra/default-token-ghmwc
  v1/Service:
    - cassandra/cassandra
  v1/ServiceAccount:
    - cassandra/default

Persistent Volumes:
  pvc-5802b03d-f875-4cf6-9a8c-7626856cdeef:
    Snapshot ID:        ivd:bc611f64-f318-46c6-bcdf-046d6e0f50b6:ef06b313-4593-4fca-bbca-4dd319a34e86
    Type:               ivd
    Availability Zone:
    IOPS:               <N/A>
  pvc-dce81a7e-4758-4c4b-ba26-eff658e93be0:
    Snapshot ID:        ivd:419c9163-87d0-4c59-be02-1407f2fc3ae3:a40e159c-dffe-4197-b1f5-e9e7cd7c48cc
    Type:               ivd
    Availability Zone:
    IOPS:               <N/A>
  pvc-98914696-db27-49f9-807b-f2d29c25cc65:
    Snapshot ID:        ivd:8dc53f9c-523a-43c6-b3b8-9bc3a0f3c893:4d42acdb-0e73-40ae-9c62-812e995829a5
    Type:               ivd
    Availability Zone:
    IOPS:               <N/A>

At the end of the detailed output above, we can see 3 snapshots representing the backups of the 3 PVs. The ivd prefix in the Snapshot IDs is shorthand for Improved Virtual Disk, another name for First Class Disk or FCD. More details on IVDs/FCDs can be found here and here.
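As an aside, if you use the govc CLI, you can see these FCDs directly on the datastore. A quick sketch, assuming a vSAN datastore named vsanDatastore; the PVs provisioned by the CSI driver should appear in the listing as managed disks:

$ govc disk.ls -ds vsanDatastore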

During the backup, you will observe a number of events occurring in vSphere. You will see a ‘Create virtual object snapshot’ event when the backup is initiated, and a corresponding ‘Delete virtual object snapshot’ event when the backup is completed.

Important: The data movement task, which we refer to as an upload, copies the local on-disk snapshot to the S3 object store. This operation is asynchronous to the Velero backup. Thus, when the velero backup displays a Completed status, this does not mean that the upload task has completed. Administrators should use the kubectl -n velero get uploads command to monitor the upload tasks. Do not initiate any velero restore commands on the backup until all relevant upload tasks for the backup have completed.

Here is an example of such a command, along with a describe. There is one upload per snapshot, so for my Cassandra demo there are 3 uploads for this backup. The describe option can be used to display more information about an upload.

$ kubectl -n velero get uploads
NAME                                          AGE
upload-4d42acdb-0e73-40ae-9c62-812e995829a5   22h
upload-a40e159c-dffe-4197-b1f5-e9e7cd7c48cc   22h
upload-ef06b313-4593-4fca-bbca-4dd319a34e86   22h


$ kubectl -n velero describe uploads upload-ef06b313-4593-4fca-bbca-4dd319a34e86
Name:         upload-ef06b313-4593-4fca-bbca-4dd319a34e86
Namespace:    velero
Labels:       <none>
Annotations:  <none>
API Version:  veleroplugin.io/v1
Kind:         Upload
Metadata:
  Creation Timestamp:  2020-05-26T10:11:17Z
  Generation:          3
  Resource Version:    6729495
  Self Link:           /apis/veleroplugin.io/v1/namespaces/velero/uploads/upload-ef06b313-4593-4fca-bbca-4dd319a34e86
  UID:                 39608852-903e-468d-975e-397f51ebb3db
Spec:
  Backup Timestamp:  2020-05-26T10:11:18Z
  Snapshot ID:       ivd:bc611f64-f318-46c6-bcdf-046d6e0f50b6:ef06b313-4593-4fca-bbca-4dd319a34e86
Status:
  Completion Timestamp:  2020-05-26T10:16:19Z
  Message:               Upload completed
  Next Retry Timestamp:  2020-05-26T10:11:18Z
  Phase:                 Completed
  Processing Node:       k8s2-worker-02
  Progress:
  Start Timestamp:  2020-05-26T10:11:19Z
Events:             <none>
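One thing to note is that the default get output above only shows the name and age of each upload. To see the phase of every upload at a glance, you can pull it out of the status field with a jsonpath expression along these lines:

$ kubectl -n velero get uploads -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}'

Once every upload reports Completed, the snapshot data is safely in the S3 object store and it is safe to initiate a restore.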

Step 4 – Check the backups in the S3 object store

Data is stored in the S3 object store differently from the contents of a restic backup. There is now a separate plugin folder in the velero bucket which contains the IVD snapshot data, as shown below.
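If you want to poke around the bucket yourself, any S3 client will do. For example, with the AWS CLI pointed at the Minio endpoint (the endpoint URL below is a placeholder for your own Minio service address, and this assumes your S3 credentials are already configured):

$ aws s3 ls s3://velero/ --recursive --endpoint-url http://<minio-endpoint>:9000

You should see the plugin folder alongside the usual backups folder in the listing.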

Step 5 – Initiate Restore

A velero restore using the vSphere plugin is exactly the same as the many restores we have looked at before on this blog. Once again, I can do something catastrophic like removing the Cassandra namespace, and then try to restore it.

$ kubectl delete ns cassandra
namespace "cassandra" deleted

$ kubectl get ns
NAME              STATUS   AGE
default           Active   27d
kube-node-lease   Active   27d
kube-public       Active   27d
kube-system       Active   27d
velero            Active   3d23h
vsan-prometheus   Active   18d


$ velero restore create cassandra-restore --from-backup cassandra-snap-backup
Restore request "cassandra-restore" submitted successfully.
Run `velero restore describe cassandra-restore` or `velero restore logs cassandra-restore` for more details.


$ velero restore describe cassandra-restore
Name:         cassandra-restore
Namespace:    velero
Labels:       <none>
Annotations:  <none>

Phase:  InProgress

Backup:  cassandra-snap-backup

Namespaces:
  Included:  *
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io
  Cluster-scoped:  auto

Namespace mappings:  <none>

Label selector:  <none>

Restore PVs:  auto

Some time later:

$ velero restore describe cassandra-restore --details
Name:         cassandra-restore
Namespace:    velero
Labels:       <none>
Annotations:  <none>

Phase:  Completed

Backup:  cassandra-snap-backup

Namespaces:
  Included:  *
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io
  Cluster-scoped:  auto

Namespace mappings:  <none>

Label selector:  <none>

Restore PVs:  auto

Important: As mentioned previously, velero backup completion does not necessarily mean that the backup of the snapshot data has completed. However, velero restore completion does mean that the restore of the snapshot data has completed, so the behaviour is slightly different to that of backup. Restoring the snapshot from the S3 object store is referred to as a download, and just like we did with backup, we can use the kubectl -n velero get downloads command to monitor those tasks.

$ kubectl -n velero get downloads
NAME                                                                               AGE
download-4d42acdb-0e73-40ae-9c62-812e995829a5-40769481-a685-4ad8-8ce5-5a0a3ff121f4 21h
download-a40e159c-dffe-4197-b1f5-e9e7cd7c48cc-9ad8a9cb-3043-433c-8111-13d923fc0df2 21h
download-ef06b313-4593-4fca-bbca-4dd319a34e86-863f2aa1-7346-4093-adb4-6191b3c2d03a 21h

$ kubectl -n velero describe downloads download-ef06b313-4593-4fca-bbca-4dd319a34e86-863f2aa1-7346-4093-adb4-6191b3c2d03a
Name:         download-ef06b313-4593-4fca-bbca-4dd319a34e86-863f2aa1-7346-4093-adb4-6191b3c2d03a
Namespace:    velero
Labels:       <none>
Annotations:  <none>
API Version:  veleroplugin.io/v1
Kind:         Download
Metadata:
  Creation Timestamp:  2020-05-26T11:23:11Z
  Generation:          3
  Resource Version:    6742468
  Self Link:           /apis/veleroplugin.io/v1/namespaces/velero/downloads/download-ef06b313-4593-4fca-bbca-4dd319a34e86-863f2aa1-7346-4093-adb4-6191b3c2d03a
  UID:                 9d2164e6-210d-4058-a6b5-de477b164cb0
Spec:
  Restore Timestamp:  2020-05-26T11:23:12Z
  Snapshot ID:        ivd:bc611f64-f318-46c6-bcdf-046d6e0f50b6:ef06b313-4593-4fca-bbca-4dd319a34e86
Status:
  Completion Timestamp:  2020-05-26T11:27:50Z
  Message:               Download completed
  Next Retry Timestamp:  2020-05-26T11:23:12Z
  Phase:                 Completed
  Processing Node:       k8s2-worker-03
  Progress:
  Start Timestamp:  2020-05-26T11:23:13Z
  Volume ID:        ivd:dbf0a76a-4634-49cf-a645-4faac2c989f0
Events:             <none>

Once the restore has completed and the downloads have completed, we can check on the application.

$ kubectl get sts -n cassandra
NAME        READY   AGE
cassandra   3/3     6m4s


$ kubectl get pvc -n cassandra
NAME                         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
cassandra-data-cassandra-0   Bound    pvc-5802b03d-f875-4cf6-9a8c-7626856cdeef   1Gi        RWO            cass-sc-csi    6m10s
cassandra-data-cassandra-1   Bound    pvc-dce81a7e-4758-4c4b-ba26-eff658e93be0   1Gi        RWO            cass-sc-csi    6m10s
cassandra-data-cassandra-2   Bound    pvc-98914696-db27-49f9-807b-f2d29c25cc65   1Gi        RWO            cass-sc-csi    6m10s


$ kubectl get pod -n cassandra
NAME          READY   STATUS    RESTARTS   AGE
cassandra-0   1/1     Running   0          6m14s
cassandra-1   1/1     Running   3          6m14s
cassandra-2   1/1     Running   3          6m14s
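Everything is back. As a final sanity check, you can query Cassandra itself to verify that the table data survived the round trip. A quick sketch; the keyspace and table names (demo.users) are stand-ins for whatever you populated, and this assumes the container image ships with cqlsh:

$ kubectl -n cassandra exec -it cassandra-0 -- cqlsh -e 'SELECT * FROM demo.users;'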

Conclusion

Success! We have backed up and restored a Cassandra application using the Velero plugin for vSphere, which in turn leverages VADP crash-consistent snapshots rather than restic file copies. The Velero plugin for vSphere is now supported with Velero 1.4.

The team would be delighted if you could try this out and provide some feedback. An FAQ with known issues is available. If you encounter issues, please add them here for review by the team.

One final note: I was able to do backups with both the vSphere plugin and the restic plugin, so the two can co-exist, and each can be called on to back up different Kubernetes resources if necessary, as the sketch below illustrates.
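For example, you could direct one backup at the vSphere plugin through the VSL, and another at restic by annotating the pods whose volumes you want copied at the file level. A rough sketch, where the namespaces, pod and volume names are made up for illustration:

$ velero backup create app1-backup --include-namespaces=app1 \
    --snapshot-volumes --volume-snapshot-locations vsl-vsphere

$ kubectl -n app2 annotate pod app2-0 backup.velero.io/backup-volumes=app2-data
$ velero backup create app2-backup --include-namespaces=app2 --snapshot-volumes=false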