Using Tanzu Mission Control Data Protection with on-premises S3 (MinIO)

Today, we will look at another feature of Tanzu Mission Control: Data Protection. In an earlier post, we saw how Tanzu Mission Control, or TMC for short, can be used to create and manage clusters on vSphere that have Identity Management integrated with LDAP/Active Directory. We also saw in that same post how TMC-managed Tanzu Kubernetes clusters on vSphere use the NSX ALB for Load Balancing services. Now we will deploy an S3 Object Store from MinIO to an on-premises Tanzu Kubernetes cluster. This will then become the “backup target” for TMC Data Protection, which uses the Velero product from VMware. Once configured, we will initiate a simple backup and restore operation from Tanzu Mission Control to ensure that everything is working as expected.

This is an opportune post, as I just noticed some new TMC enhancements around Data Protection in the latest release notes. It seems that the Data Protection features of Tanzu Mission Control now allow you to restore a selected namespace from a cluster backup, as well as to restore without overwriting an existing namespace. These are some nice new features for sure.

While there is not too much text associated with this post, there are quite a number of screenshots, so it is a bit of a lengthy post. Hope you find it useful all the same.

Step 1: Deploy MinIO Operator

I have used an on-premises Tanzu Kubernetes cluster, dedicated to this purpose, to host the MinIO S3 Object Store. The cluster is deployed on vSphere 7.0U2, and uses the NSX ALB for load balancer services. Note that there are 4 worker nodes. This appears to be a MinIO requirement for minimal erasure coding when creating a tenant, as we will see later.
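
Before deploying anything, it is worth double-checking the worker node count on the cluster. A quick sanity check (cluster-specific output omitted here):

% kubectl get nodes
# Expect at least 4 worker nodes in the listing; the MinIO tenant created
# later needs a minimum of 4 servers for its default erasure coding layout.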

I used the new MinIO Operator for this deployment. There are two methods to get started, and I found the krew method the easier of the two, perhaps because I am already familiar with krew. Once the plugin is installed, the kubectl minio command is available to interact with the operator. We can use the operator to initialize MinIO on the cluster, and then create our first MinIO tenant with its own S3 Object Store.
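
For reference, installing the plugin via krew looks something like the following, assuming krew itself is already set up (the plugin is published in the krew index simply as minio):

% kubectl krew update
% kubectl krew install minio
% kubectl minio version

With the plugin in place, running kubectl minio on its own shows the available sub-commands.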

% kubectl minio
Deploy and manage the multi tenant, S3 API compatible object storage on Kubernetes

Usage:
  minio [command]

Available Commands:
  init        Initialize MinIO Operator
  tenant      Manage MinIO tenant(s)
  delete      Delete MinIO Operator
  proxy       Open a port-forward to Console UI
  version     Display plugin version

Flags:
  -h, --help                help for minio
      --kubeconfig string  Custom kubeconfig path

Use "minio [command] --help" for more information about a command.
%

To install the MinIO Operator, ensure that the Kubernetes configuration context is pointing to your MinIO Kubernetes cluster, and then run the following command.

% kubectl minio init
namespace/minio-operator created
serviceaccount/minio-operator created
clusterrole.rbac.authorization.k8s.io/minio-operator-role created
clusterrolebinding.rbac.authorization.k8s.io/minio-operator-binding created
customresourcedefinition.apiextensions.k8s.io/tenants.minio.min.io created
service/operator created
deployment.apps/minio-operator created
serviceaccount/console-sa created
clusterrole.rbac.authorization.k8s.io/console-sa-role created
clusterrolebinding.rbac.authorization.k8s.io/console-sa-binding created
configmap/console-env created
service/console created
deployment.apps/console created
-----------------

To open Operator UI, start a port forward using this command: 

kubectl minio proxy -n minio-operator

-----------------
%

Notice the directive at the end of the previous command output to open the Operator UI. We will do that next. First though, verify that the operator is successfully deployed by checking the status of the following pods.

% kubectl get pods -n minio-operator
NAME                              READY  STATUS    RESTARTS   AGE
console-59d589b9b7-mfqp8          1/1    Running   0          2m57s
minio-operator-6447586784-rk4mv   1/1    Running   0          3m1s
%

Step 2: Create the first MinIO tenant

The init output above directed us to start a port forward in order to open the Operator UI. Although it is possible to create tenants with the kubectl minio tenant command, we will use the UI as it seems a bit more intuitive, and there are some advanced options we need to configure. Let’s run the port forward.

% kubectl minio proxy -n minio-operator
Starting port forward of the Console UI.

To connect open a browser and go to http://localhost:9090
Current JWT to login: eyJhbGciOiJ......weZPjNNbQ

Forwarding from 0.0.0.0:9090 -> 9090

From a browser on our desktop, connect to http://localhost:9090 to launch the Operator UI. JWT is the JSON Web Token that we will use to log in to the UI. It has been truncated/obfuscated in the output above.

MinIO Operator JWT Login

Copy and paste the JWT from the console output to the Operator Login. On initial login, there are no existing tenants. Click on the + to add a new one which we will use for Velero backups.

Default View - No Tenant - MinIO Operator

When creating a new tenant, you need to provide a name and a namespace (click + to add a new namespace if it does not already exist). You must also provide a storage class (I just chose the default that was already configured on the TKG cluster where MinIO is deployed). You might also wish to reduce the total size of the disk drives (the default is 100Gi); I have reduced this to 10Gi for my testing. There is a requirement to have a minimum of 4 servers, which is met by having 4 worker nodes in the TKG cluster.

MinIO Operator Tenant Setup
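
As an aside, a roughly equivalent tenant could be created from the CLI with the kubectl minio tenant create command. Here is a minimal sketch, assuming the velero namespace is created first and substituting your own storage class name; exact flags can vary between operator versions, and the TLS setting covered below is handled via the UI in this post:

% kubectl create namespace velero
% kubectl minio tenant create velero \
    --servers 4 \
    --volumes 4 \
    --capacity 10Gi \
    --namespace velero \
    --storage-class <your-storage-class>

We will continue with the UI for the remainder of this exercise.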

The remaining settings can be left at their defaults apart from Security. I have disabled TLS (Transport Layer Security) on this tenant. This is fine for a proof-of-concept, but for production TLS should be enabled to establish trust between Velero and the MinIO S3 Object Store. The official Velero documentation describes the steps needed to enable Velero and the S3 Object Store to establish trust through self-signed certs. This is something that I believe is being scoped for inclusion in the Tanzu Mission Control interface at some point in the future, but at present it is a manual step. Thus, we will continue with TLS disabled.

MinIO Operator - Tenant Review

Here is a review of the configuration I created for the new tenant. We can now go ahead and create it.

MinIO Operator - Tenant Review

When the tenant is successfully created, you are provided with an Access Key and a Secret Key via a pop-up window. Store these safely, as we will reuse them throughout the remainder of the exercise. You can monitor the creation of the tenant; click on the tenant name (velero) to see more information.

Minio Operator - Tenant Creating

Eventually the Summary view should show the tenant as healthy. It also provides a Console link and the MinIO Endpoint. These Load Balancer IP addresses come from the NSX ALB in this environment. We can use the Console link to manage the tenant, whilst the endpoint is what Velero uses as the target to ship its backup data to.

MinIO Tenant Health

Let’s take a quick look at the Kubernetes cluster where MinIO is running. Some new objects should now be visible in the chosen namespace, in our case velero. Note that in the first output below, which lists the services, there are Load Balancer / External IP addresses for both the minio endpoint and the console. These match what we see in the summary details above.

% kubectl get svc -n velero
NAME                    TYPE          CLUSTER-IP    EXTERNAL-IP      PORT(S)          AGE
minio                  LoadBalancer  10.96.91.2     xx.yy.zz.154     80:31365/TCP    2m11s
velero-console         LoadBalancer  10.96.81.5     xx.yy.zz.155     9090:30882/TCP  2m10s
velero-hl              ClusterIP      None          <none>          9000/TCP        2m10s
velero-log-hl-svc      ClusterIP      None          <none>          5432/TCP        45s
velero-log-search-api  ClusterIP      10.96.15.197  <none>          8080/TCP        45s


% kubectl get pods -n velero
NAME                                   READY  STATUS     RESTARTS   AGE
velero-log-0                           1/1    Running    0          79s
velero-log-search-api-dc985c94b-l4szs  1/1    Running    3          79s
velero-pool-0-0                        1/1    Running    0          2m43s
velero-pool-0-1                        1/1    Running    0          2m43s
velero-pool-0-2                        1/1    Running    0          2m43s
velero-pool-0-3                        1/1    Running    0          2m43s
velero-prometheus-0                    0/2    Init:0/1   0          19s


% kubectl get deploy -n velero
NAME                    READY  UP-TO-DATE  AVAILABLE  AGE
velero-log-search-api  1/1    1            1          87s


% kubectl get rs -n velero
NAME                              DESIRED  CURRENT  READY  AGE
velero-log-search-api-dc985c94b  1        1        1      92s


% kubectl get sts -n velero
NAME                READY  AGE
velero-log          1/1    98s
velero-pool-0       4/4    3m2s

We can also use the MinIO Operator to examine the tenant configuration.

% kubectl minio tenant list

Tenant 'velero', Namespace 'velero', Total capacity 10 GiB

 Current status: Initialized
 MinIO version: minio/minio:RELEASE.2021-10-06T23-36-31Z


% kubectl minio tenant info velero -n velero

Tenant 'velero', Namespace 'velero', Total capacity 10 GiB

 Current status: Initialized
 MinIO version: minio/minio:RELEASE.2021-10-06T23-36-31Z
 MinIO service: minio/ClusterIP (port 80)

 Console service: velero-console/ClusterIP (port 9090)

+------+---------+--------------------+---------------------+
| POOL | SERVERS | VOLUMES PER SERVER | CAPACITY PER VOLUME |
+------+---------+--------------------+---------------------+
| 0    | 4       | 1                  | 2684354560          |
+------+---------+--------------------+---------------------+

Step 3: Create a bucket for the first MinIO tenant

Let’s connect to the MinIO Console of the tenant using the Load Balancer address, rather than the port forward to the MinIO Operator used previously. Log in with the same Access Key and Secret Key provided previously, which you safely stored. Once logged in, select our new tenant and create an S3 bucket for the Velero backups.

MinIO Console

As you can see, there is nothing yet on the dashboard that is of much interest. Click on the bucket icon on the left hand side of the window.

MinIO Tenant - Console Dashboard

Provide the bucket with a name, in this case velero-backups, and click Save.

Minio Tenant - Bucket Creation

The tenant now has a bucket.

Minio Tenant Bucket
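
As an alternative to the console, the bucket could also have been created with the MinIO client (mc), pointed at the tenant’s MinIO endpoint and using the same Access Key and Secret Key. A sketch, using an arbitrary alias name of velero-tenant and placeholders for the keys:

% mc alias set velero-tenant http://xx.yy.zz.154 <ACCESS_KEY> <SECRET_KEY>
% mc mb velero-tenant/velero-backups
% mc ls velero-tenant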

Now that the S3 Object Store / backup target is successfully deployed and configured, we can turn our attention to Tanzu Mission Control, and begin the process of setting up Data Protection for the various managed clusters and their objects.

Step 4: Configure Tanzu Mission Control Data Protection

There are 3 steps that need to be completed in order to set up Tanzu Mission Control Data Protection. We can summarize these as:

  1. Create the backup credentials for the S3 Object Store
  2. Add the S3 Object Store as a backup target
  3. Enable Data Protection on the cluster(s)

Once we complete these steps, we will be able to take a backup of a cluster, or some cluster objects, via Tanzu Mission Control. Let’s delve into each of these steps in turn.

4.1 Create an S3 Account Credential

To create an Account Credential for the MinIO S3 Object Store, from TMC, select Administration, and in Accounts select Customer provisioned storage credential S3.

Customer provisioned storage credential S3

Give the credential a name that is easily recognizable. I have added the Access Key and Secret Key provided previously.

S3 credentials

The new account credential should now be visible.

account creds created

4.2 Create a backup Target Location

In TMC, again select Administration, and then Create Target locations. Choose Customer provisioned S3-compatible storage.

S3-compatible object storage

Select the account credentials for this object store, which you just created in the previous step.

s3 object store - credential selection

Next, provide a URL for the S3-compatible object store. This is the MinIO Endpoint from the tenant summary in the MinIO Operator UI in step 2, not the Console link. Note that it takes a URL rather than a bare IP address; in this environment that is something like http://xx.yy.zz.154, using the minio service’s Load Balancer address seen earlier.

TMC Data Protection storage provider

Decide who can have access to the target. In this example, I am providing access to all of my clusters in the cluster group called cormac-cluster-group.

TMC data protection - allow cluster groups

Give the target location a name and create it.

TMC Data Protection - name and create target location

The TMC Data Protection target location is now created.

TMC Data Protection Target Location

Now that we have a target location for backups, we should be able to set up Data Protection on the clusters.

4.3 Enable Data Protection

To enable Data Protection on a cluster in TMC, select the cluster on which you want to initiate the backup, then click on “Enable Data protection”, which is available on the lower right-hand side of the cluster view. By default, Data Protection is not enabled.

TMC Data Protection is not enabled

This will launch a popup window, as shown below. As it states, this will install Velero on your cluster.

Enable data protection

If the operation is successful, it should show that Data Protection is now enabled on your cluster, and the option to Create backup is now available.

TMC data protection is now enabled

Once Data Protection is enabled, you should see 2 new pods in a velero namespace created on the target cluster that is being backed up (not the cluster where the MinIO Operator is running). To see these additional pods, you need to switch the Kubernetes configuration context from the cluster where MinIO is installed (which is what we were looking at previously) to the target cluster that is being backed up. A useful troubleshooting tip, in my experience, is to follow the logs of the velero pod on the target cluster to make sure everything is working. We will look at the logs again when we do a backup.

% kubectl config get-contexts
CURRENT  NAME                              CLUSTER      AUTHINFO              NAMESPACE
          kubernetes-admin@kubernetes      kubernetes   kubernetes-admin
          mgmt-cjh-admin@mgmt-cjh          mgmt-cjh     mgmt-cjh-admin
*         minio-admin@minio                minio        minio-admin
          tanzu-cli-workload1@workload1    workload1    tanzu-cli-workload1
          tanzu-cli-workload2@workload2    workload2    tanzu-cli-workload2
          tanzu-cli-workload3a@workload3a  workload3a   tanzu-cli-workload3a
          workload1-admin@workload1        workload1    workload1-admin
          workload2-admin@workload2        workload2    workload2-admin
          workload3a-admin@workload3a      workload3a   workload3a-admin


% kubectl config use-context workload3a-admin@workload3a
Switched to context "workload3a-admin@workload3a".


% kubectl get pods -n velero
NAME                      READY  STATUS    RESTARTS  AGE
restic-zs92t              1/1    Running  0          54s
velero-6f4fb865b4-v8wbx   1/1    Running  0          57s


% kubectl logs velero-6f4fb865b4-v8wbx -n velero -f
time="2021-10-08T11:49:46Z" level=info msg="setting log-level to INFO" logSource="pkg/cmd/server/server.go:172"
time="2021-10-08T11:49:46Z" level=info msg="Starting Velero server v1.6.2 (8c9cdb9603446760452979dc77f93b17054ea1cc)" logSource="pkg/cmd/server/server.go:174"
time="2021-10-08T11:49:46Z" level=info msg="No feature flags enabled" logSource="pkg/cmd/server/server.go:178"
.
.
.
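
One other check that can be useful before kicking off a backup is to confirm that the backup storage location Velero uses for the TMC target reports as available. Depending on when TMC creates it, this object may only appear once the target location is actually used, but when present it can be queried on the target cluster as follows:

% kubectl get backupstoragelocations.velero.io -n velero -o yaml
# Look for status.phase: Available, which indicates that Velero can reach
# the MinIO endpoint configured as the target location.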

Step 5: Take your first backup

For my first backup, I am going to back up a namespace in my cluster which contains a number of objects that provide an nginx web server. The output below shows the namespace and some of the associated Kubernetes objects in that namespace.

% kubectl get ns
NAME                    STATUS  AGE
avi-system              Active  45h
cormac-ns               Active  108m
default                 Active  45h
kube-node-lease         Active  45h
kube-public             Active  45h
kube-system             Active  45h
pinniped-concierge      Active  45h
pinniped-supervisor     Active  45h
registry-creds-system   Active  24h
tkg-system              Active  45h
tkg-system-public       Active  45h
tmc-data-protection     Active  24h
velero                  Active  43m
vmware-system-tmc       Active  45h


% kubectl get all -n cormac-ns
NAME                                  READY  STATUS    RESTARTS  AGE
pod/nginx-deployment-585449566-nv24z  1/1    Running  0          106m

NAME                TYPE          CLUSTER-IP    EXTERNAL-IP      PORT(S)                      AGE
service/nginx-svc   LoadBalancer  10.96.164.65  xx.yy.zz.157   443:31234/TCP,80:31076/TCP  106m
 
NAME                              READY  UP-TO-DATE  AVAILABLE  AGE
deployment.apps/nginx-deployment  1/1    1            1          106m

NAME                                        DESIRED  CURRENT  READY  AGE
replicaset.apps/nginx-deployment-585449566  1        1        1      106m

And since this is a virtual / load balancer IP address, I can connect to it and see the nginx landing page.

Nginx landing page
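
A quick check from the command line against that same load balancer address (the xx.yy.zz.157 external IP in the service listing above) should also return the default nginx welcome page:

% curl -s http://xx.yy.zz.157 | grep -i title
# Expect something like: <title>Welcome to nginx!</title>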

In the Data Protection window in the cluster view, click “Create Backup”. This launches the backup wizard. The first step is to decide what to back up. I am only backing up a single namespace, so that is what is selected.

TMC - what to backup

Select the target location, which we just created (our MinIO S3 object store).

where to store the backup

Select a schedule. I am just going to do a one-off backup, right now.

when to backup

Decide how long you want the backup to be retained. By default it is 30 days.

Back up retention

Give the backup job/schedule a name, and create the backup job.

Name and create

If everything is working, the backup should complete successfully.

backup completed successfully
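
The backup is also visible from the target cluster as a Velero backup custom resource, which is handy for cross-checking what TMC reports:

% kubectl get backups.velero.io -n velero
% kubectl describe backups.velero.io cluster3-web-server-backup -n velero
# The Phase reported in the describe output should be Completed.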

If there are any issues, monitor the logs of the velero pod as shown earlier; this may reveal potential configuration issues with the backup target.

% kubectl logs velero-6f4fb865b4-v8wbx -n velero -f
time="2021-10-08T11:49:46Z" level=info msg="setting log-level to INFO" logSource="pkg/cmd/server/server.go:172"
time="2021-10-08T11:49:46Z" level=info msg="Starting Velero server v1.6.2 (8c9cdb9603446760452979dc77f93b17054ea1cc)" logSource="pkg/cmd/server/server.go:174"
time="2021-10-08T11:49:46Z" level=info msg="No feature flags enabled" logSource="pkg/cmd/server/server.go:178"
.
.
.
time="2021-10-08T11:59:54Z" level=info msg="Setting up backup log" backup=velero/cluster3-web-server-backup controller=backup logSource="pkg/controller/backup_controller.go:534"
time="2021-10-08T11:59:54Z" level=info msg="Setting up backup temp file" backup=velero/cluster3-web-server-backup logSource="pkg/controller/backup_controller.go:556"
time="2021-10-08T11:59:54Z" level=info msg="Setting up plugin manager" backup=velero/cluster3-web-server-backup logSource="pkg/controller/backup_controller.go:563"
time="2021-10-08T11:59:54Z" level=info msg="Getting backup item actions" backup=velero/cluster3-web-server-backup logSource="pkg/controller/backup_controller.go:567"
time="2021-10-08T11:59:54Z" level=info msg="Setting up backup store to check for backup existence" backup=velero/cluster3-web-server-backup logSource="pkg/controller/backup_controller.go:573"
time="2021-10-08T11:59:54Z" level=info msg="Writing backup version file" backup=velero/cluster3-web-server-backup logSource="pkg/backup/backup.go:212"
time="2021-10-08T11:59:54Z" level=info msg="Including namespaces: cormac-ns" backup=velero/cluster3-web-server-backup logSource="pkg/backup/backup.go:218"
time="2021-10-08T11:59:54Z" level=info msg="Excluding namespaces: <none>" backup=velero/cluster3-web-server-backup logSource="pkg/backup/backup.go:219"
time="2021-10-08T11:59:54Z" level=info msg="Including resources: *" backup=velero/cluster3-web-server-backup logSource="pkg/backup/backup.go:222"
time="2021-10-08T11:59:54Z" level=info msg="Excluding resources: <none>" backup=velero/cluster3-web-server-backup logSource="pkg/backup/backup.go:223"
time="2021-10-08T11:59:54Z" level=info msg="Backing up all pod volumes using restic: false" backup=velero/cluster3-web-server-backup logSource="pkg/backup/backup.go:224"
.
.
.
time="2021-10-08T12:00:03Z" level=info msg="Processing item" backup=velero/cluster3-web-server-backup logSource="pkg/backup/backup.go:354" name=nginx-svc-9cbg2 namespace=cormac-ns progress= resource=endpointslices.discovery.k8s.io
time="2021-10-08T12:00:03Z" level=info msg="Backing up item" backup=velero/cluster3-web-server-backup logSource="pkg/backup/item_backupper.go:121" name=nginx-svc-9cbg2 namespace=cormac-ns resource=endpointslices.discovery.k8s.io
time="2021-10-08T12:00:03Z" level=info msg="Backed up 11 items out of an estimated total of 11 (estimate will change throughout the backup)" backup=velero/cluster3-web-server-backup logSource="pkg/backup/backup.go:394" name=nginx-svc-9cbg2 namespace=cormac-ns progress= resource=endpointslices.discovery.k8s.io
time="2021-10-08T12:00:03Z" level=info msg="Backed up a total of 11 items" backup=velero/cluster3-web-server-backup logSource="pkg/backup/backup.go:419" progress=
time="2021-10-08T12:00:03Z" level=info msg="Setting up backup store to persist the backup" backup=velero/cluster3-web-server-backup logSource="pkg/controller/backup_controller.go:654"
time="2021-10-08T12:00:03Z" level=info msg="Backup completed" backup=velero/cluster3-web-server-backup controller=backup logSource="pkg/controller/backup_controller.go:664"
.
.

Step 6: Check backup contents in MinIO S3 Object Store

If we now log on to the MinIO console for our tenant (see step 2), the tenant dashboard should show some more interesting information now that we have sent it some backup content.

Minio Tenant Dashboard after backup

And if we look at the velero backup bucket, we can see that it has some objects stored in it.

Bucket after backup

These objects are of course coming from the velero (TMC Data Protection) backup job, and are the contents of the namespace where Nginx was running.

velero backup job contents

Step 7: Restore to ensure everything is working

No backup should be deemed good unless it can be successfully restored. To do a real-life restore of the backup, let’s delete the namespace that I just backed up. Note that TMC is going to recreate the namespace because it was created via TMC in the first place, but the recreated namespace will be empty. I will then restore the backup contents from my MinIO S3 bucket, and finally check if the nginx web server app is working once again.

% kubectl get ns
NAME                    STATUS  AGE
avi-system              Active  45h
cormac-ns               Active  118m
default                 Active  45h
kube-node-lease         Active  45h
kube-public             Active  45h
kube-system             Active  45h
pinniped-concierge      Active  45h
pinniped-supervisor     Active  45h
registry-creds-system   Active  24h
tkg-system              Active  45h
tkg-system-public       Active  45h
tmc-data-protection     Active  24h
velero                  Active  43m
vmware-system-tmc       Active  45h


% kubectl get all -n cormac-ns
NAME                                  READY  STATUS    RESTARTS  AGE
pod/nginx-deployment-585449566-nv24z  1/1    Running  0          106m

NAME                TYPE          CLUSTER-IP    EXTERNAL-IP      PORT(S)                      AGE
service/nginx-svc  LoadBalancer  10.96.164.65   xx.yy.zz.157     443:31234/TCP,80:31076/TCP   106m

NAME                              READY  UP-TO-DATE  AVAILABLE  AGE
deployment.apps/nginx-deployment  1/1    1            1          106m
 
NAME                                        DESIRED  CURRENT  READY  AGE
replicaset.apps/nginx-deployment-585449566  1        1        1      106m

 
% kubectl delete ns cormac-ns
namespace "cormac-ns" deleted
 

% kubectl get all -n cormac-ns
No resources found in cormac-ns namespace.


% kubectl get ns
NAME                    STATUS  AGE
avi-system              Active  45h
cormac-ns               Active  9s        <<< namespace is recreated, but it is empty
default                 Active  45h
kube-node-lease         Active  45h
kube-public             Active  45h
kube-system             Active  45h
pinniped-concierge      Active  45h
pinniped-supervisor     Active  45h
registry-creds-system   Active  24h
tkg-system              Active  45h
tkg-system-public       Active  45h
tmc-data-protection     Active  24h
velero                  Active  43m
vmware-system-tmc       Active  45h


% kubectl get all -n cormac-ns
No resources found in cormac-ns namespace.

With my namespace now empty, and my web server down, let’s do a restore. In the cluster where the backup was initiated, select the backup. Then click on the RESTORE button/link.

TMC Data Protection Restore

Decide what you want to restore. I want to restore everything in that backup – the entire backup job. Note the option to restore a selected namespace, which I mentioned at the beginning of this post. This is a new feature that allows you to restore individual namespaces from a full cluster backup.

restore entire backup job

Give the restore job a name and start the restore process.

restore name

You should hopefully observe a successful restore.

successful restore
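
As with the backup, the restore also appears on the target cluster as a Velero restore custom resource, so it can be cross-checked from kubectl:

% kubectl get restores.velero.io -n velero
# Describing the restore object (kubectl describe restores.velero.io ...)
# should report a Phase of Completed once it has finished.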

Let’s check the namespace to see if the contents were restored. Looks good.

% kubectl get ns
NAME                    STATUS  AGE
avi-system              Active  45h
cormac-ns               Active  7m56s
default                 Active  45h
kube-node-lease         Active  45h
kube-public             Active  45h
kube-system             Active  45h
pinniped-concierge      Active  45h
pinniped-supervisor     Active  45h
registry-creds-system   Active  24h
tkg-system              Active  45h
tkg-system-public       Active  45h
tmc-data-protection     Active  24h
velero                  Active  51m
vmware-system-tmc       Active  45h


% kubectl get all -n cormac-ns
NAME                                  READY  STATUS    RESTARTS  AGE
pod/nginx-deployment-585449566-nv24z  1/1    Running  0          61s

NAME                TYPE          CLUSTER-IP    EXTERNAL-IP      PORT(S)                      AGE
service/nginx-svc  LoadBalancer  10.96.83.165   xx.yy.zz.157     443:30368/TCP,80:32025/TCP   62s

NAME                              READY  UP-TO-DATE  AVAILABLE  AGE
deployment.apps/nginx-deployment  1/1    1            1          62s

NAME                                        DESIRED  CURRENT  READY  AGE
replicaset.apps/nginx-deployment-585449566  1        1        1      62s

And is the application working once again? – yes it is!

Nginx landing page

That completes the overview of using Data Protection (through Velero) on Tanzu Mission Control, using an on-premises MinIO S3 Object Store running in a Tanzu Kubernetes cluster as the backup target/destination.