My VCF 4.1.0.1 environment is what we call a Consolidated Architecture, meaning that both the management domain and workload domain run on the same infrastructure. The primary application that I run in this environment is VCF with Tanzu (vSphere with Tanzu on VCF). I have applications deployed on both the Supervisor Cluster and on my TKG guest cluster (deployed by the TKG Service, or TKGS).
The update of VCF to version 4.2 involves a number of steps, including updating SDDC Manager, updating the NSX-T Edges and NSX-T Managers, updating vCenter Server and finally updating the ESXi hosts. Once the underlying infrastructure updates are taken care of, I will show how the VCF with Tanzu control plane can be seamlessly updated. Lastly, I will show how a TKG guest cluster can also be updated in my environment.
If you are interested in the new features available in VCF 4.2 such as vSAN HCI Mesh support and NSX-T Federation support, check out the Release Notes.
VCF automatically informs you, via SDDC Manager, when new updates are available. Here is the notification I received telling me that VCF 4.2 is now available. The first item that needs updating is SDDC Manager. This view is found by navigating to the Management Domain view and selecting Updates/Patches. Any new updates will be flagged.
Step 1: SDDC Manager
Before attempting any update, it is important to do a precheck. Here is the result of the precheck I ran before attempting the SDDC Manager update. The precheck passed, so I can proceed with the update.
The update was successful. All 7 resources associated with the SDDC Manager were updated. I can now click “Finish” and move to the next phase.
Note that you can perform a precheck before each and every step of the update. However, to save space, I am not going to do that in this post.
Step 2: Config Drift Bundle
Return to the Management Domain > Updates/Patches view. In the Available Updates view, you can now select which version of Cloud Foundation you wish to update to. This is a new feature, and it enables skip-level updates, i.e. the ability to update your VCF environment to a later (or the latest) version while skipping some earlier versions. Since I am updating to version 4.2, that is the version I select. The Available Updates view automatically shows me which bundle I should download and apply next. In this case, it is the Configuration Drift Bundle for VCF 4.2, which essentially ensures that the Cloud Foundation components are compatible for upgrade. First, click the Download Now button, followed by the Update Now button once the download is complete.
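As an aside, if you prefer to keep an eye on bundle downloads from the command line rather than the UI, the SDDC Manager public API can list them. The sketch below is illustrative only: the SDDC Manager FQDN and credentials are placeholders, and the exact endpoints and response field names should be double-checked against the VCF API reference for your version.

# Request an API token from SDDC Manager (FQDN and credentials are placeholders)
$ TOKEN=$(curl -sk -X POST https://sddc-manager.vcf.local/v1/tokens \
    -H "Content-Type: application/json" \
    -d '{"username":"administrator@vsphere.local","password":"<password>"}' | jq -r '.accessToken')

# List the bundles known to SDDC Manager, along with their download status
$ curl -sk -H "Authorization: Bearer $TOKEN" https://sddc-manager.vcf.local/v1/bundles \
    | jq '.elements[] | {description, version, downloadStatus}'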
After applying the Configuration Drift Bundle, you will hopefully see the following, reporting a successful update:
Step 3: NSX-T Edge & NSX-T Host Clusters
The next section updates the NSX-T Edge and NSX-T Host clusters. NSX-T plays a major role in VCF with Tanzu, providing Load Balancer functionality for the Supervisor cluster control plane, the TKG "guest" clusters and Kubernetes applications, as well as providing the overlay networking for Pod to Pod communication in the Supervisor cluster. This enables us to provision PodVMs for our own bespoke applications, and it also enables the system to provision PodVMs for the integrated Container Image Registry provided by Harbor.
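To make the PodVM idea a little more concrete, here is a minimal sketch of what such a deployment looks like: an ordinary Kubernetes Pod manifest applied to a vSphere Namespace on the Supervisor cluster is instantiated as a PodVM on the NSX-T overlay. The image is just an example, and I am reusing my cormac-ns namespace here.

# podvm-example.yaml - an ordinary Pod spec; when applied to a vSphere Namespace
# on the Supervisor cluster it is instantiated as a PodVM (image is an example)
apiVersion: v1
kind: Pod
metadata:
  name: nginx-podvm
  namespace: cormac-ns
spec:
  containers:
  - name: nginx
    image: nginx:latest
    ports:
    - containerPort: 80

$ kubectl apply -f podvm-example.yaml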
Returning to the Available Updates in the Management Domain > Updates/Patches view, the next bundle that we need to apply is NSX-T, updating from the current version, v3.0.2, to the desired version, v3.1. Again, this is automatically shown once the Cloud Foundation version is chosen. Click the Download Now button, followed by the Update Now button once the download is complete.
This step allows you some choice as to how the NSX-T Edge and NSX-T Host clusters are updated. You can do all of the NSX-T Edge clusters together in parallel, or you can choose to do them one at a time. This wizard also prompts you to update NSX-T Host clusters. Again, you can choose to do all NSX-T Host clusters together, or do them individually. Finally, you are given the option to do all updates in parallel (default) or sequentially, both for the NSX-T Edge and NSX-T Host clusters. Since my environment is a Consolidated Environment with only a single NSX-T deployment, I went with the defaults, as shown below.
As with all of the updates, you can monitor the progress. Again, all going well, you should observe a completed status similar to what I saw at the end of my update:
So far, so good. The next component that needs updating is the NSX-T Manager.
Step 4: NSX-T Manager
Now that the NSX-T Edge and Host clusters have been successfully updated to version 3.1, it is time to update the NSX-T Manager from v3.0.2 to v3.1. This might be a little confusing, as these are referred to as NSX-T Elements in the UI. As before, click the Download Now button, followed by the Update Now button once the download is complete.
Once initiated, you can observe the update activity, and indeed see that this is the update for the NSX-T Manager.
As before, all going well, it should complete the update with a status similar to the following:
That completes the update of the networking layer. All components of NSX-T are now at version v3.1. We can now turn our attention to the core vSphere components, namely vCenter server and the ESXi hosts.
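Before moving on, a quick optional check: the NSX-T Manager can also report its version via its node API if you want to confirm it outside of the SDDC Manager UI. This is a minimal sketch; the manager FQDN is a placeholder, and the endpoint and output fields should be verified against the NSX-T API documentation for your release.

# Query the NSX-T Manager node version (FQDN is a placeholder; prompts for the admin password)
$ curl -sk -u admin https://nsx-manager.vcf.local/api/v1/node/version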
Step 5: vCenter Server
The next phase in the VCF 4.2 update is to bring the vCenter Server up to version 7.0 U1c. As normal, start by downloading the bundle from the Available Updates view. Repeat the steps highlighted previously, i.e. click on the Download Now button, followed by the Update Now button once the download is complete.
The vCenter update can be monitored, as we have seen with the other products. There are quite a number of steps involved in the vCenter server update, as can be seen here:
Assuming all goes well, and there are no issues encountered with the vCenter Server update to version 7.0 U1c, you should see the following status:
We can now turn our attention to the ESXi hosts.
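Before doing that, one more optional verification step. The new vCenter Server version can also be confirmed via the vSphere REST API. This is just a sketch; the vCenter FQDN is a placeholder, and I am using basic authentication purely for brevity.

# Create an API session on vCenter Server (FQDN is a placeholder; prompts for the SSO password)
$ SESSION=$(curl -sk -X POST -u administrator@vsphere.local https://vcenter.vcf.local/api/session | tr -d '"')

# Retrieve the appliance version and build details
$ curl -sk -H "vmware-api-session-id: $SESSION" https://vcenter.vcf.local/api/appliance/system/version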
Step 6: ESXi hosts
The steps should now be very familiar. Go to Available Updates, and check what the next update recommendation is. In this case, it is ESXi version 7.0 U1d. Click the Download Now button to download the bundle, and then click Update Now once the download is complete.
The administrator has some control over this update once again. They can select individual clusters to update, or do all clusters at once. There are also some options which allow the clusters to be updated in parallel or sequentially. Finally, you can also choose to use the "Quick Boot" option on the ESXi hosts during the update.
As before, the steps involved in doing an update of the ESXi hosts can be monitored.
And as before, if all goes well and the update of the ESXi hosts to version 7.0 U1d succeeds, the status should look similar to the following:
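Once SDDC Manager reports success, a quick additional sanity check is to confirm the new build directly on one of the hosts, for example over SSH (the host name below is a placeholder):

# Confirm the ESXi version and build number on a host (host name is a placeholder)
$ ssh root@esxi-host-01.vcf.local esxcli system version get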
At this point, there should be no further updates available to apply to the VCF environment. The update to VCF version 4.2 is now complete. However, I mentioned that my VCF environment's primary use case is to run VCF with Tanzu. I also said that my goal was to update VCF to version 4.2 so that I could evaluate the new vSAN Data Persistence platform (DPp). To do this, I also need to update my vSphere with Tanzu deployment running on VCF. Let's do that next.
Step 7: Tanzu – Supervisor Cluster
The VCF with Tanzu update step is not integrated into the VCF SDDC Manager. You will need to connect to the "Workload Management" view in the vSphere client to initiate the update. This is the view from my environment before I launch the update. There is one update available.
Again, I only have a single cluster in my environment, so there is only a single cluster that needs to be selected in order to apply the update. Note that this update applies to the Supervisor cluster control plane only; it does not apply to the TKG guest clusters that have been provisioned by the TKG Service. Essentially, this will update the Supervisor cluster from v0.5 (build 16762486) to v0.6 (build 17224208). The Supervisor cluster control plane is made up of 3 virtual machines. We will see a fourth virtual machine instantiated to enable a rolling update of the control plane. This is the reason why 5 IP addresses must be allocated to the Supervisor cluster at provisioning time: 4 are used in normal operations, and the fifth is used in update scenarios like this.
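If you want to keep an eye on the control plane from kubectl while this happens, you can log in to the Supervisor cluster (the same login shown in Step 8 below) and list the nodes; the Supervisor control plane VMs appear there alongside the ESXi hosts, with their current Kubernetes version. A quick sketch, reusing the server address from my environment:

$ kubectl-vsphere login --vsphere-username administrator@vsphere.local \
    --server=https://20.0.0.1 --insecure-skip-tls-verify

# Switch to the Supervisor cluster context and list the nodes
$ kubectl config use-context 20.0.0.1
$ kubectl get nodes -o wide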
The Supervisor cluster will enter a re-configuring state when the update is applied. During this time, you will observe new control plane VMs getting created, whilst the original VMs get removed.
Once again, if all goes well, the update should complete and this should be reported in the Workload Management Updates view:
The Supervisor cluster is now updated. However, there is one final (optional) task, and that is to update the TKG ‘guest’ or ‘workload’ clusters. Let’s do that as well.
Step 8: Tanzu – TKG Guest/Workload Cluster
This final step cannot be done from SDDC Manager, nor can it be done from the vSphere client. This step has to be done at the command line. We will need to use the kubectl-vsphere command to login to the cluster in question, and update the version of the distribution in the TKG cluster manifest. This will trigger a rolling update of the TKG guest/workload cluster. The steps are well documented here.
First, we will log in to the Supervisor cluster and look at the TKG guest/workload cluster from that perspective. From this context, we can examine the available distributions, and modify the distribution used by the TKG guest/workload cluster. My cluster has a single control plane node and two worker nodes. Note that the current distribution version is v1.18.5 and the desired version is v1.18.10.
$ kubectl-vsphere login --vsphere-username administrator@vsphere.local \
  --server=https://20.0.0.1 --insecure-skip-tls-verify
Password: *******
Logged in successfully.

You have access to the following contexts:
   20.0.0.1
   cormac-ns

If the context you wish to use is not in this list, you may need to try
logging in again later, or contact your cluster administrator.

To change context, use `kubectl config use-context <workload name>`

$ kubectl get tanzukubernetescluster
NAME                      CONTROL PLANE   WORKER   DISTRIBUTION                     AGE   PHASE
tkg-cluster-vcf-w-tanzu   1               2        v1.18.5+vmware.1-tkg.1.c40d30d   61d   running

$ kubectl get virtualmachines
NAME                                                     AGE
tkg-cluster-vcf-w-tanzu-control-plane-8q5vc              44m
tkg-cluster-vcf-w-tanzu-workers-dxdq6-78f685b5c9-7klnw   44m
tkg-cluster-vcf-w-tanzu-workers-dxdq6-78f685b5c9-7r2hs   39m

$ kubectl get virtualmachineimages
NAME                                                         VERSION                           OSTYPE
ob-15957779-photon-3-k8s-v1.16.8---vmware.1-tkg.3.60d2ffd    v1.16.8+vmware.1-tkg.3.60d2ffd    vmwarePhoton64Guest
ob-16466772-photon-3-k8s-v1.17.7---vmware.1-tkg.1.154236c    v1.17.7+vmware.1-tkg.1.154236c    vmwarePhoton64Guest
ob-16545581-photon-3-k8s-v1.16.12---vmware.1-tkg.1.da7afe7   v1.16.12+vmware.1-tkg.1.da7afe7   vmwarePhoton64Guest
ob-16551547-photon-3-k8s-v1.17.8---vmware.1-tkg.1.5417466    v1.17.8+vmware.1-tkg.1.5417466    vmwarePhoton64Guest
ob-16897056-photon-3-k8s-v1.16.14---vmware.1-tkg.1.ada4837   v1.16.14+vmware.1-tkg.1.ada4837   vmwarePhoton64Guest
ob-16924026-photon-3-k8s-v1.18.5---vmware.1-tkg.1.c40d30d    v1.18.5+vmware.1-tkg.1.c40d30d    vmwarePhoton64Guest
ob-16924027-photon-3-k8s-v1.17.11---vmware.1-tkg.1.15f1e18   v1.17.11+vmware.1-tkg.1.15f1e18   vmwarePhoton64Guest
ob-17010758-photon-3-k8s-v1.17.11---vmware.1-tkg.2.ad3d374   v1.17.11+vmware.1-tkg.2.ad3d374   vmwarePhoton64Guest
ob-17332787-photon-3-k8s-v1.17.13---vmware.1-tkg.2.2c133ed   v1.17.13+vmware.1-tkg.2.2c133ed   vmwarePhoton64Guest
ob-17419070-photon-3-k8s-v1.18.10---vmware.1-tkg.1.3a6cd48   v1.18.10+vmware.1-tkg.1.3a6cd48   vmwarePhoton64Guest
The next step is to edit the TKG guest cluster, and change the distribution from v1.18.5 to v1.18.10. The first command shown below opens the manifest of the cluster in your default editor (e.g. vi). Change the distribution as documented in the link above, and save the changes. This triggers the update.
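For reference, the part of the manifest being edited looks something like the excerpt below. This is a hedged sketch based on the v1alpha1 TanzuKubernetesCluster API; cross-check the exact fields against the documentation linked above before making the change.

# Excerpt of the TanzuKubernetesCluster manifest (run.tanzu.vmware.com/v1alpha1);
# only the distribution stanza is shown. The full version string matches the
# v1.18.10 virtual machine image listed earlier.
apiVersion: run.tanzu.vmware.com/v1alpha1
kind: TanzuKubernetesCluster
metadata:
  name: tkg-cluster-vcf-w-tanzu
  namespace: cormac-ns
spec:
  distribution:
    fullVersion: v1.18.10+vmware.1-tkg.1.3a6cd48
    version: v1.18.10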
$ kubectl edit tanzukubernetescluster tkg-cluster-vcf-w-tanzu
tanzukubernetescluster.run.tanzu.vmware.com/tkg-cluster-vcf-w-tanzu edited

$ kubectl get tanzukubernetescluster
NAME                      CONTROL PLANE   WORKER   DISTRIBUTION                      AGE   PHASE
tkg-cluster-vcf-w-tanzu   1               2        v1.18.10+vmware.1-tkg.1.3a6cd48   61d   updating
If we now log out of the Supervisor context, and log in to the TKG guest/workload context, we can see the changes taking effect. A new control plane node will be observed initially, and eventually all cluster nodes will be replaced with nodes using the new distribution in a rolling update fashion.
$ kubectl-vsphere logout
Your KUBECONFIG context has changed.
The current KUBECONFIG context is unset.
To change context, use `kubectl config use-context <workload name>`
Logged out of all vSphere namespaces.

$ kubectl-vsphere login --vsphere-username administrator@vsphere.local \
  --server=https://20.0.0.1 --insecure-skip-tls-verify \
  --tanzu-kubernetes-cluster-namespace cormac-ns \
  --tanzu-kubernetes-cluster-name tkg-cluster-vcf-w-tanzu
Password: **********
Logged in successfully.

You have access to the following contexts:
   20.0.0.1
   cormac-ns
   tkg-cluster-vcf-w-tanzu

If the context you wish to use is not in this list, you may need to try
logging in again later, or contact your cluster administrator.

To change context, use `kubectl config use-context <workload name>`

$ kubectl get nodes
NAME                                                     STATUS   ROLES    AGE   VERSION
tkg-cluster-vcf-w-tanzu-control-plane-8q5vc              Ready    master   41m   v1.18.5+vmware.1
tkg-cluster-vcf-w-tanzu-workers-dxdq6-78f685b5c9-7klnw   Ready    <none>   41m   v1.18.5+vmware.1
tkg-cluster-vcf-w-tanzu-workers-dxdq6-78f685b5c9-7r2hs   Ready    <none>   25m   v1.18.5+vmware.1

<< -- wait some time -- >>

$ kubectl get nodes
NAME                                                     STATUS   ROLES    AGE    VERSION
tkg-cluster-vcf-w-tanzu-control-plane-8q5vc              Ready    master   44m    v1.18.5+vmware.1
tkg-cluster-vcf-w-tanzu-control-plane-drtsx              Ready    master   103s   v1.18.10+vmware.1
tkg-cluster-vcf-w-tanzu-workers-dxdq6-78f685b5c9-7klnw   Ready    <none>   44m    v1.18.5+vmware.1
tkg-cluster-vcf-w-tanzu-workers-dxdq6-78f685b5c9-7r2hs   Ready    <none>   29m    v1.18.5+vmware.1

<< -- wait some time -- >>

$ kubectl get nodes
NAME                                                     STATUS   ROLES    AGE   VERSION
tkg-cluster-vcf-w-tanzu-control-plane-drtsx              Ready    master   89m   v1.18.10+vmware.1
tkg-cluster-vcf-w-tanzu-workers-dxdq6-75bc686795-79vp4   Ready    <none>   81m   v1.18.10+vmware.1
tkg-cluster-vcf-w-tanzu-workers-dxdq6-75bc686795-hqtvt   Ready    <none>   83m   v1.18.10+vmware.1
As you can see, the TKG guest/workload cluster has been successfully updated from the older version 1.18.5 to the new version 1.18.10.
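As one last check, you can log back in to the Supervisor context and confirm that the cluster phase has returned from updating to running. A quick sketch, reusing the same login and context names shown earlier:

$ kubectl-vsphere logout
$ kubectl-vsphere login --vsphere-username administrator@vsphere.local \
    --server=https://20.0.0.1 --insecure-skip-tls-verify

# Switch back to the namespace context and check the cluster phase
$ kubectl config use-context cormac-ns
$ kubectl get tanzukubernetescluster tkg-cluster-vcf-w-tanzu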
My VCF with Tanzu full stack software defined data center has now been updated to the latest versions.
Step 9: vSAN Data Persistence platform (DPp)
Way back at the beginning of this post, I mentioned that the primary reason for updating this platform was to enable support for vSAN DPp, the new Data Persistence platform. Now that I am running VCF 4.2 and have updated VCF with Tanzu, I have access to vSAN DPp. The services available in this release are S3 object stores from both Minio and Cloudian, and the Velero vSphere Operator, which enables backup and restore of objects on both the Supervisor Cluster and the TKG guest/workload cluster. You can locate them by navigating to Cluster > Configure > Supervisor Services > Services.
Over the next couple of weeks, I will stand up each of these services and show you how they work with VCF with Tanzu.