A closer look at Cluster API and TKG v1.3.1

In this post, I am going to take a look at Cluster API, and then at some of the changes made in TKG v1.3.1. TKG uses Cluster API extensively to create workload Kubernetes clusters, so we will be able to apply what we see in the first part of this post to TKG in the second part. There is already an extensive amount of information and documentation available on Cluster API, so I am not going to cover every aspect of it here. This link will take you to the Cluster API concepts, which discusses all the different Custom Resources that are added to a cluster to support Cluster API. This one links to the Cluster API quick start guide to help you get started in setting up Cluster API on your own.

As its name implies, the purpose of Cluster API is to enable cluster lifecycle operations in K8s, such as creating, configuring, managing, monitoring and deleting clusters. Cluster API has two cluster concepts. The first is the management cluster: a Kubernetes cluster on which the Cluster API components, including one or more infrastructure providers, have been installed, and which is then responsible for the lifecycle of workload clusters. A workload cluster is simply a K8s cluster which is provisioned and managed by a management cluster.

To see Cluster API in operation, you need a K8s cluster. For simplicity, we will create a Kubernetes in Docker (kind) cluster, then install the clusterctl binary, and finally initialize the kind cluster with the appropriate Cluster API components so that it can assume the role of management cluster and enable us to deploy one or more workload clusters.

Part 1: Step 1 – Create a kind cluster

I am not going to show how to install the kind binary. Essentially you need Docker installed, since kind runs each Kubernetes node as a Docker container. There is a kind quickstart available here. Once Docker is running and kind is installed, you can create a Kubernetes cluster using kind as follows:

$ kind create cluster
Creating cluster "kind" ...
✓ Ensuring node image (kindest/node:v1.21.1) 🖼
✓ Preparing nodes 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
Set kubectl context to "kind-kind"
You can now use your cluster with:

kubectl cluster-info --context kind-kind

Have a nice day! 👋
$
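If kind create cluster complains, the usual culprit is a missing or stopped Docker. A quick preflight check that the binaries used in this post are on the $PATH can save some head-scratching. The following is a minimal POSIX shell sketch; require_tools is a hypothetical helper name (not part of kind or kubectl), and the tool list in the usage line is an assumption based on the steps in this post. Note that it only checks that the binaries exist, not that the Docker daemon is running.

```shell
# Hypothetical helper: report every named command that is not on $PATH.
# Returns 0 if all tools are found, 1 if any are missing.
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage before creating the kind cluster:
# require_tools docker kind kubectl && kind create cluster
```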

kind has already set the kubectl context to kind-kind, as reported in the output above, so you can check the running pods straight away:

$ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-558bd4d5db-lfnth 1/1 Running 0 24s
kube-system coredns-558bd4d5db-rkktw 1/1 Running 0 24s
kube-system etcd-kind-control-plane 1/1 Running 0 28s
kube-system kindnet-h2brs 1/1 Running 0 25s
kube-system kube-apiserver-kind-control-plane 1/1 Running 0 28s
kube-system kube-controller-manager-kind-control-plane 1/1 Running 0 28s
kube-system kube-proxy-vht8q 1/1 Running 0 25s
kube-system kube-scheduler-kind-control-plane 1/1 Running 0 40s
local-path-storage local-path-provisioner-547f784dff-zgqn5 1/1 Running 0 24s
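To avoid eyeballing longer pod listings, a small filter can flag anything that is not yet Running. This sketch assumes the default kubectl get pods -A column layout shown above (STATUS is the fourth column); pods_not_running is a hypothetical name.

```shell
# Hypothetical helper: read 'kubectl get pods -A' output on stdin and print
# NAMESPACE/NAME for every pod whose STATUS column is not "Running".
# Pods that ran to completion (STATUS "Completed") are reported as well.
pods_not_running() {
  awk 'NR > 1 && $4 != "Running" { print $1 "/" $2 }'
}

# Example usage (no output means every pod is Running):
# kubectl get pods -A | pods_not_running
```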

Part 1: Step 2 – Download the clusterctl binary

The clusterctl tool is how we enable Cluster API, and specifically how we turn an existing cluster into a management cluster. It can be downloaded from GitHub. Here is one way to download the Linux amd64 build of version v0.3.17, as per the Cluster API quick start guide.

$ curl -L https://github.com/kubernetes-sigs/cluster-api/releases/download/v0.3.17/clusterctl-linux-amd64 \
-o clusterctl

Once the clusterctl binary has been downloaded, made executable (chmod +x clusterctl) and moved to a directory in our $PATH, we can proceed with the creation of the management cluster.
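The commands for making the binary executable and placing it on the $PATH are not shown above, so here is a minimal sketch. install_binary is a hypothetical helper, and /usr/local/bin in the usage example is an assumed destination; any directory on your $PATH will do, although writing to /usr/local/bin typically needs sudo.

```shell
# Hypothetical helper: make a downloaded binary executable and move it into
# an existing directory. Usage: install_binary <file> <destination-dir>
install_binary() {
  src=$1
  dest_dir=$2
  if [ ! -f "$src" ]; then
    echo "no such file: $src" >&2
    return 1
  fi
  chmod +x "$src" || return 1
  # The trailing slash makes mv fail if dest_dir is not an existing directory.
  mv "$src" "$dest_dir/" || return 1
}

# Example usage (may require sudo for /usr/local/bin):
# install_binary ./clusterctl /usr/local/bin
# clusterctl version
```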

Part 1: Step 3 – Initialize the Cluster API management cluster

The clusterctl init command automatically installs the Cluster API core provider, the kubeadm bootstrap provider and the kubeadm control-plane provider. To target vSphere, add the --infrastructure vsphere option so that the vSphere infrastructure provider is installed as well; this requires the VSPHERE_USERNAME and VSPHERE_PASSWORD environment variables to be set. The command then adds these 4 providers (along with cert-manager, as the output shows) to the kind cluster, since it is the current kubeconfig context. A sample script to initialize the management cluster, which includes the required environment variables, can be found at this GitHub repo. Here is an example of how it is run, assuming all the required environment variables have been previously exported.

$ kubectl config get-contexts
CURRENT NAME CLUSTER AUTHINFO NAMESPACE
* kind-kind kind-kind kind-kind


$ clusterctl init --infrastructure vsphere
Fetching providers

Installing cert-manager Version="v1.1.0"
Waiting for cert-manager to be available...
Installing Provider="cluster-api" Version="v0.3.17" TargetNamespace="capi-system"
Installing Provider="bootstrap-kubeadm" Version="v0.3.17" TargetNamespace="capi-kubeadm-bootstrap-system"
Installing Provider="control-plane-kubeadm" Version="v0.3.17" TargetNamespace="capi-kubeadm-control-plane-system"
Installing Provider="infrastructure-vsphere" Version="v0.7.7" TargetNamespace="capv-system"

Your management cluster has been initialized successfully!

You can now create your first workload cluster by running the following:

clusterctl config cluster [name] --kubernetes-version [version] | kubectl apply -f -
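Since clusterctl init --infrastructure vsphere needs VSPHERE_USERNAME and VSPHERE_PASSWORD, a guard that fails early when either is unset can be handy in a script. require_env is a hypothetical helper written in POSIX sh (hence the eval for indirect variable lookup); it is not part of clusterctl.

```shell
# Hypothetical helper: fail if any named environment variable is unset or
# empty. Usage: require_env VAR1 VAR2 ...
require_env() {
  missing=0
  for var in "$@"; do
    # Indirect lookup via eval, since POSIX sh has no ${!var}.
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "environment variable not set: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage, with the two variables the vSphere provider needs at init time:
# require_env VSPHERE_USERNAME VSPHERE_PASSWORD && clusterctl init --infrastructure vsphere
```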

This creates a number of additional objects in the kind cluster, corresponding to the providers, as shown below. I am only showing the pod listing to give you an idea of what has been added. We can see controller pods for all 4 providers, each in its own namespace, plus their webhook pods in the shared capi-webhook-system namespace and the cert-manager pods.

$ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
capi-kubeadm-bootstrap-system capi-kubeadm-bootstrap-controller-manager-f98476fd6-vtzm9 2/2 Running 0 47s
capi-kubeadm-control-plane-system capi-kubeadm-control-plane-controller-manager-86f749cf89-lk7dc 2/2 Running 0 43s
capi-system capi-controller-manager-58f797cb65-cxmwd 2/2 Running 0 50s
capi-webhook-system capi-controller-manager-7d596cc4cb-jrdfs 2/2 Running 0 52s
capi-webhook-system capi-kubeadm-bootstrap-controller-manager-5499744c7c-nvw5j 2/2 Running 0 49s
capi-webhook-system capi-kubeadm-control-plane-controller-manager-5869f67c96-9c9p7 2/2 Running 0 45s
capi-webhook-system capv-controller-manager-66ffbd8dfc-p9ctb 2/2 Running 0 41s
capv-system capv-controller-manager-bfbb4c968-mq5hv 2/2 Running 0 39s
cert-manager cert-manager-86cb5dcfdd-bsthf 1/1 Running 0 71s
cert-manager cert-manager-cainjector-84cf775b89-vvdqc 1/1 Running 0 71s
cert-manager cert-manager-webhook-7f9f4f8dcb-pr75w 1/1 Running 0 71s
kube-system coredns-558bd4d5db-5g95k 1/1 Running 0 4m26s
kube-system coredns-558bd4d5db-hhzbs 1/1 Running 0 4m26s
kube-system etcd-kind-control-plane 1/1 Running 0 4m43s
kube-system kindnet-x7xqh 1/1 Running 0 4m26s
kube-system kube-apiserver-kind-control-plane 1/1 Running 0 4m30s
kube-system kube-controller-manager-kind-control-plane 1/1 Running 0 4m30s
kube-system kube-proxy-szwrd 1/1 Running 0 4m26s
kube-system kube-scheduler-kind-control-plane 1/1 Running 0 4m43s
local-path-storage local-path-provisioner-547f784dff-j69rp 1/1 Running 0 4m26s
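To confirm that every provider landed, the pod listing can be checked against the namespaces reported by clusterctl init. The namespace list below is taken from the init output above, plus cert-manager; check_provider_namespaces is a hypothetical name, and other providers or clusterctl versions will use different namespaces.

```shell
# Hypothetical helper: read 'kubectl get pods -A' output on stdin and report
# any expected namespace that has no pods. Exits 1 if one is missing.
check_provider_namespaces() {
  awk '
    BEGIN {
      n = split("capi-system capi-kubeadm-bootstrap-system capi-kubeadm-control-plane-system capv-system cert-manager", want, " ")
    }
    NR > 1 { seen[$1] = 1 }
    END {
      rc = 0
      for (i = 1; i <= n; i++) {
        if (!(want[i] in seen)) {
          print "no pods in namespace: " want[i]
          rc = 1
        }
      }
      exit rc
    }'
}

# Example usage:
# kubectl get pods -A | check_provider_namespaces
```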

The default Cluster API bootstrap and control plane providers use kubeadm. kubeadm is part of Kubernetes, and its purpose is to create Kubernetes clusters. The kubeadm bootstrap provider generates the cluster certificates and initializes the control plane. It waits until the control plane initialization is complete before creating other nodes (e.g. worker nodes) and joining them to the cluster. The control plane initialization is machine-based, which means that (in the case of vSphere at least) virtual machines are provisioned to build the control plane, and these in turn run the static pods for Kubernetes services such as kube-apiserver, kube-controller-manager and kube-scheduler. Further details on how all of this ties together can be found in the Cluster API book's concepts guide.

Everything is now in place to allow us to build a workload cluster.

Part 1: Step 4 – Create a workload cluster

Before creating our first workload cluster, a significant number of additional environment variables need to be configured. You can find the full list in the cluster-api-provider-vsphere documentation; they include the IP address of the vCenter server, the vCenter server credentials and a number of inventory items such as the Datacenter and Datastore. Once the environment variables have been populated, we can specify the new workload cluster that we wish to create. In this example, the cluster is called vsphere-quickstart and has 1 control plane node and 3 worker nodes, as per the Cluster API quick start guide. The K8s version, v1.18.6, matches the VM template referred to by the VSPHERE_TEMPLATE environment variable. This template must be uploaded to the vSphere inventory in advance, and is used to create the actual VMs that provide the control plane and worker nodes of the workload cluster. A sample script to create a workload cluster, which includes the required environment variables, can be found at this GitHub repo.

The clusterctl config cluster command generates a single manifest (YAML) that contains all of the objects needed to build the new workload cluster, such as Cluster, KubeadmControlPlane, VSphereCluster, VSphereMachineTemplate, MachineDeployment and so on. Here is an example of how it is run, assuming all the required environment variables have been previously exported. The manifest is written to cluster.yaml, which is then applied to the management cluster.

$ clusterctl config cluster vsphere-quickstart --infrastructure vsphere \
--kubernetes-version v1.18.6 --control-plane-machine-count 1 --worker-machine-count 3 > cluster.yaml
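Before applying cluster.yaml, it can be useful to see which object kinds it contains. This is a rough sketch using a plain text scan rather than a YAML parser: it assumes each top-level kind: line starts at column 0, as is typical for generated manifests. list_kinds is a hypothetical helper.

```shell
# Hypothetical helper: count the top-level "kind:" entries in a multi-document
# YAML file and print "<count> <kind>" lines sorted by kind.
# Indented "kind:" lines (e.g. inside embedded ConfigMap data) are ignored.
list_kinds() {
  awk '/^kind:/ { count[$2]++ } END { for (k in count) print count[k], k }' "$1" | sort -k2
}

# Example usage:
# list_kinds cluster.yaml
```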


$ kubectl apply -f cluster.yaml
cluster.cluster.x-k8s.io/vsphere-quickstart created
vspherecluster.infrastructure.cluster.x-k8s.io/vsphere-quickstart created
vspheremachinetemplate.infrastructure.cluster.x-k8s.io/vsphere-quickstart created
kubeadmcontrolplane.controlplane.cluster.x-k8s.io/vsphere-quickstart created
kubeadmconfigtemplate.bootstrap.cluster.x-k8s.io/vsphere-quickstart-md-0 created
machinedeployment.cluster.x-k8s.io/vsphere-quickstart-md-0 created
secret/vsphere-csi-controller created
configmap/vsphere-csi-controller-role created
configmap/vsphere-csi-controller-binding created
secret/csi-vsphere-config created
configmap/csi.vsphere.vmware.com created
configmap/vsphere-csi-node created
configmap/vsphere-csi-controller created
configmap/internal-feature-states.csi.vsphere.vmware.com created

This begins the process of deploying 4 new VMs on your vSphere infrastructure: 1 for the control plane node and 3 for the worker nodes. The VMs are cloned from the VSPHERE_TEMPLATE VM template, with their CPU, memory and disk sizes taken from the VSphereMachineTemplate in cluster.yaml. These VMs will begin to appear in the vSphere inventory shortly afterwards.

Part 1: Step 5 – Apply a CNI

After a few moments, the VMs should be deployed and the workload cluster formed. We can retrieve the workload cluster's kubeconfig with clusterctl and use it to display the list of nodes in the workload cluster, as follows.

$ clusterctl get kubeconfig vsphere-quickstart > quickstart-kubeconfig

$ kubectl get nodes --kubeconfig quickstart-kubeconfig -o wide

NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
vsphere-quickstart-d9cr2 NotReady master 7m24s v1.18.6+vmware.1 10.27.51.50 10.27.51.50 VMware Photon OS/Linux 4.19.132-1.ph3 containerd://1.3.4
vsphere-quickstart-md-0-57ff99b55-8zk44 NotReady <none> 4m59s v1.18.6+vmware.1 10.27.51.51 10.27.51.51 VMware Photon OS/Linux 4.19.132-1.ph3 containerd://1.3.4
vsphere-quickstart-md-0-57ff99b55-b55q8 NotReady <none> 4m53s v1.18.6+vmware.1 10.27.51.52 10.27.51.52 VMware Photon OS/Linux 4.19.132-1.ph3 containerd://1.3.4
vsphere-quickstart-md-0-57ff99b55-gn9pn NotReady <none> 4m57s v1.18.6+vmware.1 10.27.51.53 10.27.51.53 VMware Photon OS/Linux 4.19.132-1.ph3 containerd://1.3.4
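With more than a handful of nodes, a filter that prints only the nodes which are not Ready can be easier to read. This sketch assumes STATUS is the second column of kubectl get nodes output, as in the listing above; nodes_not_ready is a hypothetical name.

```shell
# Hypothetical helper: read 'kubectl get nodes' output (with or without -o wide)
# on stdin and print NAME and STATUS for every node whose STATUS is not exactly
# "Ready" (so "NotReady" and "Ready,SchedulingDisabled" are both reported).
nodes_not_ready() {
  awk 'NR > 1 && $2 != "Ready" { print $1, $2 }'
}

# Example usage:
# kubectl get nodes --kubeconfig quickstart-kubeconfig | nodes_not_ready
```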

While the 4 nodes are present, they are all in a NotReady state, as highlighted above. This is because no CNI (Container Network Interface) plugin has been deployed yet, so Pod-to-Pod networking is not yet available. Let's choose Calico, a well-known CNI. We can deploy it as follows, referencing our quickstart cluster kubeconfig:

$ kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml --kubeconfig quickstart-kubeconfig
configmap/calico-config created
customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgppeers.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/blockaffinities.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/clusterinformations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/felixconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworksets.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/hostendpoints.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamblocks.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamconfigs.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamhandles.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ippools.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/kubecontrollersconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networksets.crd.projectcalico.org created
clusterrole.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrolebinding.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrole.rbac.authorization.k8s.io/calico-node created
clusterrolebinding.rbac.authorization.k8s.io/calico-node created
daemonset.apps/calico-node created
serviceaccount/calico-node created
deployment.apps/calico-kube-controllers created
serviceaccount/calico-kube-controllers created
poddisruptionbudget.policy/calico-kube-controllers created

In a short space of time (less than a minute in my case), the nodes should enter the Ready state.

$ kubectl get nodes --kubeconfig quickstart-kubeconfig -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
vsphere-quickstart-d9cr2 Ready master 11m v1.18.6+vmware.1 10.27.51.50 10.27.51.50 VMware Photon OS/Linux 4.19.132-1.ph3 containerd://1.3.4
vsphere-quickstart-md-0-57ff99b55-8zk44 Ready <none> 8m40s v1.18.6+vmware.1 10.27.51.51 10.27.51.51 VMware Photon OS/Linux 4.19.132-1.ph3 containerd://1.3.4
vsphere-quickstart-md-0-57ff99b55-b55q8 Ready <none> 8m34s v1.18.6+vmware.1 10.27.51.52 10.27.51.52 VMware Photon OS/Linux 4.19.132-1.ph3 containerd://1.3.4
vsphere-quickstart-md-0-57ff99b55-gn9pn Ready <none> 8m38s v1.18.6+vmware.1 10.27.51.53 10.27.51.53 VMware Photon OS/Linux 4.19.132-1.ph3 containerd://1.3.4
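Rather than re-running kubectl get nodes by hand until the nodes turn Ready, the check can be polled. The sketch below is a generic retry loop; wait_until is a hypothetical helper, and the 300-second budget and 10-second interval in the usage example are arbitrary choices. kubectl wait is a standard kubectl command that exits non-zero if the condition is not met within its own --timeout, which makes it a convenient command to retry.

```shell
# Hypothetical helper: re-run a command every <interval> seconds until it
# succeeds, giving up once <timeout> seconds of sleeping have elapsed.
# The elapsed counter only counts the sleeps, not the command's own run time.
# Usage: wait_until <timeout-seconds> <interval-seconds> <command> [args...]
wait_until() {
  timeout=$1
  interval=$2
  shift 2
  elapsed=0
  while ! "$@"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}

# Example usage:
# wait_until 300 10 kubectl wait --for=condition=Ready nodes --all \
#   --timeout=10s --kubeconfig quickstart-kubeconfig
```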

We have just successfully deployed a workload cluster on vSphere using Cluster API.

As mentioned, the scripts used to initialize the kind cluster as a management cluster and then to deploy a workload cluster, including the necessary environment variables, can be found on this GitHub repo.

Part 2: Cluster API & TKG

In this final part of the post, we will take a look at TKG, Tanzu Kubernetes Grid, and how it also uses Cluster API. Note that this is standalone TKG, often referred to as TKG multi-cloud, or TKGm for short. Do not confuse it with TKGi, TKG Integrated Edition (formerly Enterprise PKS), or with the TKG clusters provisioned by the TKG service (TKGs) in vSphere with Tanzu. TKG in this case also has the concept of a management cluster, which in turn can be used to create one or more workload clusters.

The first thing to point out is that TKG v1.3.x has a new CLI. The new CLI is called tanzu, whereas in earlier versions of TKG (1.1, 1.2) it was called tkg. I am not going to go through all of the deployment and configuration steps of TKG in this post as the TKG documentation is pretty detailed. Instead, I just want to show how Cluster API is once again utilized. The new tanzu CLI comes with a suite of plugins to enable a K8s administrator to interact with both the management cluster and the workload clusters. The plugins can be displayed as follows:

$ tanzu plugin list
  NAME                LATEST VERSION  DESCRIPTION                                                        REPOSITORY  VERSION  STATUS
  alpha               v1.3.1          Alpha CLI commands                                                 core                 not installed
  cluster             v1.3.1          Kubernetes cluster operations                                      core        v1.3.1   installed
  kubernetes-release  v1.3.1          Kubernetes release operations                                      core        v1.3.1   installed
  login               v1.3.1          Login to the platform                                              core        v1.3.1   installed
  management-cluster  v1.3.1          Kubernetes management cluster operations                           core        v1.3.1   installed
  pinniped-auth       v1.3.1          Pinniped authentication operations (usually not directly invoked)  core        v1.3.1   installed
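The STATUS column shows that the alpha plugin is not installed. To pick out such plugins in a script, a filter like the following can help. It is a sketch that relies on "not installed" being the last two words of a row, as in the output above; plugins_not_installed is a hypothetical name, and the tanzu CLI output format may differ between versions.

```shell
# Hypothetical helper: read 'tanzu plugin list' output on stdin and print the
# NAME of every plugin whose STATUS is "not installed".
# NF >= 2 guards against blank lines before $(NF - 1) is evaluated.
plugins_not_installed() {
  awk 'NR > 1 && NF >= 2 && $(NF - 1) == "not" && $NF == "installed" { print $1 }'
}

# Example usage:
# tanzu plugin list | plugins_not_installed
```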

I have already deployed a TKG management cluster and a workload cluster. To interact with them, I must first use the tanzu login command to log in to the TKG management cluster. I have only a single management cluster, called tkgm, so I select it from the list of servers offered and hit the Enter key.

$ tanzu login
? Select a server  [Use arrows to move, type to filter]
> tkgm                ()
  + new server

<hit enter>


$ tanzu login
? Select a server tkgm                ()
✔  successfully logged in to management cluster using the kubeconfig tkgm
$

As you can imagine, if there were multiple management clusters, you would simply move up and down the list to select the one you wish to manage. Now that I am logged in to the management cluster, I can query the available clusters. Adding the --include-management-cluster option lists both management and workload clusters; without it, only workload clusters are listed.

$ tanzu cluster list --include-management-cluster
  NAME            NAMESPACE   STATUS   CONTROLPLANE  WORKERS  KUBERNETES        ROLES       PLAN
  my-vsphere-tkc  default     running  3/3           3/3      v1.20.5+vmware.1  <none>      prod
  tkgm            tkg-system  running  1/1           1/1      v1.20.5+vmware.1  management  dev

$ tanzu cluster list
  NAME            NAMESPACE  STATUS   CONTROLPLANE  WORKERS  KUBERNETES        ROLES   PLAN
  my-vsphere-tkc  default    running  3/3           3/3      v1.20.5+vmware.1  <none>  prod
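Since the ROLES column distinguishes the management cluster from workload clusters, the list output can be filtered on it. This sketch assumes ROLES is the seventh whitespace-separated column, as in the output above; clusters_with_role is a hypothetical helper.

```shell
# Hypothetical helper: read 'tanzu cluster list --include-management-cluster'
# output on stdin and print NAME and NAMESPACE for every cluster whose ROLES
# column (7th field) equals the given value, e.g. "management" or "<none>".
clusters_with_role() {
  awk -v role="$1" 'NR > 1 && $7 == role { print $1, $2 }'
}

# Example usage:
# tanzu cluster list --include-management-cluster | clusters_with_role management
```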

Using the get option rather than the list option displays additional detail about a cluster. Below is the output of tanzu management-cluster get for the management cluster tkgm, where we begin to see some similarities with the cluster provisioned by Cluster API in part 1. Note that the list of providers on the management cluster shows the Cluster API core, kubeadm bootstrap, kubeadm control plane and vSphere infrastructure providers, which are the same providers seen earlier (although the Cluster API providers here are at v0.3.14 rather than v0.3.17).

$ tanzu management-cluster get
  NAME  NAMESPACE   STATUS   CONTROLPLANE  WORKERS  KUBERNETES        ROLES
  tkgm  tkg-system  running  1/1           1/1      v1.20.5+vmware.1  management

Details:

NAME                                                     READY  SEVERITY  REASON  SINCE  MESSAGE
/tkgm                                                    True                     26h
├─ClusterInfrastructure - VSphereCluster/tkgm            True                     26h
├─ControlPlane - KubeadmControlPlane/tkgm-control-plane  True                     26h
│ └─Machine/tkgm-control-plane-c8wwv                     True                     26h
└─Workers
  └─MachineDeployment/tkgm-md-0
    └─Machine/tkgm-md-0-76b576db5c-86j2n                 True                     26h

Providers:

  NAMESPACE                          NAME                    TYPE                    PROVIDERNAME  VERSION  WATCHNAMESPACE
  capi-kubeadm-bootstrap-system      bootstrap-kubeadm       BootstrapProvider       kubeadm       v0.3.14
  capi-kubeadm-control-plane-system  control-plane-kubeadm   ControlPlaneProvider    kubeadm       v0.3.14
  capi-system                        cluster-api             CoreProvider            cluster-api   v0.3.14
  capv-system                        infrastructure-vsphere  InfrastructureProvider  vsphere       v0.7.7
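To compare these provider versions with those installed by clusterctl in part 1 (v0.3.17 for the Cluster API providers, v0.7.7 for vSphere), the Providers table can be reduced to name/version pairs. This sketch assumes the layout shown above, with NAME in the second column and VERSION in the fifth; provider_versions is a hypothetical name.

```shell
# Hypothetical helper: read 'tanzu management-cluster get' output on stdin and
# print "NAME VERSION" for each row after the "Providers:" line, skipping the
# table header and blank lines.
provider_versions() {
  awk '/^Providers:/ { in_table = 1; next }
       in_table && NF >= 5 && $1 != "NAMESPACE" { print $2, $5 }'
}

# Example usage:
# tanzu management-cluster get | provider_versions
```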

The objects displayed in the Details section, such as VSphereCluster, KubeadmControlPlane, MachineDeployment and Machine, are all Cluster API constructs, which are described in the Cluster API Concepts. Suffice to say that these are the components that enable us to create new Kubernetes workload clusters from an existing Kubernetes environment. Thus, this TKG management cluster can now be used to create TKG workload clusters via the Cluster API mechanism, using the new tanzu CLI. I hope this post has helped to provide some insight into Cluster API, and how it is used by TKG to create workload clusters.

2 Replies to “A closer look at Cluster API and TKG v1.3.1”

  1. Hi Cormac,

    thanks for this very interesting blog post.

    There is one thing that I still don't get:
    What is the difference between TKG 1.3.1 (TKGm) and the Tanzu Kubernetes (Guest-) Clusters that are available via TKGs in vSphere with Tanzu?

    Best regards,
    Volker

    1. They are very similar Volker, but there are some subtle differences between them. One would be the CSI driver. On a TKC available via TKGs, there is a special CSI driver, since the TKC must first communicate with the Supervisor (SV) cluster, which then communicates with vCenter on behalf of the TKC. With TKGm, there is no such proxying of communication, and the CSI driver communicates directly with vSphere. So it is simply a few subtle things like that which make them different. I believe there is a long-term goal to make them the same, but a few things need to happen before that becomes a reality.
