Deploying TKG v1.2.0 (TKGm) in an internet-restricted environment using Harbor
In this post, I am going to outline the steps involved in successfully deploying a Tanzu Kubernetes Grid (TKG) management cluster and workload clusters in an internet-restricted environment. [Note: since first writing this article, we appear to have standardized on TKGm – TKG multi-cloud – as the name for this product.] Such an environment is often referred to as an air-gapped environment. Note that for part of this exercise, a virtual machine will need to be connected to the internet in order to pull down the images required for TKG. Once these have been downloaded and pushed up to our local Harbor container image registry, the internet connection can be removed and we can work in a completely air-gapped environment.
Note that TKG here refers to the TKG distribution that deploys a management cluster, and provides a tkg CLI to deploy TKG “workload” clusters. As mentioned, this is now being marketed as TKGm. This is different to the TKG found in vSphere with Tanzu (and VCF with Tanzu). vSphere with Tanzu provides many unique advanced features such as vSphere Namespaces, vSphere SSO integration, TKGs – the TKG service – for deploying TKG “guest” clusters, the vSphere Network Service, the vSphere Storage Service and, if you have NSX-T, the vSphere Pods Service and vSphere Registry Service. However, for the purposes of this post, we are working with the former.
Prerequisites
I am using Ubuntu 18.04 as the Guest OS to deploy Harbor as well as run my tkg CLI commands. You could use another distro, but some commands shown here are specific to Ubuntu. I already have both docker (v19.03.13) and docker-compose (v1.27.4) installed. I’m not going to cover the installation of these – there are plenty of examples already available. One thing that might be useful is to avoid putting sudo in front of every docker command. Information on how to add your user as a trusted docker user can be found here.
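For reference, the standard Docker post-install step to avoid the sudo prefix is simply to add your user to the docker group and then start a new shell session (or run newgrp) so the group membership takes effect:

$ sudo usermod -aG docker $USER
$ newgrp docker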
We also need to have the TKG binaries downloaded. Note that this exercise uses TKG v1.2.0. This allows us to use a self-signed CA certificate, which is useful in home-labs and non-production environments, but certainly not advisable in production environments. The binaries are available here: https://www.vmware.com/go/get-tkg. You will also need to have the appropriate OVA template available on your vSphere environment for the TKG images. This image will be used to build the control plane and worker nodes.
TKG requires DHCP to be available on your internet restricted environment. I used dnsmasq to provide both DNS and DHCP to my internet restricted environment, which is again very useful in non-production, home-lab type environments. I used guidance from both linuxhints and computing for geeks to configure dnsmasq. Ensure DNS and DHCP are working correctly by using the appropriate tools (e.g. nslookup, dig) before proceeding with the TKG deployment.
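As an illustration only, a minimal dnsmasq configuration for serving DNS and DHCP on an isolated segment might look something like the following. The interface name and address ranges here are placeholders for a lab setup, not values taken from my environment:

# /etc/dnsmasq.conf - minimal DNS + DHCP for an isolated lab segment (values are placeholders)
interface=ens192                                 # NIC facing the internet-restricted network
domain=corinternal.com                           # local DNS domain
dhcp-range=192.168.100.50,192.168.100.150,12h    # DHCP pool and lease time
dhcp-option=option:router,192.168.100.1          # default gateway handed out to clients
dhcp-option=option:dns-server,192.168.100.1      # DNS server handed out to clients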
Harbor, the VMware Container Image Registry, should already be installed and running. I have posted a blog on how to do this earlier. It is extremely important that all the appropriate Docker and Harbor certificates and keys are in place. This is covered in the Harbor post. For this new post, I’ve redeployed Harbor once again, and here is the status of it, along with a simple test to show that we can login using both http & https as well as push local images to it.
$ sudo docker ps -a
CONTAINER ID   IMAGE                                COMMAND                  CREATED              STATUS                        PORTS                                         NAMES
b1da486c7e16   goharbor/nginx-photon:v2.1.0         "nginx -g 'daemon of…"   About a minute ago   Up About a minute (healthy)   0.0.0.0:80->8080/tcp, 0.0.0.0:443->8443/tcp   nginx
ff6d0b00ed66   goharbor/harbor-jobservice:v2.1.0    "/harbor/entrypoint.…"   About a minute ago   Up About a minute (healthy)                                                 harbor-jobservice
1d2522c10110   goharbor/harbor-core:v2.1.0          "/harbor/entrypoint.…"   About a minute ago   Up About a minute (healthy)                                                 harbor-core
32e2e1246de3   goharbor/harbor-db:v2.1.0            "/docker-entrypoint.…"   About a minute ago   Up About a minute (healthy)                                                 harbor-db
11960021374d   goharbor/redis-photon:v2.1.0         "redis-server /etc/r…"   About a minute ago   Up About a minute (healthy)                                                 redis
b14fd57ed92b   goharbor/harbor-portal:v2.1.0        "nginx -g 'daemon of…"   About a minute ago   Up About a minute (healthy)                                                 harbor-portal
89f947fe7077   goharbor/harbor-registryctl:v2.1.0   "/home/harbor/start.…"   About a minute ago   Up About a minute (healthy)                                                 registryctl
8d44c29fc175   goharbor/registry-photon:v2.1.0      "/home/harbor/entryp…"   About a minute ago   Up About a minute (healthy)                                                 registry
873df21d0338   goharbor/harbor-log:v2.1.0           "/bin/sh -c /usr/loc…"   About a minute ago   Up About a minute (healthy)   127.0.0.1:1514->10514/tcp                     harbor-log

$ sudo docker login cormac-mgmt.corinternal.com
Username: admin
Password: ******
WARNING! Your password will be stored unencrypted in /home/cormac/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded

$ sudo docker login https://cormac-mgmt.corinternal.com
Authenticating with existing credentials...
WARNING! Your password will be stored unencrypted in /home/cormac/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded

$ sudo docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
0e03bdcc26d7: Pull complete
Digest: sha256:e7c70bb24b462baa86c102610182e3efcb12a04854e8c582838d92970a09f323
Status: Downloaded newer image for hello-world:latest

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub. (amd64)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/

$ sudo docker tag hello-world:latest cormac-mgmt.corinternal.com/library/hello-world:latest

$ sudo docker push cormac-mgmt.corinternal.com/library/hello-world
The push refers to repository [cormac-mgmt.corinternal.com/library/hello-world]
9c27e219663c: Pushed
latest: digest: sha256:90659bf80b44ce6be8234e6ff90a1ac34acbeb826903b02cfa0da11c82cbc042 size: 525
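Incidentally, if the docker login above had complained about an untrusted certificate, the usual fix is to make the Harbor CA certificate known to the Docker client. A quick sketch, assuming the CA file is called ca.crt and using the registry FQDN from above:

$ sudo mkdir -p /etc/docker/certs.d/cormac-mgmt.corinternal.com
$ sudo cp ca.crt /etc/docker/certs.d/cormac-mgmt.corinternal.com/ca.crt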
We will also need kubectl on this host (it is used later on to query the bootstrap and TKG clusters). On Ubuntu it can be installed via snap:

$ sudo snap install kubectl --classic
2020-11-30T11:00:40Z INFO Waiting for automatic snapd restart...
kubectl 1.19.4 from Canonical✓ installed
Finally, you will need the tools yq and jq installed. These are required by a script (seen later) which parses the TKG BOMs for the required container images. This in turn creates a new script which pulls the original images from the TKG registry (registry.tkg.vmware.run) and pushes them up to our local Harbor image registry so that TKG can access them. We will see how this is done shortly.
You will need to pull down yq as follows – I used version 3.4.1.
$ sudo wget https://github.com/mikefarah/yq/releases/download/3.4.1/yq_linux_amd64 -O /usr/bin/yq
$ sudo chmod +x /usr/bin/yq
On Ubuntu, jq can be installed as follows:
$ sudo apt-get install jq
All of the prerequisites are now in place – we can start pulling the images from the public TKG registry and pushing them up to our private Harbor registry.
Pushing TKG images to local Harbor registry
This procedure requires the script found in the official Tanzu documentation here. The gen-publish-images.sh script traverses all of the BOM manifest files found in your local .tkg/bom folder and creates a new script from the output. This resulting script then pulls the images from the public TKG registry and pushes them to your local Harbor registry. As I mentioned earlier, I am not going to inject my own CA cert into my TKG nodes. Instead, I’m going to bypass this check with the use of an environment variable that relaxes certificate verification. For production environments, you would not want to take this approach, and instead use a secure method.
Before running the script, set the following variables:
TKG_CUSTOM_IMAGE_REPOSITORY_SKIP_TLS_VERIFY=true
TKG_CUSTOM_IMAGE_REPOSITORY=cormac-tkgm.corinternal.com/library
To ensure that these variables stay set, I added them to my $HOME/.bash_profile and then sourced the .bash_profile. Alternatively, log out and log back in again to ensure the variables are set correctly.
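The additions to $HOME/.bash_profile amount to something like the following two export statements, after which the file is sourced:

export TKG_CUSTOM_IMAGE_REPOSITORY_SKIP_TLS_VERIFY=true
export TKG_CUSTOM_IMAGE_REPOSITORY=cormac-tkgm.corinternal.com/library

$ source ~/.bash_profile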
$ env | grep TKG
TKG_CUSTOM_IMAGE_REPOSITORY_SKIP_TLS_VERIFY=true
TKG_CUSTOM_IMAGE_REPOSITORY=cormac-tkgm.corinternal.com/library
To create the .tkg folder and sub-folders in your $HOME, the following command should be run.
$ tkg get management-cluster
Finally, before we run the gen-publish-images.sh script, we can reduce the amount of time taken to pull and push the TKG images by removing a number of the BOM manifests. You should do this only if you are certain you are never going to use some of the older Kubernetes versions available. In my case, I created a sub-folder in .tkg called oldbom and moved a number of the older manifests out (as shown below), reducing the number of manifests from 9 to 3, which significantly reduces the number of images that require pulling and pushing.
$ cd .tkg/bom
$ ls
bom-1.17.11+vmware.1.yaml  bom-1.18.8+vmware.1.yaml  bom-1.2.0+vmware.1.yaml
$ ls ../oldbom/
bom-1.1.0+vmware.1.yaml   bom-1.1.2+vmware.1.yaml   bom-1.1.3+vmware.1.yaml
bom-1.17.6+vmware.1.yaml  bom-1.17.9+vmware.1.yaml  bom-tkg-1.0.0.yaml
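The moves themselves amounted to something like the following (the file names are taken from the listing above; adjust to whichever versions you want to retire):

$ mkdir ~/.tkg/oldbom
$ mv ~/.tkg/bom/bom-1.1.0+vmware.1.yaml ~/.tkg/bom/bom-1.1.2+vmware.1.yaml ~/.tkg/oldbom/
$ mv ~/.tkg/bom/bom-1.1.3+vmware.1.yaml ~/.tkg/bom/bom-1.17.6+vmware.1.yaml ~/.tkg/oldbom/
$ mv ~/.tkg/bom/bom-1.17.9+vmware.1.yaml ~/.tkg/bom/bom-tkg-1.0.0.yaml ~/.tkg/oldbom/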
We can now run the gen-publish-images.sh script to get a list of all of the images, and configure the pull and push commands. The output is redirected to another script, which I have called publish-images.sh. We then run the latter script to do the actual pulling and pushing.
$ ./gen-publish-images.sh > publish-images.sh
$ chmod +x publish-images.sh

$ more publish-images.sh
docker pull registry.tkg.vmware.run/prometheus/alertmanager:v0.20.0_vmware.1
docker tag registry.tkg.vmware.run/prometheus/alertmanager:v0.20.0_vmware.1 cormac-tkgm.corinternal.com/library/prometheus/alertmanager:v0.20.0_vmware.1
docker push cormac-tkgm.corinternal.com/library/prometheus/alertmanager:v0.20.0_vmware.1
docker pull registry.tkg.vmware.run/antrea/antrea-debian:v0.9.3_vmware.1
docker tag registry.tkg.vmware.run/antrea/antrea-debian:v0.9.3_vmware.1 cormac-tkgm.corinternal.com/library/antrea/antrea-debian:v0.9.3_vmware.1
docker push cormac-tkgm.corinternal.com/library/antrea/antrea-debian:v0.9.3_vmware.1
.
--<snip>
.

$ ./publish-images.sh
registry.tkg.vmware.run/velero/velero-plugin-for-vsphere:v1.0.2_vmware.1
The push refers to repository [cormac-tkgm.corinternal.com/library/velero/velero-plugin-for-vsphere]
fc075a5f6276: Layer already exists
9cd4316ae370: Layer already exists
5fec6d6d7c8c: Layer already exists
e7932ff84389: Layer already exists
40af5eccbc98: Layer already exists
895dac616c95: Layer already exists
e53100abc225: Layer already exists
767a7b7a8ec5: Layer already exists
v1.0.2_vmware.1: digest: sha256:68a0334bf06747b87650c618c713bff7c28836183cbafa13bb81c18c250a272a size: 1991
v1.4.2_vmware.1: Pulling from velero/velero-restic-restore-helper
Digest: sha256:8e0756ecfc07e0e4812daec3dce44b6ccef5fc64aa0f438e42a6592b2cf2a634
Status: Image is up to date for registry.tkg.vmware.run/velero/velero-restic-restore-helper:v1.4.2_vmware.1
.
--<snip>
.
When the script completes processing, our internal Harbor image registry should contain all the images necessary to deploy TKG. The connection to the external internet can now be removed from this virtual machine, and you should be able to run the rest of these commands in your air-gapped, internet-restricted environment.
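As a quick sanity check (not something I ran as part of the original exercise), the registry catalog endpoint can be queried to confirm the repositories landed in Harbor. A sketch, assuming the admin credentials and registry FQDN used earlier – substitute your own password:

$ curl -sk -u admin:<password> "https://cormac-tkgm.corinternal.com/v2/_catalog?n=200" | jq .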
Initialize the TKG config.yaml
In the .tkg folder, there exists a config.yaml. This needs to be populated with a bunch of additional information about the vSphere environment. The simplest way to do this is to launch the TKG manager UI and populate the fields accordingly. The UI is launched with the following command:
$ tkg init --ui

Logs of the command execution can also be found at: /tmp/tkg-20201130T110113689528616.log

Validating the pre-requisites...
Serving kickstart UI at http://127.0.0.1:8080
You can now launch a browser and point it to the URL for the kickstart UI above. I won’t show how to populate all of the fields – it is pretty self-explanatory. Note that you will need to have the required OVA deployed on your vSphere environment and converted to a template for the UI to pick it up. Full details are in the official TKG documentation. Here are some details taken from my environment.
Notice at the bottom of this page the CLI command is provided. We will not deploy the configuration from the UI, but instead take the tkg CLI command and run it manually.
The reason for not deploying this via the UI is to make an additional change to the .tkg/config.yaml. This may or may not be necessary, but in my testing I also added the environment variables seen earlier to this config file.
$ head -4 .tkg/config.yaml
TKG_CUSTOM_IMAGE_REPOSITORY_SKIP_TLS_VERIFY: true
TKG_CUSTOM_IMAGE_REPOSITORY: cormac-tkgm.corinternal.com/library
cert-manager-timeout: 30m0s
overridesFolder: /home/cormac/.tkg/overrides
We can now go ahead with the deployment of the TKG management cluster.
Deploy the TKG management cluster
All the images for the TKG management cluster should now be pulled from the local Harbor image registry. There should be no attempt to pull images from the public TKG registry, so long as the environment variables are pointing at the correct location. With the skip-TLS-verify environment variable set, we should also avoid any X509 certificate errors, which would otherwise be observed on the KinD bootstrap node and on the TKG nodes when they try to pull container images from the Harbor registry.
Note that this is TKG v1.2.0, thus there are some new options on the command line, including the ability to specify a load balancer IP address (no need to deploy an HAProxy OVA in this version) as well as the ability to pick a CNI, in this case Antrea. For more on Antrea, check out this recent blog post. Note that you are also offered the option of using vSphere with Tanzu or VCF with Tanzu. However, in this case I am going to proceed with TKG and deploy a non-integrated TKG management cluster on vSphere.
$ tkg init -i vsphere --vsphere-controlplane-endpoint-ip 10.35.13.244 -p prod --cni antrea

Logs of the command execution can also be found at: /tmp/tkg-20201130T140502473586632.log

Validating the pre-requisites...

vSphere 7.0 Environment Detected.

You have connected to a vSphere 7.0 environment which does not have vSphere with Tanzu enabled. vSphere with Tanzu includes
an integrated Tanzu Kubernetes Grid Service which turns a vSphere cluster into a platform for running Kubernetes workloads
in dedicated resource pools. Configuring Tanzu Kubernetes Grid Service is done through vSphere HTML5 client.

Tanzu Kubernetes Grid Service is the preferred way to consume Tanzu Kubernetes Grid in vSphere 7.0 environments.
Alternatively you may deploy a non-integrated Tanzu Kubernetes Grid instance on vSphere 7.0.
Do you want to configure vSphere with Tanzu? [y/N]: N
Would you like to deploy a non-integrated Tanzu Kubernetes Grid management cluster on vSphere 7.0? [y/N]: y
Deploying TKG management cluster on vSphere 7.0 ...

Setting up management cluster...
Validating configuration...
Using infrastructure provider vsphere:v0.7.1
Generating cluster configuration...
Setting up bootstrapper...
Bootstrapper created. Kubeconfig: /home/cormac/.kube-tkg/tmp/config_GsUp1GFQ
Installing providers on bootstrapper...
Fetching providers
Installing cert-manager Version="v0.16.1"
Waiting for cert-manager to be available...
Installing Provider="cluster-api" Version="v0.3.10" TargetNamespace="capi-system"
Installing Provider="bootstrap-kubeadm" Version="v0.3.10" TargetNamespace="capi-kubeadm-bootstrap-system"
Installing Provider="control-plane-kubeadm" Version="v0.3.10" TargetNamespace="capi-kubeadm-control-plane-system"
Installing Provider="infrastructure-vsphere" Version="v0.7.1" TargetNamespace="capv-system"
Start creating management cluster...
Saving management cluster kuebconfig into /home/cormac/.kube/config
Installing providers on management cluster...
Fetching providers
Installing cert-manager Version="v0.16.1"
Waiting for cert-manager to be available...
Installing Provider="cluster-api" Version="v0.3.10" TargetNamespace="capi-system"
Installing Provider="bootstrap-kubeadm" Version="v0.3.10" TargetNamespace="capi-kubeadm-bootstrap-system"
Installing Provider="control-plane-kubeadm" Version="v0.3.10" TargetNamespace="capi-kubeadm-control-plane-system"
Installing Provider="infrastructure-vsphere" Version="v0.7.1" TargetNamespace="capv-system"
Waiting for the management cluster to get ready for move...
Waiting for addons installation...
Moving all Cluster API objects from bootstrap cluster to management cluster...
Performing move...
Discovering Cluster API objects
Moving Cluster API objects Clusters=1
Creating objects in the target cluster
Deleting objects from the source cluster
Context set for management cluster tkg-mgmt-vsphere-20201130140507 as 'tkg-mgmt-vsphere-20201130140507-admin@tkg-mgmt-vsphere-20201130140507'.

Management cluster created!

You can now create your first workload cluster by running the following:

  tkg create cluster [name] --kubernetes-version=[version] --plan=[plan]

$
$ tkg get management-cluster
 MANAGEMENT-CLUSTER-NAME            CONTEXT-NAME                                                             STATUS
 tkg-mgmt-vsphere-20201130140507 *  tkg-mgmt-vsphere-20201130140507-admin@tkg-mgmt-vsphere-20201130140507   Success
Excellent! We see the KinD bootstrap cluster deployed initially as a container in docker, and then we see it being used to create the TKG management cluster as a set of VMs. Once everything is up and running, the context switches from the KinD cluster to the TKG management cluster, and the KinD cluster is removed. All of this has been achieved using images in our Harbor image registry – there was no need to pull any images from external repositories. For more details about what is happening during this initialization process, check out this earlier post that I wrote on KinD and TKG.
Deploy a TKG workload cluster
$ tkg create cluster my-cluster --plan=prod --controlplane-machine-count=3 --worker-machine-count=5 \
  --vsphere-controlplane-endpoint-ip 10.35.13.246

Logs of the command execution can also be found at: /tmp/tkg-20201130T142739702360132.log

Validating configuration...
Creating workload cluster 'my-cluster'...
Waiting for cluster to be initialized...
Waiting for cluster nodes to be available...
Waiting for addons installation...

Workload cluster 'my-cluster' created

$ tkg get cluster --include-management-cluster
 NAME                             NAMESPACE   STATUS   CONTROLPLANE  WORKERS  KUBERNETES        ROLES
 my-cluster                       default     running  3/3           5/5      v1.19.1+vmware.2  <none>
 tkg-mgmt-vsphere-20201130140507  tkg-system  running  3/3           1/1      v1.19.1+vmware.2  management
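To actually use the new workload cluster, its kubeconfig can be retrieved via the tkg CLI and the kubectl context switched. A quick sketch, assuming the context name follows the usual <cluster>-admin@<cluster> convention:

$ tkg get credentials my-cluster
$ kubectl config use-context my-cluster-admin@my-cluster
$ kubectl get nodes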
Both the TKG management cluster VMs and workload cluster VMs are all visible in the vSphere client.
TKG is running successfully in an air-gapped, internet restricted environment.
Troubleshooting
I spent quite a bit of time getting this functionality to work, so I thought it might be useful to share some of that experience with you.
Gotchas
Let’s start with the gotchas. The ability to use your own self-signed certificates with the environment variable TKG_CUSTOM_IMAGE_REPOSITORY_SKIP_TLS_VERIFY is only available in TKG v1.2.0. I spent a lot of time with TKG v1.1.3 and constantly hit X509 certificate issues when the KinD node was trying to pull images (namely the cert-manager containers) from the Harbor registry. It was only after moving to v1.2.0 that I was able to get this to work successfully with the environment variable. Again, this is fine for non-production setups, but for production you should really use a signed CA certificate, or if you do want to use your own trusted root CA, you need to inject it into the TKG nodes on every cluster.
tkg init verbosity
One of the reasons why I ran tkg init at the command line rather than via the UI is that you can add verbosity to the output. If you run the command with a -v 5 or a -v 9, you get a lot more information about the steps that are currently taking place during the deployment, which can be very useful for troubleshooting.
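For example, the same init command used earlier could be re-run with maximum verbosity like so:

$ tkg init -i vsphere --vsphere-controlplane-endpoint-ip 10.35.13.244 -p prod --cni antrea -v 9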
kubectl on Kind components
If you are interested in querying the state of the bootstrap KinD cluster, the Kubernetes configuration file can be found in .kube-tkg/tmp. You can then use the kubectl command installed earlier to query the status of various objects. Here is an example displaying the Pods running in the KinD cluster. This is extremely useful for checking whether the Pods are able to pull their required images successfully from the Harbor repository.
$ kubectl get pods -A --kubeconfig .kube-tkg/tmp/config_2sivlONl
NAMESPACE                           NAME                                                                   READY   STATUS    RESTARTS   AGE
capi-kubeadm-bootstrap-system       capi-kubeadm-bootstrap-controller-manager-748cff6cd9-rsxkh            2/2     Running   0          14s
capi-kubeadm-control-plane-system   capi-kubeadm-control-plane-controller-manager-5fb647c458-zdkd2        2/2     Running   0          12s
capi-system                         capi-controller-manager-686c54469c-9rp86                               2/2     Running   0          15s
capi-webhook-system                 capi-controller-manager-5d66994b4b-zpqqb                               1/2     Running   0          16s
capi-webhook-system                 capi-kubeadm-bootstrap-controller-manager-86b5cbdc78-9xms4             2/2     Running   0          15s
capi-webhook-system                 capi-kubeadm-control-plane-controller-manager-d9c45cbc-qr5zb           2/2     Running   0          13s
capi-webhook-system                 capv-controller-manager-77c9948bb7-cpm48                               1/2     Running   0          11s
capv-system                         capv-controller-manager-6bcd99dfd-ddxmr                                1/2     Running   0          10s
cert-manager                        cert-manager-6bd4f58b67-tx98d                                          1/1     Running   0          34s
cert-manager                        cert-manager-cainjector-85dd796c84-lwqsj                               1/1     Running   0          34s
cert-manager                        cert-manager-webhook-5fffc4d84c-x4kbj                                  1/1     Running   0          34s
kube-system                         coredns-5bcf65484d-9cpp8                                               1/1     Running   0          46s
kube-system                         coredns-5bcf65484d-rlcxx                                               1/1     Running   0          46s
kube-system                         etcd-tkg-kind-bv2f4a0cnnikkm7aehh0-control-plane                       0/1     Running   0          56s
kube-system                         kindnet-j29wn                                                          1/1     Running   0          46s
kube-system                         kube-apiserver-tkg-kind-bv2f4a0cnnikkm7aehh0-control-plane             1/1     Running   0          56s
kube-system                         kube-controller-manager-tkg-kind-bv2f4a0cnnikkm7aehh0-control-plane   0/1     Running   0          56s
kube-system                         kube-proxy-cg958                                                       1/1     Running   0          46s
kube-system                         kube-scheduler-tkg-kind-bv2f4a0cnnikkm7aehh0-control-plane             0/1     Running   0          56s
local-path-storage                  local-path-provisioner-8b46957d4-rgcjv                                 1/1     Running   0          46s
If there are issues with a particular Pod, you can describe the Pod as follows to see any related events. This is from a deployment where the environment variable to skip certificate verification was not set, so I experienced X509 certificate issues, as you can see in the events below.
$ kubectl describe pod cert-manager-6bd4f58b67-zst7h -n cert-manager --kubeconfig .kube-tkg/tmp/config_JrrfmfML
Name:         cert-manager-6bd4f58b67-zst7h
Namespace:    cert-manager
Priority:     0
Node:         tkg-kind-bv2el4gcnnig5b03esv0-control-plane/172.17.0.3
Start Time:   Mon, 30 Nov 2020 12:51:19 +0000
Labels:       app=cert-manager
              app.kubernetes.io/component=controller
              app.kubernetes.io/instance=cert-manager
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=cert-manager
              helm.sh/chart=cert-manager-v0.16.1
              pod-template-hash=6bd4f58b67
Annotations:  prometheus.io/path: /metrics
              prometheus.io/port: 9402
              prometheus.io/scrape: true
Status:       Pending
IP:           10.244.0.3
IPs:
  IP:           10.244.0.3
Controlled By:  ReplicaSet/cert-manager-6bd4f58b67
.
.--<snip>
.
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  75s                default-scheduler  Successfully assigned cert-manager/cert-manager-6bd4f58b67-zst7h to tkg-kind-bv2el4gcnnig5b03esv0-control-plane
  Normal   Pulling    39s (x3 over 74s)  kubelet            Pulling image "cormac-tkgm.corinternal.com/library/cert-manager/cert-manager-controller:v0.16.1_vmware.1"
  Warning  Failed     39s (x3 over 74s)  kubelet            Failed to pull image "cormac-tkgm.corinternal.com/library/cert-manager/cert-manager-controller:v0.16.1_vmware.1": \
    rpc error: code = Unknown desc = failed to pull and unpack image "cormac-tkgm.corinternal.com/library/cert-manager/cert-manager-controller:v0.16.1_vmware.1": \
    failed to resolve reference "cormac-tkgm.corinternal.com/library/cert-manager/cert-manager-controller:v0.16.1_vmware.1": failed to do request: \
    Head https://cormac-tkgm.corinternal.com/v2/library/cert-manager/cert-manager-controller/manifests/v0.16.1_vmware.1: x509: certificate signed by unknown authority
  Warning  Failed     39s (x3 over 74s)  kubelet            Error: ErrImagePull
  Normal   BackOff    11s (x4 over 73s)  kubelet            Back-off pulling image "cormac-tkgm.corinternal.com/library/cert-manager/cert-manager-controller:v0.16.1_vmware.1"
  Warning  Failed     11s (x4 over 73s)  kubelet            Error: ImagePullBackOff
Logging onto the Kind nodes
The KinD node itself runs as a container on the same host, so we can locate it with docker ps and exec into it to poke around with containerd's ctr tool.

$ docker ps -a
CONTAINER ID   IMAGE                                                            COMMAND                  CREATED         STATUS                 PORTS                                         NAMES
cf853a6447e0   cormac-tkgm.corinternal.com/library/kind/node:v1.19.1_vmware.2   "/usr/local/bin/entr…"   4 minutes ago   Up 4 minutes           127.0.0.1:34695->6443/tcp                     tkg-kind-bv2hfq0cnnimqrerr300-control-plane
b1e3c3034c1c   goharbor/nginx-photon:v2.1.0                                     "nginx -g 'daemon of…"   4 hours ago     Up 4 hours (healthy)   0.0.0.0:80->8080/tcp, 0.0.0.0:443->8443/tcp   nginx
6c044e4844a6   goharbor/harbor-jobservice:v2.1.0                                "/harbor/entrypoint.…"   4 hours ago     Up 4 hours (healthy)                                                 harbor-jobservice
047889a4e74e   goharbor/harbor-core:v2.1.0                                      "/harbor/entrypoint.…"   4 hours ago     Up 4 hours (healthy)                                                 harbor-core
1f14a1e6aecc   goharbor/harbor-registryctl:v2.1.0                               "/home/harbor/start.…"   4 hours ago     Up 4 hours (healthy)                                                 registryctl
bffe8ac6bf75   goharbor/registry-photon:v2.1.0                                  "/home/harbor/entryp…"   4 hours ago     Up 4 hours (healthy)                                                 registry
6afad7a5504c   goharbor/harbor-db:v2.1.0                                        "/docker-entrypoint.…"   4 hours ago     Up 4 hours (healthy)                                                 harbor-db
5ab3f2045a34   goharbor/harbor-portal:v2.1.0                                    "nginx -g 'daemon of…"   4 hours ago     Up 4 hours (healthy)                                                 harbor-portal
02b2384abb73   goharbor/redis-photon:v2.1.0                                     "redis-server /etc/r…"   4 hours ago     Up 4 hours (healthy)                                                 redis
f37c1d9ac1ad   goharbor/harbor-log:v2.1.0                                       "/bin/sh -c /usr/loc…"   4 hours ago     Up 4 hours (healthy)   127.0.0.1:1514->10514/tcp                     harbor-log

$ docker exec -it cf853a6447e0 bash
root@tkg-kind-bv2hfq0cnnimqrerr300-control-plane:/# ctr
NAME:
   ctr -
        __
  _____/ /______
 / ___/ __/ ___/
/ /__/ /_/ /
\___/\__/_/

containerd CLI

USAGE:
   ctr [global options] command [command options] [arguments...]

VERSION:
   v1.3.3-14-g449e9269

DESCRIPTION:
ctr is an unsupported debug and administrative client for interacting
with the containerd daemon. Because it is unsupported, the commands,
options, and operations are not guaranteed to be backward compatible
or stable from release to release of the containerd project.
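From inside the KinD node, a quick (hedged) example of checking which images containerd has actually pulled from the Harbor registry – kubelet-pulled images live in the k8s.io containerd namespace:

root@tkg-kind-bv2hfq0cnnimqrerr300-control-plane:/# ctr --namespace k8s.io images list -q | grep cormac-tkgm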
Kudos
I’d like to close with a big thank you to my colleagues, Tom Schwaller and Keith Lee, for their guidance on this, especially on the certificate gotchas highlighted above. I’d also like to thank Chip Zoller for providing more details on the secure certificate methods. Thanks guys!