Thus, what I am about to show you is **unsupported** today. The reason I am documenting it here is that I know a lot of customers and partners are interested in this process simply for a proof of concept. But please note that this integration should not be done in a production environment. You will not be supported. We are already working on a way to introduce this integration as a simple user experience in a future release. If you wish to implement this procedure, you do so at your own risk.
Without this procedure, any attempt to pull an image from the Harbor Image Registry will fail with the following Pod events:
Normal   Pulling   <invalid> (x4 over <invalid>)  kubelet, ch-tkg-cluster01-workers-2krnb-65fdb7455b-s2whg  Pulling image "20.0.0.2/demo-ns/cassandra:v11"
Warning  Failed    <invalid> (x4 over <invalid>)  kubelet, ch-tkg-cluster01-workers-2krnb-65fdb7455b-s2whg  Failed to pull image "20.0.0.2/demo-ns/cassandra:v11": rpc error: code = Unknown desc = Error response from daemon: Get https://20.0.0.2/v2/: x509: certificate signed by unknown authority
Warning  Failed    <invalid> (x4 over <invalid>)  kubelet, ch-tkg-cluster01-workers-2krnb-65fdb7455b-s2whg  Error: ErrImagePull
Normal   BackOff   <invalid> (x6 over <invalid>)  kubelet, ch-tkg-cluster01-workers-2krnb-65fdb7455b-s2whg  Back-off pulling image "20.0.0.2/demo-ns/cassandra:v11"
Warning  Failed    <invalid> (x7 over <invalid>)  kubelet, ch-tkg-cluster01-workers-2krnb-65fdb7455b-s2whg  Error: ImagePullBackOff
The x509: certificate signed by unknown authority error means that the requester (the TKG cluster worker node) does not trust the certificate presented by the registry, because the CA that signed it is not in the node's trust store.
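You can reproduce this failure, and the fix, in miniature with openssl on any machine. This is just an illustrative sketch using a throwaway certificate, not the actual Harbor CA:

```shell
# Create a throwaway self-signed certificate to stand in for the Harbor CA
# (hypothetical - the real certificate is downloaded from Harbor in Step 1).
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout /tmp/demo-ca.key -out /tmp/demo-ca.crt -subj "/CN=harbor-demo-ca"

# Verification against the system trust store fails - the same
# "unknown authority" condition docker hits when pulling the image:
openssl verify /tmp/demo-ca.crt || true

# Once the CA is supplied explicitly, verification succeeds. Appending the
# Harbor certificate to the trust bundle on each TKG node (Step 2f)
# achieves the same thing for docker:
openssl verify -CAfile /tmp/demo-ca.crt /tmp/demo-ca.crt
```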
We can break the integration process into 4 steps.
- Retrieve the Harbor Image Registry certificate from the Harbor UI
- Push the certificate to the TKG cluster nodes
- Create a Kubernetes secret which holds the Harbor Image Registry credentials
- Include an imagePullSecrets entry in any Pod manifest that pulls an image from the Image Registry
Step 1 – Get Certificate from the Harbor Image Registry
Since Harbor is deployed via vSphere with Kubernetes, it is automatically added to the SSO domain. Simply log in to Harbor with your SSO credentials (e.g. administrator@vsphere.local), select the namespace project where the TKG cluster is deployed, and then select Repositories. Repositories are where the container images are stored. Here there is a link to download the Registry Certificate. Click on the link and save the certificate.
Step 2 – Push the registry certificate to the TKG cluster nodes
To begin, you need to be logged in to vSphere with Kubernetes at the namespace layer where the TKG cluster resides. Later we will change contexts and work at the TKG cluster layer.
There are a number of sub-steps to this step. These sub-steps can be summarized as follows:
- Fetch the secret to SSH into the TKG nodes
- Fetch the kubeconfig file for the TKG cluster
- Change contexts to the TKG cluster
- Get the IP addresses from the TKG nodes
- Copy the Image registry certificate to each node
- Install the Image registry certificate to the node’s trust bundle
- Restart docker on each of the nodes
Let’s now look at those steps in detail.
Step 2a – Fetch the SSH private key secret to SSH onto the TKG nodes
Once logged into the namespace where the TKG cluster is deployed (not logged into the TKG cluster itself), you must fetch the SSH secret for the TKG cluster that will enable login to the TKG nodes. In my example, the namespace is called demo-ns and the TKG cluster is called ch-tkg-cluster01. The SSH private key is stored in a secret that follows the naming convention <cluster>-ssh; thus, in my case, the SSH key secret is called ch-tkg-cluster01-ssh. The command to retrieve the SSH private key is as follows:
$ kubectl get secret -n demo-ns ch-tkg-cluster01-ssh \
-o jsonpath='{.data.ssh-privatekey}' | base64 -d
To make things easier later, store this private key in a file, e.g.
$ kubectl get secret -n demo-ns ch-tkg-cluster01-ssh \
-o jsonpath='{.data.ssh-privatekey}' | base64 -d > cluster-ssh
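One thing to watch out for: ssh and scp refuse to use a private key file whose permissions are too open, so it is worth locking the file down before using it in the later steps. A minimal illustration (the touch line simply stands in for the file created by the kubectl command above):

```shell
# ssh rejects keys with open permissions ("UNPROTECTED PRIVATE KEY FILE"),
# so make the key file readable by the owner only.
touch cluster-ssh        # stand-in for the file created above
chmod 600 cluster-ssh
ls -l cluster-ssh        # should now show -rw-------
```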
Step 2b – Fetch the kubeconfig file for the TKG cluster
To allow us to work at the TKG cluster level rather than the namespace level later on, get the kubeconfig for the cluster. Similar to the SSH key previously, the kubeconfig is in a secret called <cluster>-kubeconfig, so in my deployment it is called ch-tkg-cluster01-kubeconfig. The command to retrieve the kubeconfig is as follows:
$ kubectl get secret -n demo-ns ch-tkg-cluster01-kubeconfig \
-o jsonpath='{.data.value}' | base64 -d > cluster-kubeconfig
Step 2c – Switch to the TKG cluster
With the kubeconfig retrieved in the previous step, we can now switch from the namespace context to the TKG guest cluster context.
$ export KUBECONFIG=cluster-kubeconfig
You can verify that the context has changed by running kubectl get nodes. You should now see the control plane and worker VMs of the TKG cluster.
$ kubectl get nodes
NAME                                              STATUS   ROLES    AGE     VERSION
ch-tkg-cluster01-control-plane-gc8b2              Ready    master   6d19h   v1.16.8+vmware.1
ch-tkg-cluster01-workers-2krnb-65fdb7455b-7v8wd   Ready    <none>   6d19h   v1.16.8+vmware.1
ch-tkg-cluster01-workers-2krnb-65fdb7455b-9rdkb   Ready    <none>   6d19h   v1.16.8+vmware.1
ch-tkg-cluster01-workers-2krnb-65fdb7455b-s2whg   Ready    <none>   6d19h   v1.16.8+vmware.1
Step 2d – Get the IP address of the TKG nodes
I used the following script to pick up the IP address of each of the TKG cluster nodes and store them in a file called ip-list. There are multiple ways of achieving this; this is just one way.
$ for i in `kubectl get nodes --no-headers | awk '{print $1}'`
do
kubectl get node $i -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}' >> ip-list
echo >> ip-list
done

$ cat ip-list
10.244.0.242
10.244.0.244
10.244.0.245
10.244.0.243
Step 2e – Copy the Image registry certificate to each node
In this step, we need to copy the registry certificate over to each of the TKG nodes. We have the SSH private key in a file called cluster-ssh. I have also stored the registry certificate (downloaded in step 1) in a file called ca.crt in my current working directory. Thus, I can use the following command to copy the cert to each of the TKG nodes:
$ scp -i cluster-ssh ca.crt vmware-system-user@10.244.0.242:/home/vmware-system-user/registry_ca.crt
I could do this manually for each node, or I could wrap it in a script as follows (since I have the list of node IP addresses stored in a file called ip-list from the previous step):
$ for i in `cat ip-list`
do
scp -i cluster-ssh ca.crt vmware-system-user@${i}:/home/vmware-system-user/registry_ca.crt
done
Now that we have copied the registry certificate to each TKG node, as a last step we must add it to the trust bundle on each node, and then restart the docker service.
Step 2f – Add the registry certificate to the node’s trust bundle
The registry certificate is now on the TKG node, but it is not yet in a location where it will be trusted. We can use the following command to append it to the node's trust bundle.
$ ssh -i cluster-ssh vmware-system-user@10.244.0.242 \
'sudo bash -c "cat /home/vmware-system-user/registry_ca.crt >> /etc/pki/tls/certs/ca-bundle.crt"'
The authenticity of host '10.244.0.242 (10.244.0.242)' can't be established.
ECDSA key fingerprint is SHA256:uMWEr+Fh+6bwBRImd1jfefTnMU7UvGSGOCZygbaBbtg.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.244.0.242' (ECDSA) to the list of known hosts.
Welcome to Photon 3.0 (\m) - Kernel \r (\l)
Again, rather than do this manually for every node, you could wrap it in the following script.
$ for i in `cat ip-list`
do
ssh -i cluster-ssh vmware-system-user@${i} \
'sudo bash -c "cat /home/vmware-system-user/registry_ca.crt >> /etc/pki/tls/certs/ca-bundle.crt"'
done
Step 2g – Restart the docker service on each node
The final part of this step is to restart docker. This can be done as follows:
$ ssh -i cluster-ssh vmware-system-user@10.244.0.242 'sudo systemctl restart docker.service'
Welcome to Photon 3.0 (\m) - Kernel \r (\l)
And as before, we can wrap this in a script for all nodes:
$ for i in `cat ip-list`
do
ssh -i cluster-ssh vmware-system-user@${i} 'sudo systemctl restart docker.service'
done
Combining sub-steps 2e, 2f and 2g
Now, I have simplified things by breaking this work into the 3 sub-steps 2e, 2f and 2g. You could place all of them in a single script if you wish; I separated them out to make the steps easier to follow. If you wish to combine the 3 sub-steps, you could do something similar to the following:
$ for i in `cat ip-list`
do
scp -i cluster-ssh ca.crt vmware-system-user@${i}:/home/vmware-system-user/registry_ca.crt
ssh -i cluster-ssh vmware-system-user@${i} \
'sudo bash -c "cat /home/vmware-system-user/registry_ca.crt >> /etc/pki/tls/certs/ca-bundle.crt"'
ssh -i cluster-ssh vmware-system-user@${i} 'sudo systemctl restart docker.service'
done
At this point, you might think that you have done enough to allow the TKG nodes to use the Harbor Image Registry. Unfortunately not. If you attempt to deploy an application where the Pod attempts to pull an image from the Harbor Image Registry, the Pod events no longer display the X509 error seen previously, but instead display the following failure:
Normal   BackOff  <invalid> (x6 over <invalid>)  kubelet, ch-tkg-cluster01-workers-2krnb-65fdb7455b-9rdkb  Back-off pulling image "20.0.0.2/demo-ns/cassandra:v11"
Warning  Failed   <invalid> (x6 over <invalid>)  kubelet, ch-tkg-cluster01-workers-2krnb-65fdb7455b-9rdkb  Error: ImagePullBackOff
Normal   Pulling  <invalid> (x4 over <invalid>)  kubelet, ch-tkg-cluster01-workers-2krnb-65fdb7455b-9rdkb  Pulling image "20.0.0.2/demo-ns/cassandra:v11"
Warning  Failed   <invalid> (x4 over <invalid>)  kubelet, ch-tkg-cluster01-workers-2krnb-65fdb7455b-9rdkb  Failed to pull image "20.0.0.2/demo-ns/cassandra:v11": rpc error: code = Unknown desc = Error response from daemon: pull access denied for 20.0.0.2/demo-ns/cassandra, repository does not exist or may require 'docker login'
Warning  Failed   <invalid> (x4 over <invalid>)  kubelet, ch-tkg-cluster01-workers-2krnb-65fdb7455b-9rdkb  Error: ErrImagePull
The clue is in the error “may require ‘docker login’“. We need to provide the Pods with Image Registry credentials so that they are able to do a docker login to retrieve the image. Let’s do that next.
Step 3 – Create a secret with Image Registry credentials
This step is described in detail in the Kubernetes documentation here. To begin, we need the credentials from a valid docker login to the Harbor Image Registry from a desktop/laptop. This login creates a .docker/config.json file holding credentials, which can then be used to create a secret that your TKG Pods can use to access the image registry.
Here is my ~/.docker/config.json:
$ cat ~/.docker/config.json
{
        "auths": {
                "20.0.0.2": {
                        "auth": "YWRtaW5pc3RyYXRvckB2c3BoZXJlLmxvY2FsOlZNd2FyZTEyMyE="
                }
        }
}
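For reference, the auth value in this file is nothing more than base64 of username:password, so you can construct or verify one with base64 directly. The credentials below are hypothetical, not the ones from my config.json:

```shell
# The "auth" field is base64("username:password"). Encoding and decoding
# with hypothetical credentials:
printf 'admin:Harbor12345' | base64
# YWRtaW46SGFyYm9yMTIzNDU=
printf 'YWRtaW46SGFyYm9yMTIzNDU=' | base64 -d
# admin:Harbor12345
```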
20.0.0.2 is the IP address of my Harbor Image Registry. Yours may be different. The next step is to create a secret from this file:
$ kubectl create secret generic regcred \
> --from-file=.dockerconfigjson=/home/cormac/.docker/config.json \
> --type=kubernetes.io/dockerconfigjson
secret/regcred created
Verify that the secret was successfully created:
$ kubectl get secret regcred --output=yaml
apiVersion: v1
data:
  .dockerconfigjson: ewoJImF1dGhzIjogewoJCSIyMC4wLjAuMiI6IHsKCQkJImF1dGgiOiAiWVdSdGFXNXBjM1J5WVhSdmNrQjJjM0JvWlhKbExteHZZMkZzT2xaTmQyRnlaVEV5TXlFPSIKCQl9Cgl9Cn0K
kind: Secret
metadata:
  creationTimestamp: "2020-06-23T07:37:03Z"
  name: regcred
  namespace: default
  resourceVersion: "1560917"
  selfLink: /api/v1/namespaces/default/secrets/regcred
  uid: f189d33f-ba20-41a5-9b33-6fdaf3618a5b
type: kubernetes.io/dockerconfigjson
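As a sanity check, the .dockerconfigjson field in the secret is simply the base64-encoded contents of the config.json file, so decoding it should return the original file:

```shell
# Decoding the .dockerconfigjson value from the secret returns the
# original ~/.docker/config.json contents:
echo 'ewoJImF1dGhzIjogewoJCSIyMC4wLjAuMiI6IHsKCQkJImF1dGgiOiAiWVdSdGFXNXBjM1J5WVhSdmNrQjJjM0JvWlhKbExteHZZMkZzT2xaTmQyRnlaVEV5TXlFPSIKCQl9Cgl9Cn0K' | base64 -d
```

With a live cluster you can fetch and decode the field in one step, per the Kubernetes documentation: kubectl get secret regcred -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d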
Looks good. The last step is to modify our Pod manifests to include the secret, and of course to pull the container image from the Harbor Image Registry. I’m not going to show you how to push, tag and pull images to/from the registry – there are plenty of examples of that out there, including this blog.
Step 4 – Add secret to Pod manifest
A new entry is required in the Pod manifest so that when the Pod pulls an image from an internal image registry, it also has a secret that allows it to log in to that registry. The entry is spec.imagePullSecrets, which references the secret by name. Here is a sample manifest for a simple busybox Pod which pulls its container image from my Harbor image repository and which also includes the secret.
$ cat busybox-cor.yaml
apiVersion: v1
kind: Pod
metadata:
  name: ch-busybox
  labels:
    app: ch-busybox
spec:
  containers:
  - image: "20.0.0.2/demo-ns/busybox"
    command:
    - sleep
    - "3600"
    imagePullPolicy: Always
    name: busybox
  imagePullSecrets:
  - name: regcred
  restartPolicy: Always
And the final step – does it work? Can we now have a Pod on a TKG guest cluster pull a container image from the embedded Harbor Image Registry on vSphere with Kubernetes?
$ kubectl apply -f busybox-cor.yaml
pod/ch-busybox created

$ kubectl get pod
NAME         READY   STATUS    RESTARTS   AGE
ch-busybox   1/1     Running   0          5s

$ kubectl describe pod ch-busybox
Name:               ch-busybox
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               ch-tkg-cluster01-workers-2krnb-65fdb7455b-7v8wd/10.244.0.244
Start Time:         Tue, 23 Jun 2020 08:41:39 +0100
Labels:             app=ch-busybox
Annotations:        cni.projectcalico.org/podIP: 192.168.65.131/32
                    cni.projectcalico.org/podIPs: 192.168.65.131/32
                    kubectl.kubernetes.io/last-applied-configuration:
                      {"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"labels":{"app":"ch-busybox"},"name":"ch-busybox","namespace":"default"},"spe...
                    kubernetes.io/psp: vmware-system-privileged
Status:             Running
IP:                 192.168.65.131
Containers:
  busybox:
    Container ID:  docker://64155ea248f3d33c4afb7313e7cbd2819c00125c859cd8f435c4dc93094d67f5
    Image:         20.0.0.2/demo-ns/busybox
    Image ID:      docker-pullable://20.0.0.2/demo-ns/busybox@sha256:d2af0ba9eb4c9ec7b138f3989d9bb0c9651c92831465eae281430e2b254afe0d
    Port:          <none>
    Host Port:     <none>
    Command:
      sleep
      3600
    State:          Running
      Started:      Tue, 23 Jun 2020 08:41:41 +0100
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-zv58f (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  default-token-zv58f:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-zv58f
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason     Age        From                                                      Message
  ----    ------     ----       ----                                                      -------
  Normal  Scheduled  <unknown>  default-scheduler                                         Successfully assigned default/ch-busybox to ch-tkg-cluster01-workers-2krnb-65fdb7455b-7v8wd
  Normal  Pulling    <invalid>  kubelet, ch-tkg-cluster01-workers-2krnb-65fdb7455b-7v8wd  Pulling image "20.0.0.2/demo-ns/busybox"
  Normal  Pulled     <invalid>  kubelet, ch-tkg-cluster01-workers-2krnb-65fdb7455b-7v8wd  Successfully pulled image "20.0.0.2/demo-ns/busybox"
  Normal  Created    <invalid>  kubelet, ch-tkg-cluster01-workers-2krnb-65fdb7455b-7v8wd  Created container busybox
  Normal  Started    <invalid>  kubelet, ch-tkg-cluster01-workers-2krnb-65fdb7455b-7v8wd  Started container busybox
Success! We have successfully pulled an image for our Pod running on a TKG (guest) cluster from the Harbor Image Registry integrated in vSphere with Kubernetes.
Now, at the risk of repeating myself, this is not supported. There are a number of life-cycle management activities that we need to work through before we can support it, and I expect we will also make the integration easier than what I have shown you here. However, if you keep this in mind, and are only interested in doing some testing or a proof of concept with TKG clusters in vSphere with Kubernetes, then this procedure should help.
Finally, a word of thanks to Ross Kukulinski who gave me a bunch of pointers when I got stuck (which happened quite a lot during this exercise).