LDAP integration with Pinniped and Dex
In this post, I will deploy a TKG v1.4 management cluster on vSphere. This environment uses the NSX ALB to provide IP addresses both for the TKG cluster control plane and for Load Balancer services for applications. After deploying the TKG management cluster, the Pinniped and Dex services are converted from NodePort to Load Balancer. Then I will authenticate an LDAP user and assign a ClusterRoleBinding to that user so that they can work with the non-admin context of the management cluster. I will not cover how to deploy TKG with LDAP integration, as this was covered in the previous post and the steps are identical.
Let’s begin by looking at the Pinniped and Dex services after initial management cluster deployment. As mentioned, both are configured to use NodePort services. We will need to change this to Load Balancer.
$ kubectl get all -n pinniped-supervisor
NAME                                      READY   STATUS      RESTARTS   AGE
pod/pinniped-post-deploy-job-789cj        0/1     Error       0          14h
pod/pinniped-post-deploy-job-c7jfc        0/1     Completed   0          14h
pod/pinniped-supervisor-ff8467c76-qt9kz   1/1     Running     0          14h
pod/pinniped-supervisor-ff8467c76-vjh44   1/1     Running     0          14h

NAME                          TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)         AGE
service/pinniped-supervisor   NodePort   100.65.148.115   <none>        443:31234/TCP   14h

NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/pinniped-supervisor   2/2     2            2           14h

NAME                                            DESIRED   CURRENT   READY   AGE
replicaset.apps/pinniped-supervisor-ff8467c76   2         2         2       14h

NAME                                 COMPLETIONS   DURATION   AGE
job.batch/pinniped-post-deploy-job   1/1           3m53s      14h

$ kubectl get all -n tanzu-system-auth
NAME                       READY   STATUS    RESTARTS   AGE
pod/dex-657fdcb9f9-bhtxf   1/1     Running   0          14h

NAME             TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
service/dexsvc   NodePort   100.67.123.98   <none>        5556:30167/TCP   14h

NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/dex   1/1     1            1           14h

NAME                             DESIRED   CURRENT   READY   AGE
replicaset.apps/dex-657fdcb9f9   1         1         1       14h
Configure the Pinniped and Dex services to be of type LoadBalancer, as per the documentation found here. Both Dex and Pinniped should then get external IP addresses provided by the NSX ALB.
$ cat pinniped-supervisor-svc-overlay.yaml
#@ load("@ytt:overlay", "overlay")
#@overlay/match by=overlay.subset({"kind": "Service", "metadata": {"name": "pinniped-supervisor", "namespace": "pinniped-supervisor"}})
---
#@overlay/replace
spec:
  type: LoadBalancer
  selector:
    app: pinniped-supervisor
  ports:
    - name: https
      protocol: TCP
      port: 443
      targetPort: 8443

#@ load("@ytt:overlay", "overlay")
#@overlay/match by=overlay.subset({"kind": "Service", "metadata": {"name": "dexsvc", "namespace": "tanzu-system-auth"}}), missing_ok=True
---
#@overlay/replace
spec:
  type: LoadBalancer
  selector:
    app: dex
  ports:
    - name: dex
      protocol: TCP
      port: 443
      targetPort: https

$ cat pinniped-supervisor-svc-overlay.yaml | base64 -w 0
I0AgbG9hZCgiQHl0dDpvdmVybG.....wcwo=

$ kubectl patch secret ldap-cert-pinniped-addon -n tkg-system -p '{"data": {"overlays.yaml": "I0AgbG9hZCgiQHl0dDpvdmVyb...wcwo="}}'
secret/ldap-cert-pinniped-addon patched

$ kubectl get all -n pinniped-supervisor
NAME                                       READY   STATUS      RESTARTS   AGE
pod/pinniped-post-deploy-job-2x7hl         0/1     Completed   0          8m31s
pod/pinniped-post-deploy-job-r8hrh         0/1     Error       0          10m
pod/pinniped-post-deploy-job-wr7np         0/1     Error       0          9m19s
pod/pinniped-supervisor-5dcbd8d56f-g6qkd   1/1     Running     0          8m9s
pod/pinniped-supervisor-5dcbd8d56f-zhdfp   1/1     Running     0          8m9s

NAME                          TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)         AGE
service/pinniped-supervisor   LoadBalancer   100.71.206.210   xx.xx.xx.18   443:30044/TCP   10m

NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/pinniped-supervisor   2/2     2            2           10m

NAME                                            DESIRED   CURRENT   READY   AGE
replicaset.apps/pinniped-supervisor-5dcbd8d56f   2         2         2      10m

NAME                                 COMPLETIONS   DURATION   AGE
job.batch/pinniped-post-deploy-job   1/1           2m34s      10m

$ kubectl get all -n tanzu-system-auth
NAME                       READY   STATUS    RESTARTS   AGE
pod/dex-688567f8c4-kf5mx   1/1     Running   0          8m29s

NAME             TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)         AGE
service/dexsvc   LoadBalancer   100.70.240.108   xx.xx.xx.19   443:31265/TCP   11m

NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/dex   1/1     1            1           11m

NAME                             DESIRED   CURRENT   READY   AGE
replicaset.apps/dex-688567f8c4   1         1         1       11m
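If you prefer not to copy the base64 string by hand, the encode and patch steps can be combined into one command. This is just a sketch, assuming a bash shell and the same overlay file and secret name used above:

# Sketch: encode the overlay and patch the Pinniped add-on secret in one step (bash)
$ kubectl patch secret ldap-cert-pinniped-addon -n tkg-system \
    -p "{\"data\": {\"overlays.yaml\": \"$(base64 -w 0 < pinniped-supervisor-svc-overlay.yaml)\"}}"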
The services are now of type LoadBalancer and have external IP addresses (which I have intentionally obfuscated). The next step is to delete the Pinniped post-deploy job and allow it to restart; it restarts automatically. This can take some time (3-4 minutes), so I usually run a watch (-w) to keep an eye on it.
$ kubectl get jobs -n pinniped-supervisor
NAME                       COMPLETIONS   DURATION   AGE
pinniped-post-deploy-job   1/1           3m53s      14h

$ kubectl delete jobs pinniped-post-deploy-job -n pinniped-supervisor
job.batch "pinniped-post-deploy-job" deleted

$ kubectl get jobs -n pinniped-supervisor -w
NAME                       COMPLETIONS   DURATION   AGE
pinniped-post-deploy-job   0/1                      0s
pinniped-post-deploy-job   0/1                      0s
pinniped-post-deploy-job   0/1           0s         0s
pinniped-post-deploy-job   1/1           11s        11s
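As an alternative to watching, kubectl wait can block until the job completes. A minimal sketch (run it once the job object has been recreated; the timeout value is just a suggestion):

# Block until the recreated post-deploy job reports the Complete condition
$ kubectl wait --for=condition=complete job/pinniped-post-deploy-job -n pinniped-supervisor --timeout=10m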
You can now switch to the non-admin context of the cluster and try to access it. If you are on a desktop and run a query, a browser is launched where you can provide the LDAP credentials of the user that you wish to authenticate and give TKG cluster access to. If, however, like me, you are SSH'ed to a remote host and running the kubectl get commands in that session, you will see the following.
$ tanzu management-cluster kubeconfig get
You can now access the cluster by running 'kubectl config use-context tanzu-cli-ldap-cert@ldap-cert'

$ kubectl config use-context tanzu-cli-ldap-cert@ldap-cert
Switched to context "tanzu-cli-ldap-cert@ldap-cert".

$ kubectl get nodes
Error: no DISPLAY environment variable specified
^C
This is working as expected. In the past, you would have required a desktop with a browser to do the authentication. TKG v1.4 introduces a new feature which allows you to authenticate LDAP users in environments that do not have browsers (e.g. headless workstations), or when you are SSH'ed to an environment like I am above. This process is covered in the official documentation here. The first step is to set the environment variable TANZU_CLI_PINNIPED_AUTH_LOGIN_SKIP_BROWSER. You will then need to remove the earlier non-admin context and recreate it with the environment variable in place.
$ export TANZU_CLI_PINNIPED_AUTH_LOGIN_SKIP_BROWSER=true

$ kubectl config delete-context tanzu-cli-ldap-cert@ldap-cert
warning: this removed your active context, use "kubectl config use-context" to select a different one
deleted context tanzu-cli-ldap-cert@ldap-cert from /home/cormac/.kube/config

$ tanzu management-cluster kubeconfig get
You can now access the cluster by running 'kubectl config use-context tanzu-cli-ldap-cert@ldap-cert'

$ kubectl config use-context tanzu-cli-ldap-cert@ldap-cert
Switched to context "tanzu-cli-ldap-cert@ldap-cert".

$ kubectl get nodes
Please log in: https://xx.xx.xx.18/oauth2/authorize?access_type=offline&client_id=pinniped-cli&\
code_challenge=8VbRBUHKhSK69gP4mq3G1dnd897tu0ShpsDvkuqi1Q0&code_challenge_method=S256&nonce=\
263df470a232c24a929712565954a2d4&redirect_uri=http%3A%2F%2F127.0.0.1%3A39819%2Fcallback&\
response_type=code&scope=offline_access+openid+pinniped%3Arequest-audience&state=c2c0517fab9affa8741d78d32acdd330
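Note that the exported variable only lasts for the current shell session. If you regularly work from a headless host like this, you may want to persist it; a sketch, assuming a bash shell:

# Persist the setting for future SSH sessions (assumes bash; adjust for your shell)
$ echo 'export TANZU_CLI_PINNIPED_AUTH_LOGIN_SKIP_BROWSER=true' >> ~/.bashrc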
Note the “Please log in” message. You can now copy this URL and paste it on any host that has a browser but can still reach the Pinniped supervisor service IP address. Notice that the redirect is a callback to 127.0.0.1 (localhost). This means a failure is expected, as the host with the browser won’t respond to it, but that is ok. When you paste the link into a browser, you should see the following authentication prompt, where you can add the LDAP user which you want to have access to the non-admin context of your cluster:
The browser will redirect to the IP address of the Dex service, but when it tries a callback to localhost, you get this error:
Again, this is expected. The next step is to take the callback URL that was provided in the browser (the browser which was unable to complete the 127.0.0.1 callback), and run the following command with it on the host where you ran the initial kubectl get command:
$ curl -L 'http://127.0.0.1:35259/callback?code=4bExu5_NESSuaY7kSCCIGcn8arVwJiVUC19UTZL-\
Jck.61FRn2BeHo7C8fYGbA0aranDsAFT3v0bRTTwq-TizA8&scope=openid+offline_access+pinniped%3A\
request-audience&state=edd3196e00a3eb01403d2d4e2d918fa3'
you have been logged in and may now close this tab
Commands now work as the LDAP user, provided the ClusterRoleBinding created as per the official documentation here matches the credentials of the LDAP user. If the ClusterRoleBinding does not exist, you will encounter the following error when trying to query the non-admin cluster as a user who does not have any privileges:
$ kubectl get nodes
Error from server (Forbidden): nodes is forbidden: User "chogan@rainpole.com" cannot list resource \
"nodes" in API group "" at the cluster scope
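If you want to check the user's permissions without waiting for a command to fail, kubectl auth can-i works from the non-admin context once the login has completed. A quick sketch; both checks should return "no" until the ClusterRoleBinding below is created:

# Check whether the authenticated LDAP user can list nodes, or do anything at all
$ kubectl auth can-i list nodes
$ kubectl auth can-i '*' '*'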
To create it, switch back to the admin context, create the ClusterRoleBinding, switch again to the non-admin context and see if this LDAP user (who you have already authenticated via Dex/Pinniped) can now successfully interact with the cluster.
$ kubectl config use-context ldap-cert-admin@ldap-cert
Switched to context "ldap-cert-admin@ldap-cert".

$ kubectl config get-contexts
CURRENT   NAME                            CLUSTER     AUTHINFO              NAMESPACE
*         ldap-cert-admin@ldap-cert       ldap-cert   ldap-cert-admin
          tanzu-cli-ldap-cert@ldap-cert   ldap-cert   tanzu-cli-ldap-cert

$ cat chogan-crb.yaml
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: chogan
subjects:
  - kind: User
    name: chogan@rainpole.com
    apiGroup:
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io

$ kubectl apply -f chogan-crb.yaml
clusterrolebinding.rbac.authorization.k8s.io/chogan created

$ kubectl get clusterrolebinding chogan
NAME     ROLE                        AGE
chogan   ClusterRole/cluster-admin   6m

$ kubectl config use-context tanzu-cli-ldap-cert@ldap-cert
Switched to context "tanzu-cli-ldap-cert@ldap-cert".

$ kubectl get nodes
NAME                             STATUS   ROLES                  AGE   VERSION
ldap-cert-control-plane-77g8v    Ready    control-plane,master   86m   v1.21.2+vmware.1
ldap-cert-md-0-b7f799d64-kgcqm   Ready    <none>                 85m   v1.21.2+vmware.1
The LDAP user (chogan@rainpole.com) is now able to manage the cluster using the non-admin context of the TKG management cluster.
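Binding the LDAP user to cluster-admin is convenient in a lab. If you would rather grant read-only access, one option is to bind the built-in view ClusterRole instead. A sketch, run from the admin context (the binding name chogan-view is just an example):

# Read-only alternative to cluster-admin for the same LDAP user
$ kubectl create clusterrolebinding chogan-view --clusterrole=view --user=chogan@rainpole.com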
Useful tips
On one occasion, after deleting the Pinniped post-deploy job, I still could not access the cluster and got the following error:
$ kubectl get pods
Error: could not complete Pinniped login: could not perform OIDC discovery for \
"https://xx.xx.xx.16:31234": Get "https://xx.xx.xx.16:31234/.well-known/openid-configuration": \
dial tcp xx.xx.xx.16:31234: connect: connection refused
Error: pinniped-auth login failed: exit status 1
Error: exit status 1
Unable to connect to the server: getting credentials: exec: executable tanzu failed with exit code 1
To resolve this issue, I deleted the Pinniped post-deploy job for a second time. Once that completed, the kubectl command worked as expected. On another occasion, I hit the following issue:
$ kubectl get nodes
Error: could not complete Pinniped login: could not perform OIDC discovery for \
"https://xx.xx.xx.18": Get "https://xx.xx.xx.18/.well-known/openid-configuration": \
x509: certificate has expired or is not yet valid: current time 2021-11-18T10:52:35Z is before 2021-11-18T10:54:18Z
Error: pinniped-auth login failed: exit status 1
Error: exit status 1
Unable to connect to the server: getting credentials: exec: executable tanzu failed with exit code 1
To resolve this issue, I had to change the time synchronization configuration of the TKG control plane VMs on vSphere. I did this by editing the settings of the control plane VMs in the vSphere Client and enabling the option to allow periodic time syncs with the ESXi host (this option is disabled by default). Afterwards, the kubectl command worked as expected. This is the Synchronize Time with Host setting in the VM options; selecting Synchronize time periodically fixed the issue.
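To confirm that clock skew is the culprit before changing any VM settings, you can compare the validity window of the certificate served by the Pinniped supervisor with the current time. A sketch, using the (obfuscated) supervisor address from earlier:

# Show the notBefore/notAfter dates of the supervisor's serving certificate
$ echo | openssl s_client -connect xx.xx.xx.18:443 2>/dev/null | openssl x509 -noout -dates
# Compare with the current UTC time on this host
$ date -u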