Configuring Tanzu Kubernetes with a Proxy (Squid)

In this post, I am going to show how I set up my Tanzu Kubernetes Grid management cluster using a proxy configuration. I suspect this is something many readers will want to try at some point, for various reasons. I will add a caveat: I have done the bare minimum to get this configuration to work, so you will probably want to spend far more time than I did tweaking and tuning the proxy configuration. The purpose of this exercise is to show how a TKG bootstrap virtual machine (running Ubuntu) can access the internet via the proxy to get OS updates, install docker, pull down docker images, fetch tanzu plugins and finally build a TKG management cluster. This involves building two VMs, one to act as the proxy server and the other to act as the bootstrap environment where I can begin to build TKG clusters. Let's look at the proxy server first.

Step 1. Set up the Proxy Server (Squid)

I created a dual-NIC VM, with one connection to my internal VLAN and the other with external connectivity. I then installed Ubuntu 20.04 and followed the steps outlined in the Ubuntu docs for Proxy Servers – Squid. Once the proxy server was running, I wanted to give external access to all IP addresses on my internal VLAN in the 10.35.13.0/24 range, since this is the range in which my TKG cluster VMs will be deployed. The following is the /etc/squid/squid.conf file I created. There are a lot of comments in the configuration file, so I used this useful grep command to display only the non-commented lines. The three main changes are the vlan_3513 ACL for my internal network, the corresponding http_access allow rule, and the http_port directive binding the proxy to my internal IP address and port 3128.

$ grep -vE '^$|^#' /etc/squid/squid.conf
acl localnet src 0.0.0.1-0.255.255.255  # RFC 1122 "this" network (LAN)
acl localnet src 10.0.0.0/8             # RFC 1918 local private network (LAN)
acl localnet src 100.64.0.0/10          # RFC 6598 shared address space (CGN)
acl localnet src 169.254.0.0/16         # RFC 3927 link-local (directly plugged) machines
acl localnet src 172.16.0.0/12          # RFC 1918 local private network (LAN)
acl localnet src 192.168.0.0/16         # RFC 1918 local private network (LAN)
acl localnet src fc00::/7               # RFC 4193 local private network range
acl localnet src fe80::/10              # RFC 4291 link-local (directly plugged) machines
acl vlan_3513 src 10.35.13.0/24         # Cormac's internal network
acl SSL_ports port 443
acl Safe_ports port 80          # http
acl Safe_ports port 21          # ftp
acl Safe_ports port 443         # https
acl Safe_ports port 70          # gopher
acl Safe_ports port 210         # wais
acl Safe_ports port 1025-65535  # unregistered ports
acl Safe_ports port 280         # http-mgmt
acl Safe_ports port 488         # gss-http
acl Safe_ports port 591         # filemaker
acl Safe_ports port 777         # multiling http
acl CONNECT method CONNECT
http_access allow vlan_3513
http_access deny !Safe_ports
http_access deny CONNECT !SSL_ports
http_access allow localhost manager
include /etc/squid/conf.d/*
http_access allow localnet
http_access allow localhost
http_access deny all
http_port 10.35.13.136:3128
cache_dir ufs /var/spool/squid 100 16 256
coredump_dir /var/spool/squid
refresh_pattern ^ftp: 1440 20% 10080
refresh_pattern ^gopher: 1440 0% 1440
refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
refresh_pattern \/(Packages|Sources)(|\.bz2|\.gz|\.xz)$ 0 0% 0 refresh-ims
refresh_pattern \/Release(|\.gpg)$ 0 0% 0 refresh-ims
refresh_pattern \/InRelease$ 0 0% 0 refresh-ims
refresh_pattern \/(Translation-.*)(|\.bz2|\.gz|\.xz)$ 0 0% 0 refresh-ims
refresh_pattern . 0 20% 4320
via on

My proxy server is now reachable at http://10.35.13.136:3128. Another useful tool is squid's parse option, which checks that the entries in the configuration file are correctly formatted.

$ sudo squid -k parse
2021/10/27 15:27:20| Startup: Initializing Authentication Schemes ...
2021/10/27 15:27:20| Startup: Initialized Authentication Scheme 'basic'
2021/10/27 15:27:20| Startup: Initialized Authentication Scheme 'digest'
2021/10/27 15:27:20| Startup: Initialized Authentication Scheme 'negotiate'
2021/10/27 15:27:20| Startup: Initialized Authentication Scheme 'ntlm'
2021/10/27 15:27:20| Startup: Initialized Authentication.
2021/10/27 15:27:20| Processing Configuration File: /etc/squid/squid.conf (depth 0)
2021/10/27 15:27:20| Processing: acl localnet src 0.0.0.1-0.255.255.255 # RFC 1122 "this" network (LAN)
2021/10/27 15:27:20| Processing: acl localnet src 10.0.0.0/8            # RFC 1918 local private network (LAN)
2021/10/27 15:27:20| Processing: acl localnet src 100.64.0.0/10         # RFC 6598 shared address space (CGN)
2021/10/27 15:27:20| Processing: acl localnet src 169.254.0.0/16        # RFC 3927 link-local (directly plugged) machines
2021/10/27 15:27:20| Processing: acl localnet src 172.16.0.0/12         # RFC 1918 local private network (LAN)
2021/10/27 15:27:20| Processing: acl localnet src 192.168.0.0/16        # RFC 1918 local private network (LAN)
2021/10/27 15:27:20| Processing: acl localnet src fc00::/7              # RFC 4193 local private network range
2021/10/27 15:27:20| Processing: acl localnet src fe80::/10             # RFC 4291 link-local (directly plugged) machines
2021/10/27 15:27:20| Processing: acl vlan_3513 src 10.35.13.0/24        # Cormac's internal network
2021/10/27 15:27:20| Processing: acl SSL_ports port 443
2021/10/27 15:27:20| Processing: acl Safe_ports port 80         # http
2021/10/27 15:27:20| Processing: acl Safe_ports port 21         # ftp
2021/10/27 15:27:20| Processing: acl Safe_ports port 443        # https
2021/10/27 15:27:20| Processing: acl Safe_ports port 70         # gopher
2021/10/27 15:27:20| Processing: acl Safe_ports port 210        # wais
2021/10/27 15:27:20| Processing: acl Safe_ports port 1025-65535 # unregistered ports
2021/10/27 15:27:20| Processing: acl Safe_ports port 280        # http-mgmt
2021/10/27 15:27:20| Processing: acl Safe_ports port 488        # gss-http
2021/10/27 15:27:20| Processing: acl Safe_ports port 591        # filemaker
2021/10/27 15:27:20| Processing: acl Safe_ports port 777        # multiling http
2021/10/27 15:27:20| Processing: acl CONNECT method CONNECT
2021/10/27 15:27:20| Processing: http_access allow vlan_3513
2021/10/27 15:27:20| Processing: http_access deny !Safe_ports
2021/10/27 15:27:20| Processing: http_access deny CONNECT !SSL_ports
2021/10/27 15:27:20| Processing: http_access allow localhost manager
2021/10/27 15:27:20| Processing: include /etc/squid/conf.d/*
2021/10/27 15:27:20| Processing Configuration File: /etc/squid/conf.d/debian.conf (depth 1)
2021/10/27 15:27:20| Processing: logfile_rotate 0
2021/10/27 15:27:20| Processing: http_access allow localnet
2021/10/27 15:27:20| Processing: http_access allow localhost
2021/10/27 15:27:20| Processing: http_access deny all
2021/10/27 15:27:20| Processing: http_port 10.35.13.136:3128
2021/10/27 15:27:20| Processing: cache_dir ufs /var/spool/squid 100 16 256
2021/10/27 15:27:20| Processing: coredump_dir /var/spool/squid
2021/10/27 15:27:20| Processing: refresh_pattern ^ftp: 1440 20% 10080
2021/10/27 15:27:20| Processing: refresh_pattern ^gopher: 1440 0% 1440
2021/10/27 15:27:20| Processing: refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
2021/10/27 15:27:20| Processing: refresh_pattern \/(Packages|Sources)(|\.bz2|\.gz|\.xz)$ 0 0% 0 refresh-ims
2021/10/27 15:27:20| Processing: refresh_pattern \/Release(|\.gpg)$ 0 0% 0 refresh-ims
2021/10/27 15:27:20| Processing: refresh_pattern \/InRelease$ 0 0% 0 refresh-ims
2021/10/27 15:27:20| Processing: refresh_pattern \/(Translation-.*)(|\.bz2|\.gz|\.xz)$ 0 0% 0 refresh-ims
2021/10/27 15:27:20| Processing: refresh_pattern . 0 20% 4320
2021/10/27 15:27:20| Processing: via on
2021/10/27 15:27:20| Initializing https:// proxy context

Once the configuration has been processed without errors, we can proceed with building our second VM, which will access the internet via the proxy server. As I said in the introduction, I have done the bare minimum proxy configuration here to get this working, so you may want to spend more time researching additional security steps.
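
One final note on the proxy server before moving on: whenever /etc/squid/squid.conf is changed, re-run the parse check and then restart (or reconfigure) the service so that the changes take effect. A minimal sketch, assuming the default systemd unit name on Ubuntu 20.04:

$ sudo systemctl restart squid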

Step 2. Set up the proxy client VM / tanzu bootstrap node

I have broken this step into a number of parts, as there are several items to consider.

2.1 Ubuntu and Docker

Before we do anything with TKG and tanzu, we must first set up this virtual machine / guest OS to function via the proxy. Again, I have installed Ubuntu 20.04. The next step is to install docker. I followed the official docker guide for installing docker on Ubuntu using the repository. Note however that this makes extensive use of apt calls, as well as curl, and both of these need to be told how to use the proxy. To enable apt to access the internet via a proxy, simply create the file /etc/apt/apt.conf.d/proxy.conf and add the following lines for both http and https (changing the settings to your proxy server and port of course):

Acquire::http::Proxy "http://10.35.13.136:3128";
Acquire::https::Proxy "http://10.35.13.136:3128";

The next time you run an apt command, it should use this proxy configuration. For curl, you simply need to include the -x (or --proxy) option with the curl command, providing the proxy as [protocol]://[proxy-server]:[proxy-port], e.g. http://10.35.13.136:3128. This will now allow you to install docker on the VM.
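
For example, the step in the docker install guide that fetches the repository GPG key with curl would look something like this with the proxy option added (the URL and keyring path are taken from the docker documentation at the time of writing, so check the current guide):

$ curl -x http://10.35.13.136:3128 -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg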

The final step to make sure docker is functioning is to tell the docker daemon about the proxy server so that it directs its registry pull requests via the proxy. Again, the official docker documentation shows how to create a proxy configuration. In a nutshell, you must create the file /etc/systemd/system/docker.service.d/http-proxy.conf and add the following entries (again, changing the settings to your proxy server and port of course).

[Service]
Environment="HTTP_PROXY=http://10.35.13.136:3128"
Environment="HTTPS_PROXY=http://10.35.13.136:3128"

Reload and restart docker, then try a simple docker test, such as docker run hello-world. If the image is successfully fetched from the docker registry, you should be good to proceed to the next step.
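
For reference, the reload, restart and test steps look something like this:

$ sudo systemctl daemon-reload
$ sudo systemctl restart docker
$ docker run hello-world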

2.2 tanzu CLI setup

You should be able to download and install the tanzu CLI as per the official tanzu documentation. However, there is a caveat around the tanzu plugin list command. This command attempts to pull a manifest from an external repository, and fails as follows until the proxy is configured:

$ tanzu plugin list
Error: could not fetch manifest from repository "core": Get "https://storage.googleapis.com/tanzu-cli/artifacts/manifest.yaml": dial tcp 142.250.189.176:443: i/o timeout
✖ could not fetch manifest from repository "core": Get "https://storage.googleapis.com/tanzu-cli/artifacts/manifest.yaml": dial tcp 142.250.189.176:443: i/o timeout

This is failing because the request is not being sent through the proxy server; instead, the CLI is trying to reach the repository directly from the internal network. To my knowledge, there is no way to specify a proxy on the tanzu command line (I may be mistaken here, but I was unable to find a way), so to address this you need to set some proxy environment variables in your shell. This can be done in a number of ways. You could add the proxies to the global network configuration of the OS, which will then automatically add the environment variables to your shell, or alternatively set them in your profile. I added them to my ~/.bash_profile as follows:

export HTTP_PROXY=http://10.35.13.136:3128/
export HTTPS_PROXY=http://10.35.13.136:3128/
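
In either case, the variables should now appear in your shell environment. A quick way to confirm (after sourcing the profile):

$ source ~/.bash_profile
$ env | grep -i proxy
HTTP_PROXY=http://10.35.13.136:3128/
HTTPS_PROXY=http://10.35.13.136:3128/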

Both the upper-case and lower-case variable names appear to be honoured. With the proxy variables in place, the tanzu plugin list command now works as expected:

$ tanzu plugin list
  NAME                LATEST VERSION  DESCRIPTION                                                        REPOSITORY  VERSION  STATUS
  alpha               v1.3.1          Alpha CLI commands                                                 core                not installed
  cluster             v1.3.1          Kubernetes cluster operations                                      core        v1.3.1  installed
  kubernetes-release  v1.3.1          Kubernetes release operations                                      core        v1.3.1  installed
  login               v1.3.1          Login to the platform                                              core        v1.3.1  installed
  management-cluster  v1.3.1          Kubernetes management cluster operations                           core        v1.3.1  installed
  pinniped-auth       v1.3.1          Pinniped authentication operations (usually not directly invoked)  core        v1.3.1  installed

We can now proceed with the creation of the TKG management cluster.

2.3 TKG management cluster deployment – docker requirements

Before creating a management cluster, add your user to the docker group, or else the tanzu management-cluster create command will complain that the docker daemon is not running. Even sudo will not help. This is the error reported:

$ tanzu management-cluster create -u

Validating the pre-requisites...
Error: docker prerequisites validation failed: Docker daemon is not running, Please make sure Docker daemon is up and running

You can use the usermod command to add your user (in this case cormac) to the docker group. Note that the new group membership typically only takes effect in a new login session, so you may need to log out and back in (or start a new shell with newgrp docker) before docker commands work without sudo.

$ sudo usermod -aG docker cormac
[sudo] password for cormac:******

Now the pre-requisites will pass:

$ tanzu management-cluster create -u

Validating the pre-requisites...
Serving kickstart UI at http://127.0.0.1:8080

2.4 TKG Management Cluster – vCenter server considerations

Now we get to the part where I spun my wheels the most. When I connect to my vCenter server from a browser on my bootstrap VM through the proxy, I get the usual browser warning that the certificate is not trusted.

This is fine since it is my lab; I haven't signed my vCenter certificates, so I can just go ahead and accept the risk. However, this raises another issue for the TKG UI. It also reports that it has found a certificate signed by an unknown authority, and there is no way to tell TKG to accept the risk and continue.

Now, there may be some ways of allowing this to work via the proxy configuration. I thought the ssl-bump feature in Squid might allow it, but there seem to be some issues with using this feature on Ubuntu. In the end, I decided that the easiest thing to do would be to create yet another environment variable, NO_PROXY, and add the vCenter server domain to it. NO_PROXY is essentially an allow list of destinations that are reached directly rather than through the proxy. After adding eng.vmware.com to the NO_PROXY settings, the TKG UI was able to proceed with the connection to vCenter.
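
In my case, that meant adding one more line to ~/.bash_profile alongside the earlier proxy exports. A minimal sketch; the exact list of domains and addresses will obviously differ in your environment:

export NO_PROXY=eng.vmware.com,localhost,127.0.0.1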

2.5 TKG Management Cluster – Proxy Settings

Later on in the UI, you are prompted for proxy details for the TKG management cluster itself. You can add the vCenter domain here too. Do not place any wildcards in the NO_PROXY field, such as an "*". Even though the TKG UI will accept *.eng.vmware.com, it will fail later when parsing it, as it expects alphanumeric characters only.

Note that the NO_PROXY field in the UI prompts you to add further entries, such as the Pod CIDR, Service CIDR and others. This is so that internal TKG / Kubernetes cluster communication (e.g. logging) does not use the proxy. Entries similar to the following should now appear in the TKG management cluster configuration file:

TKG_HTTP_PROXY: http://10.35.13.136:3128
TKG_HTTP_PROXY_ENABLED: "true"
TKG_HTTPS_PROXY: http://10.35.13.136:3128
TKG_NO_PROXY: eng.vmware.com,127.0.0.0/8,::1,svc,svc.cluster.local,100.64.0.0/16,100.96.0.0/16

You should now have everything in place to successfully deploy a TKG management cluster via a proxy.
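
As an aside, if you want to repeat the deployment later without going back through the UI, the tanzu CLI can also drive the creation from a saved cluster configuration file. Something along these lines should work (the path and file name here are purely illustrative):

$ tanzu management-cluster create --file ~/.config/tanzu/tkg/clusterconfigs/mgmt-proxy.yaml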

Step 3. Deploy TKG Management Cluster via Proxy

The TKG management cluster deployment via a proxy now appears much the same as a standard deployment, except that the container images are pulled via the proxy rather than directly from the internet.

$ tanzu management-cluster get
  NAME        NAMESPACE   STATUS    CONTROLPLANE  WORKERS  KUBERNETES        ROLES
  mgmt-proxy  tkg-system  creating  0/1           1/1      v1.20.5+vmware.1  management


Details:

NAME                                                           READY  SEVERITY  REASON                  SINCE  MESSAGE
/mgmt-proxy                                                    False  Info      WaitingForControlPlane  18s
├─ClusterInfrastructure - VSphereCluster/mgmt-proxy            True                                     17s
├─ControlPlane - KubeadmControlPlane/mgmt-proxy-control-plane
│ └─Machine/mgmt-proxy-control-plane-pvnzg                     True                                     12s
└─Workers
  └─MachineDeployment/mgmt-proxy-md-0
    └─Machine/mgmt-proxy-md-0-df8c9b68-b8dfb                   True                                     12s

Providers:

  NAMESPACE                          NAME                    TYPE                    PROVIDERNAME  VERSION  WATCHNAMESPACE
  capi-kubeadm-bootstrap-system      bootstrap-kubeadm       BootstrapProvider       kubeadm       v0.3.14
  capi-kubeadm-control-plane-system  control-plane-kubeadm   ControlPlaneProvider    kubeadm       v0.3.14
  capi-system                        cluster-api             CoreProvider            cluster-api   v0.3.14
  capv-system                        infrastructure-vsphere  InfrastructureProvider  vsphere       v0.7.7


$ tanzu management-cluster get
  NAME        NAMESPACE   STATUS   CONTROLPLANE  WORKERS  KUBERNETES        ROLES
  mgmt-proxy  tkg-system  running  1/1           1/1      v1.20.5+vmware.1  management


Details:

NAME                                                           READY  SEVERITY  REASON  SINCE  MESSAGE
/mgmt-proxy                                                    True                     61s
├─ClusterInfrastructure - VSphereCluster/mgmt-proxy            True                     80s
├─ControlPlane - KubeadmControlPlane/mgmt-proxy-control-plane  True                     62s
│ └─Machine/mgmt-proxy-control-plane-pvnzg                     True                     75s
└─Workers
  └─MachineDeployment/mgmt-proxy-md-0
    └─Machine/mgmt-proxy-md-0-df8c9b68-b8dfb                   True                     75s

Providers:

  NAMESPACE                          NAME                    TYPE                    PROVIDERNAME  VERSION  WATCHNAMESPACE
  capi-kubeadm-bootstrap-system      bootstrap-kubeadm       BootstrapProvider       kubeadm       v0.3.14
  capi-kubeadm-control-plane-system  control-plane-kubeadm   ControlPlaneProvider    kubeadm       v0.3.14
  capi-system                        cluster-api             CoreProvider            cluster-api   v0.3.14
  capv-system                        infrastructure-vsphere  InfrastructureProvider  vsphere       v0.7.7


$ tanzu login
? Select a server mgmt-proxy          ()
✔  successfully logged in to management cluster using the kubeconfig mgmt-proxy


$ kubectl config get-contexts
CURRENT   NAME                          CLUSTER      AUTHINFO           NAMESPACE
*         mgmt-proxy-admin@mgmt-proxy   mgmt-proxy   mgmt-proxy-admin


$ kubectl get nodes
NAME                             STATUS   ROLES                  AGE     VERSION
mgmt-proxy-control-plane-pvnzg   Ready    control-plane,master   5m18s   v1.20.5+vmware.1
mgmt-proxy-md-0-df8c9b68-b8dfb   Ready    <none>                 4m25s   v1.20.5+vmware.1


$ kubectl get apps -A
NAMESPACE    NAME                   DESCRIPTION           SINCE-DEPLOY   AGE
tkg-system   antrea                 Reconcile succeeded   2m37s          2m38s
tkg-system   metrics-server         Reconcile succeeded   2m37s          2m38s
tkg-system   tanzu-addons-manager   Reconcile succeeded   2m26s          5m32s
tkg-system   vsphere-cpi            Reconcile succeeded   2m24s          2m38s
tkg-system   vsphere-csi            Reconcile succeeded   2m20s          2m38s

For troubleshooting purposes, you can check that the proxy is working by monitoring /var/log/squid/access.log on the squid server. You should see TCP_TUNNEL entries when clients make requests through the proxy. In the snippet shown below, the IP address ending in .140 is the control plane node of the management cluster, making requests to pull images from the VMware registry via the proxy server.
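
One simple way to watch this live on the proxy server (assuming the default Squid log location) is:

$ sudo tail -f /var/log/squid/access.log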
1635334052.674    897 10.35.13.140 TCP_TUNNEL/200 17034 CONNECT projects.registry.vmware.com:443 - HIER_DIRECT/10.188.25.227 -
1635334070.195    883 10.35.13.140 TCP_TUNNEL/200 16013 CONNECT projects.registry.vmware.com:443 - HIER_DIRECT/10.188.25.227 -
1635334073.462    860 10.35.13.140 TCP_TUNNEL/200 15563 CONNECT projects.registry.vmware.com:443 - HIER_DIRECT/10.188.25.227 -
1635334075.522   1045 10.35.13.140 TCP_TUNNEL/200 14602 CONNECT projects.registry.vmware.com:443 - HIER_DIRECT/10.188.25.227 -
1635334090.337  20086 10.35.13.140 TCP_TUNNEL/200 1391477 CONNECT 10.35.13.157:6443 - HIER_DIRECT/10.35.13.157 -
1635334090.452     84 10.35.13.140 TCP_TUNNEL/200 98468 CONNECT 10.35.13.157:6443 - HIER_DIRECT/10.35.13.157 -
1635334109.229   1014 10.35.13.140 TCP_TUNNEL/200 22500 CONNECT projects.registry.vmware.com:443 - HIER_DIRECT/10.188.25.227 -
1635334111.024   1066 10.35.13.140 TCP_TUNNEL/200 17419 CONNECT projects.registry.vmware.com:443 - HIER_DIRECT/10.188.25.227 -