Monitoring Kubernetes with Wavefront via Proxy Chaining

Regular readers will be aware that I have been looking at various tools for managing and monitoring Kubernetes running on vSphere. In the past, we’ve looked at the vRealize Operations Management Pack for Container Monitoring and vRealize Network Insight for Kubernetes. One of the other VMware products that I really wanted to try out is Wavefront. Wavefront is pretty neat as it has more than 200 pre-built integrations and dashboards, which makes it extremely easy to ingest and visualize performance data.

My main issue with getting this up and running was that my Kubernetes cluster (running on PKS, Pivotal Container Service) is isolated from the internet, which I suspect is the case for many customers. Whilst Wavefront provide a Proxy Pod that can get the various K8s metrics out of your K8s cluster, I need another level of indirection: the Proxy Pod has to forward these metrics to a second Proxy, this time running in a virtual machine. This VM has access to the internet, and thus can reach my Wavefront cluster URL at https://vmware.wavefront.com. It might be easier to visualize, so I put this diagram together to show the constituent parts.

The following are the steps that I followed in order to get metrics sent from my K8s cluster to my Wavefront cluster.

1. Set up a Linux VM as a Proxy

Wavefront already provide the steps needed to set up a Linux VM as a proxy, and to have the Linux VM send its own metrics (via a Telegraf agent) to the Proxy, and thus back to your Wavefront cluster. If you do not have an existing Wavefront cluster, you can create one with a 30-day free trial. When you log in to your Wavefront cluster portal, navigate to the integrations and select the Linux Host integration, then the Setup steps. My Linux VM is running Ubuntu 17.10 (Artful Aardvark), and the Wavefront integration steps told me exactly what I needed to do to install the Proxy: pull down the appropriate Wavefront proxy package for my distro, then install it. Finally, I was given the command to install the Telegraf agent.

Once completed, the configuration file for the Wavefront proxy is located in /etc/wavefront/wavefront-proxy/wavefront.conf and the configuration file for the Telegraf agent is located in /etc/telegraf/telegraf.conf. The next step is to edit the Wavefront configuration file and set 3 parameters, namely server, hostname and token. The server is your Wavefront cluster, the hostname is used to identify stats from this proxy, and the token is an API token for your account. Details on how to retrieve the token are shown in the comments below, as well as in the Linux Host integration setup steps.

##############################################################################
# Wavefront proxy configuration file
#
#   Typically in /etc/wavefront/wavefront-proxy/wavefront.conf
#
##############################################################################
# The server should be either the primary Wavefront cloud server, or your custom VPC address.
#   This will be provided to you by Wavefront.
#
server=https://vmware.wavefront.com/api

# The hostname will be used to identify the internal proxy statistics around point rates, JVM info, etc.
#  We strongly recommend setting this to a name that is unique among your entire infrastructure,
#   possibly including the datacenter information, etc. This hostname does not need to correspond to
#   any actual hostname or DNS entry; it's merely a string that we pass with the internal stats.
#
hostname=pks-cli.rainpole.com

# The Token is any valid API Token for your account, which can be generated from the gear icon
#   at the top right of the Wavefront site, under 'Settings'. Paste that hexadecimal token
#   after the '=' below, and the proxy will automatically generate a machine-specific UUID and
#   self-register.
#
token=xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx

# Comma separated list of ports to listen on for Wavefront formatted data
pushListenerPorts=2878

#
##############################################################################

While there are a lot of other parameters, these are the only ones you need to concentrate on for the moment. Restart the Wavefront proxy (sudo /etc/init.d/wavefront-proxy restart). That now completes the Proxy VM setup, and at this point the proxy should be visible in the list of active proxies on your Wavefront cluster.
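Before restarting, it can be worth checking that the three parameters are actually set. The following is a minimal Python sketch of such a check, not part of the Wavefront tooling. The helper names (parse_proxy_conf, missing_keys) are my own for illustration, and the sample text simply mirrors the values shown above; in practice you would read the real file from /etc/wavefront/wavefront-proxy/wavefront.conf instead.

```python
# Hypothetical helpers (not part of Wavefront): sanity-check a proxy config.
REQUIRED_KEYS = ("server", "hostname", "token")


def parse_proxy_conf(text):
    """Parse key=value lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def missing_keys(settings, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not settings.get(key)]


# Sample text mirroring the configuration file shown above.
SAMPLE = """\
# Wavefront proxy configuration file
server=https://vmware.wavefront.com/api
hostname=pks-cli.rainpole.com
token=xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx
pushListenerPorts=2878
"""

settings = parse_proxy_conf(SAMPLE)
print("missing:", missing_keys(settings))
```

An empty list means all three parameters have a value; it says nothing about whether the token is actually valid for your account.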

At this point, the Proxy VM has been set up and is sending metrics to the Wavefront cluster. You can look at some of the Linux VM dashboards to verify. The next step is to deploy some components on the K8s cluster and have a Proxy Pod send the relevant K8s metrics to this Proxy VM, so that they are displayed in Wavefront. Let’s do that next.
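Before moving on, one optional way to confirm that the Proxy VM is listening on port 2878 is to push a single test point at it. This is a rough Python sketch, assuming the proxy accepts Wavefront data format lines over plain TCP on that port. The metric name test.proxy.check, the env tag and the helper names are made up for illustration.

```python
import socket


def format_point(metric, value, source, timestamp=None, tags=None):
    """Build one line in Wavefront data format:
    <metric> <value> [<timestamp>] source=<source> [key="value" ...]
    """
    parts = [metric, str(value)]
    if timestamp is not None:
        parts.append(str(int(timestamp)))
    parts.append(f"source={source}")
    for key, val in (tags or {}).items():
        parts.append(f'{key}="{val}"')
    return " ".join(parts)


def send_point(host, port, line, timeout=5.0):
    """Open a TCP connection to the proxy listener and send one line."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall((line + "\n").encode("utf-8"))


# Hypothetical metric name and tag, using the proxy hostname from above.
line = format_point("test.proxy.check", 1, "pks-cli.rainpole.com", tags={"env": "lab"})
print(line)

# To actually send it, run this on the Proxy VM itself:
# send_point("localhost", 2878, line)
```

If the point is accepted, the “Points received rate” for port 2878 in the proxy log should tick up shortly afterwards.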

2. Deploy and Configure Proxy Pod

To get the steps to deploy a Wavefront proxy for Kubernetes, select the Kubernetes integration in the Wavefront UI, and once again select the setup steps. Let’s focus on step 1, which is the deployment of the Proxy. In the manifest/YAML file for the Wavefront Proxy Pod, there are a number of environment variables, such as WAVEFRONT_URL, WAVEFRONT_TOKEN and WAVEFRONT_PROXY_ARGS. I had a hard time initially when trying to configure these, especially WAVEFRONT_PROXY_ARGS, where I had read that the value should contain a --proxyHost argument pointing back to my Proxy VM (Linux VM). However, this resulted in a lot of error spew in the logs, notably around “[agent:checkin] configuration file read from server is invalid”. It was hard to tell if this was working or not, so I reached out to Vasily Vorontsov, one of our Wavefront engineers, who recommended a different approach: in the Proxy Pod YAML, set WAVEFRONT_URL to my proxy VM, using port 2879. I could leave the token as is, and there was no need to set anything in WAVEFRONT_PROXY_ARGS. This is a snippet of what the environment variable section of my YAML file looked like:

env:
- name: WAVEFRONT_URL
  value: http://proxy-vm:2879/api
- name: WAVEFRONT_TOKEN
  value: xxxxxxxx-xxxx-xxxx-xxxxxxxxxx

 

Now, for this to work, I had to make another change to the Proxy VM configuration file. This change was to add a new entry called pushRelayListenerPorts, as follows:

# Comma separated list of (proxy) ports to listen on for Wavefront formatted data
pushRelayListenerPorts=2879
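Since the port in the Pod’s WAVEFRONT_URL has to match a relay port on the Proxy VM, a mismatch here is an easy mistake to make. The following Python sketch shows one way to cross-check the two values. The helper names are hypothetical, and the inputs are just the snippets from this post rather than anything read from a live system.

```python
from urllib.parse import urlparse


def listener_ports(conf_text, key):
    """Return the comma-separated port list configured for `key`."""
    for raw in conf_text.splitlines():
        line = raw.strip()
        if line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            return [int(port) for port in value.split(",") if port.strip()]
    return []


def pod_url_matches_relay(conf_text, wavefront_url):
    """True if the URL's port is one of the VM's relay listener ports."""
    port = urlparse(wavefront_url).port
    relay_ports = listener_ports(conf_text, "pushRelayListenerPorts")
    return port is not None and port in relay_ports


# The two listener entries from the Proxy VM configuration in this post.
VM_CONF = """\
pushListenerPorts=2878
pushRelayListenerPorts=2879
"""

print(pod_url_matches_relay(VM_CONF, "http://proxy-vm:2879/api"))
```

This only compares configuration text. It does not prove that the port is reachable from the Kubernetes nodes, which still depends on your network and firewall rules.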

 

Then restart the Proxy service once more. Here are some log snippets from the startup of the Proxy VM and Proxy Pod. Everything seemed much cleaner now:

 

From Proxy VM logs:

$ tail -f /var/log/wavefront/wavefront.log
2019-08-06 15:27:19,151 INFO  [agent:setupCheckins] scheduling regular check-ins
2019-08-06 15:27:19,152 INFO  [agent:setupCheckins] initial configuration is available, setting up proxy
2019-08-06 15:27:19,392 INFO  [agent:startListeners] listening on port: 2878 for Wavefront metrics
2019-08-06 15:27:19,411 INFO  [agent:startListeners] listening on port: 2003 for graphite metrics
2019-08-06 15:27:19,420 INFO  [agent:startListeners] Not loading logs ingestion -- no config specified.
2019-08-06 15:27:24,424 INFO  [agent:run] setup complete
2019-08-06 15:27:29,163 INFO  [agent:checkin] Checking in: https://vmware.wavefront.com/api
2019-08-06 15:27:29,392 INFO  [AbstractReportableEntityHandler:printStats] [2878] Points received rate: 0 pps (1 min), 0 pps (5 min), 0 pps (current).
2019-08-06 15:27:29,403 INFO  [AbstractReportableEntityHandler:printStats] [2003] Points received rate: 0 pps (1 min), 0 pps (5 min), 0 pps (current).
2019-08-06 15:27:29,419 INFO  [AbstractReportableEntityHandler:printStats] [2879] Points received rate: 0 pps (1 min), 0 pps (5 min), 0 pps (current).
2019-08-06 15:27:39,389 INFO  [AbstractReportableEntityHandler:printStats] [2878] Points received rate: 0 pps (1 min), 0 pps (5 min), 0 pps (current).
2019-08-06 15:27:39,403 INFO  [AbstractReportableEntityHandler:printStats] [2003] Points received rate: 0 pps (1 min), 0 pps (5 min), 0 pps (current).
2019-08-06 15:27:39,419 INFO  [AbstractReportableEntityHandler:printStats] [2879] Points received rate: 0 pps (1 min), 0 pps (5 min), 0 pps (current).
2019-08-06 15:27:49,389 INFO  [AbstractReportableEntityHandler:printStats] [2878] Points received rate: 0 pps (1 min), 0 pps (5 min), 0 pps (current).
2019-08-06 15:27:49,403 INFO  [AbstractReportableEntityHandler:printStats] [2003] Points received rate: 0 pps (1 min), 0 pps (5 min), 0 pps (current).
2019-08-06 15:27:49,419 INFO  [AbstractReportableEntityHandler:printStats] [2879] Points received rate: 0 pps (1 min), 0 pps (5 min), 0 pps (current).
2019-08-06 15:27:59,388 INFO  [AbstractReportableEntityHandler:printStats] [2878] Points received rate: 0 pps (1 min), 0 pps (5 min), 0 pps (current).
2019-08-06 15:27:59,403 INFO  [AbstractReportableEntityHandler:printStats] [2003] Points received rate: 0 pps (1 min), 0 pps (5 min), 0 pps (current).
2019-08-06 15:27:59,419 INFO  [AbstractReportableEntityHandler:printStats] [2879] Points received rate: 0 pps (1 min), 0 pps (5 min), 0 pps (current).
2019-08-06 15:28:09,388 INFO  [AbstractReportableEntityHandler:printStats] [2878] Points received rate: 0 pps (1 min), 0 pps (5 min), 0 pps (current).
2019-08-06 15:28:09,403 INFO  [AbstractReportableEntityHandler:printStats] [2003] Points received rate: 0 pps (1 min), 0 pps (5 min), 0 pps (current).
2019-08-06 15:28:09,419 INFO  [AbstractReportableEntityHandler:printStats] [2879] Points received rate: 0 pps (1 min), 0 pps (5 min), 0 pps (current).
2019-08-06 15:28:19,389 INFO  [AbstractReportableEntityHandler:printStats] [2878] Points received rate: 3 pps (1 min), 0 pps (5 min), 0 pps (current).
2019-08-06 15:28:19,389 INFO  [AbstractReportableEntityHandler:printTotal] [2878] Total points processed since start: 232; blocked: 2
2019-08-06 15:28:19,403 INFO  [AbstractReportableEntityHandler:printStats] [2003] Points received rate: 0 pps (1 min), 0 pps (5 min), 0 pps (current).
2019-08-06 15:28:19,403 INFO  [AbstractReportableEntityHandler:printTotal] [2003] Total points processed since start: 0; blocked: 0
2019-08-06 15:28:19,419 INFO  [AbstractReportableEntityHandler:printStats] [2879] Points received rate: 0 pps (1 min), 0 pps (5 min), 0 pps (current).
2019-08-06 15:28:19,419 INFO  [AbstractReportableEntityHandler:printTotal] [2879] Total points processed since start: 0; blocked: 0
2019-08-06 15:28:29,388 INFO  [AbstractReportableEntityHandler:printStats] [2878] Points received rate: 4 pps (1 min), 0 pps (5 min), 0 pps (current).
2019-08-06 15:28:29,403 INFO  [AbstractReportableEntityHandler:printStats] [2003] Points received rate: 0 pps (1 min), 0 pps (5 min), 0 pps (current).
2019-08-06 15:28:29,419 INFO  [AbstractReportableEntityHandler:printStats] [2879] Points received rate: 0 pps (1 min), 0 pps (5 min), 0 pps (current).

From K8s Pod proxy logs:

$ kubectl logs wavefront-proxy-79568456c6-z82rh -f

2019-08-06 14:28:56,168 INFO  [agent:start] Starting proxy version 4.38
2019-08-06 14:28:56,182 INFO  [agent:parseArguments] Arguments: -h, http://192.50.0.6:2879/api, -t, <HIDDEN>, --hostname, wavefront-proxy-79568456c6-z82rh,\
 --ephemespool/wavefront-proxy/buffer, --flushThreads, 6, --retryThreads, 6
2019-08-06 14:28:56,251 INFO  [agent:parseArguments] Unparsed arguments: true
2019-08-06 14:28:56,300 WARN  [agent:loadListenerConfigurationFile] Loaded configuration file null
2019-08-06 14:28:56,623 INFO  [agent:start] Ephemeral proxy id created: 55a0682a-e23b-4316-b0f7-aa602dfc331e
2019-08-06 14:28:56,992 INFO  [QueuedAgentService:<init>] No rate limit configured.
2019-08-06 14:28:57,202 INFO  [QueuedAgentService:lambda$new$5] retry queue has been cleared
2019-08-06 14:28:57,205 WARN  [QueuedAgentService:lambda$new$5] source tag retry queue has been cleared
2019-08-06 14:28:57,225 INFO  [agent:checkin] Checking in: http://192.50.0.6:2879/api

2019-08-06 14:28:57,387 INFO  [agent:setupCheckins] scheduling regular check-ins
2019-08-06 14:28:57,389 INFO  [agent:setupCheckins] initial configuration is available, setting up proxy
2019-08-06 14:28:57,474 INFO  [agent:startListeners] listening on port: 2878 for Wavefront metrics
2019-08-06 14:28:57,481 INFO  [agent:startListeners] Not loading logs ingestion -- no config specified.
2019-08-06 14:29:02,491 INFO  [agent:run] setup complete
2019-08-06 14:29:07,398 INFO  [agent:checkin] Checking in: http://192.50.0.6:2879/api
2019-08-06 14:29:07,473 INFO  [AbstractReportableEntityHandler:printStats] [2878] Points received rate: 0 pps (1 min), 0 pps (5 min), 0 pps (current).
2019-08-06 14:29:17,470 INFO  [AbstractReportableEntityHandler:printStats] [2878] Points received rate: 0 pps (1 min), 0 pps (5 min), 0 pps (current).
2019-08-06 14:29:27,470 INFO  [AbstractReportableEntityHandler:printStats] [2878] Points received rate: 0 pps (1 min), 0 pps (5 min), 0 pps (current).
2019-08-06 14:29:37,470 INFO  [AbstractReportableEntityHandler:printStats] [2878] Points received rate: 0 pps (1 min), 0 pps (5 min), 0 pps (current).
2019-08-06 14:29:47,470 INFO  [AbstractReportableEntityHandler:printStats] [2878] Points received rate: 0 pps (1 min), 0 pps (5 min), 0 pps (current).
2019-08-06 14:29:57,470 INFO  [AbstractReportableEntityHandler:printStats] [2878] Points received rate: 0 pps (1 min), 0 pps (5 min), 0 pps (current).
2019-08-06 14:29:57,470 INFO  [AbstractReportableEntityHandler:printTotal] [2878] Total points processed since start: 0; blocked: 0

This looks like proxy chaining is working correctly. To verify, check the Proxy Pod logs. If the “Total points processed” number is not increasing, then that proxy is not receiving any metrics; if it is increasing, then it is working as expected. In the snippet above the total is still 0, presumably because nothing has been deployed yet to send metrics to the Pod.
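If you would rather not eyeball the logs, the “Total points processed” check can be scripted. This is a small Python sketch that pulls the totals out of proxy log lines, grouped by listener port. The regular expression is based purely on the log format shown in this post, and the helper names are my own.

```python
import re

# Matches e.g. "[2878] Total points processed since start: 232; blocked: 2"
TOTAL_RE = re.compile(
    r"\[(\d+)\] Total points processed since start: (\d+); blocked: (\d+)"
)


def totals_by_port(lines):
    """Collect the 'total points processed' samples per listener port."""
    totals = {}
    for line in lines:
        match = TOTAL_RE.search(line)
        if match:
            port, total = int(match.group(1)), int(match.group(2))
            totals.setdefault(port, []).append(total)
    return totals


def is_receiving(samples):
    """True if the total grew between the first and the last sample."""
    return len(samples) >= 2 and samples[-1] > samples[0]


# The three printTotal lines from the Proxy VM log above.
LOG = [
    "2019-08-06 15:28:19,389 INFO  [AbstractReportableEntityHandler:printTotal] [2878] Total points processed since start: 232; blocked: 2",
    "2019-08-06 15:28:19,403 INFO  [AbstractReportableEntityHandler:printTotal] [2003] Total points processed since start: 0; blocked: 0",
    "2019-08-06 15:28:19,419 INFO  [AbstractReportableEntityHandler:printTotal] [2879] Total points processed since start: 0; blocked: 0",
]

print(totals_by_port(LOG))
```

With only one sample per port, as in the snippet above, is_receiving cannot say anything yet; feed it a few minutes of log output to see whether the totals grow.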

It should be noted that if you are running PKS and your Kubernetes nodes have access to the internet, it is quite straightforward to deploy the Kubernetes Pod proxy, since it can reach the Wavefront cluster directly and there is no need to chain proxies. This can be done with a simple modification of the PKS tile to add the Wavefront cluster URL and Token. However, there is no way to tell PKS about a chained proxy when the nodes cannot reach the internet, so this is how you have to do it.

You can now complete the Pod Proxy deployment by deploying the kube-state-metrics service and the Wavefront Kubernetes Collector. The kube-state-metrics service listens to the Kubernetes API server and generates metrics about the state of Kubernetes objects. The full set of steps is once again included in the Kubernetes integration steps. The only change is that the collector needs to be modified to include the name of your Kubernetes cluster in the collector container deployment:

- --sink=wavefront:?proxyAddress=wavefront-proxy.default.svc.cluster.local:2878\
  &clusterName=cork8s-cluster-01&includeLabels=true
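The sink argument is easy to get wrong by hand, so here is a short Python sketch that assembles it from its parts and splits it back out again. I am assuming the trailing backslash in the snippet above is only a line continuation for display, so the sketch produces a single line. The function names are hypothetical; only the three parameter names come from the snippet.

```python
from urllib.parse import parse_qs


def build_sink_arg(proxy_address, cluster_name, include_labels=True):
    """Assemble the collector's --sink argument on a single line."""
    params = {
        "proxyAddress": proxy_address,
        "clusterName": cluster_name,
        "includeLabels": "true" if include_labels else "false",
    }
    for key, value in params.items():
        # Reject characters that would break the key=value&key=value layout.
        if any(ch in value for ch in "&=? \\"):
            raise ValueError(f"unsupported character in {key}: {value!r}")
    query = "&".join(f"{key}={value}" for key, value in params.items())
    return f"--sink=wavefront:?{query}"


def parse_sink_arg(arg):
    """Split a --sink argument back into its parameters."""
    _, _, query = arg.partition("?")
    return {key: values[0] for key, values in parse_qs(query).items()}


arg = build_sink_arg(
    "wavefront-proxy.default.svc.cluster.local:2878", "cork8s-cluster-01"
)
print(arg)
print(parse_sink_arg(arg))
```

The clusterName value is the one to change for your own environment, since that is the name you later pick from the dashboard drop-down.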

This takes us to our last step, which is to verify that we are actually receiving metrics back on our Wavefront cluster.

3. Verify metrics are being received

Back on the Kubernetes integration, this time we select dashboards. The two dashboards which display information about my Kubernetes cluster are Kube-state Metrics and Kubernetes Collector Metrics. When you select the dashboard, you will be given a list of K8s clusters to choose from. You simply choose your one from the list (assuming metrics are being received). In my case, my cluster is cork8s-cluster-01, and when I select this from the list, here is what I see in Kube-state metrics.

I can drill down further and get even further information about my K8s cluster.

The thing to note is that this is a predefined dashboard from Wavefront. It does have a lot of very useful, low-level information about the behaviour of the cluster, but the nice thing is that you can clone this dashboard, and then modify it so it displays only the metrics that are meaningful to you. And the powerful thing about Wavefront is that it can ingest millions of metric data points per second. This means that you are able to monitor over 100,000 containers in real time. This is why Wavefront is used by the likes of Box, Reddit and Workday.

That concludes the post. If you do have a Kubernetes cluster that you wish to monitor with Wavefront, but it is not connected to the internet, chaining proxies as I’ve shown in this post should help you work around that situation. Kudos once again to Vasily for his help on this solution.
