Wavefront Collector Issues: Error in scraping containers
I was very pleased last week, as I managed to get a bunch of metrics sent from my Kubernetes cluster into Wavefront by chaining proxies together. I was successfully able to see my cluster’s Kube-state Metrics and Kubernetes Collector Metrics in Wavefront. However, on closer inspection, I noticed that a number of the built-in Wavefront Kubernetes dashboards were not being populated (Kubernetes Metrics and Kubernetes Metrics by Namespace), and then I found a number of errors in the Wavefront collector logs in my deployment. This post will describe what these errors were, and how I rectified them.
There were two distinct errors related to scraping containers (i.e. gathering metrics from containers). First, there were the ones related to the kubelet (the part of Kubernetes that runs on each node). I had one of these errors for each of the nodes in the Kubernetes cluster, in my case three. I was able to view these errors by displaying the logs of the Wavefront Collector Pod via kubectl logs.
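The exact command depends on how and where the collector was deployed; a minimal sketch, assuming the collector runs in a namespace called wavefront-collector (the namespace and the pod name placeholder below are assumptions), would be:

$ kubectl get pods -n wavefront-collector
$ kubectl logs -n wavefront-collector <wavefront-collector-pod-name>

The collector logs contained the following errors, one for each node: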
E0811 09:01:05.002411 1 manager.go:124] Error in scraping containers from \
kubelet_summary:192.168.192.5:10255: Get http://192.168.192.5:10255/stats/summary/: \
dial tcp 192.168.192.5:10255: connect: connection refused
E0811 09:01:05.002573 1 manager.go:124] Error in scraping containers from \
kubelet_summary:192.168.192.3:10255: Get http://192.168.192.3:10255/stats/summary/: \
dial tcp 192.168.192.3:10255: connect: connection refused
E0811 09:01:05.032201 1 manager.go:124] Error in scraping containers from \
kubelet_summary:192.168.192.4:10255: Get http://192.168.192.4:10255/stats/summary/: \
dial tcp 192.168.192.4:10255: connect: connection refused
There was a second error observed as well. This one was against the kube-dns service (which corresponds to the 10.100.200.2 IP address):
E0811 09:01:05.008521 1 manager.go:124] Error in scraping containers from \
prometheus_source: http://10.100.200.2:9153/metrics: Get http://10.100.200.2:9153/metrics: \
dial tcp 10.100.200.2:9153: connect: network is unreachable
So there were two distinct problems: the kubelets on my Kubernetes nodes were refusing the Wavefront collector's connections on port 10255 (connection refused), and when the collector tried to connect to the kube-dns metrics port 9153, the network was simply unreachable.
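Before changing anything, a quick way to sanity-check the kubelet side is to hit both kubelet ports directly from somewhere that can reach the nodes. This is a hedged sketch using one of the node IPs from the logs above; the secure port normally answers with 401 Unauthorized when no credentials are supplied, which is enough to show that it is at least listening.

$ curl -v http://192.168.192.5:10255/stats/summary     # read-only port - connection refused
$ curl -vk https://192.168.192.5:10250/stats/summary   # secure port - responds, e.g. 401 Unauthorized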
Let’s concentrate on the kubelet issue first. This appears to be a common enough issue where kubelets do not allow metrics to be retrieved on port 10255. I found a discussion online which suggested that the kubelets need to be started with --read-only-port=10255 on the nodes. A workaround was to use the secure HTTPS kubelet port 10250 instead of the read-only HTTP port 10255. To do that, the following change was made to the Wavefront collector YAML file:
from:

        - --source=kubernetes.summary_api:''

to:

        - --source=kubernetes.summary_api:''?kubeletHttps=true&kubeletPort=10250&insecure=true
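For context, this is roughly where that argument sits in the collector deployment manifest. The sketch below is purely illustrative: apart from the --source flag itself, the container name, image and remaining entries are assumptions and will differ in your own YAML. My understanding is that insecure=true skips verification of the kubelet's serving certificate, which is fine for a lab but worth reviewing for production.

      containers:
      - name: wavefront-collector
        image: wavefronthq/wavefront-kubernetes-collector:latest
        command:
        - /wavefront-collector
        - --source=kubernetes.summary_api:''?kubeletHttps=true&kubeletPort=10250&insecure=true
        # ... the --sink and any remaining flags stay exactly as they were ...

After re-applying the manifest with kubectl apply, the collector Pod is recreated with the new source settings.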
This now allows metrics to be retrieved from the nodes. Let’s now look at the kube-dns issue. I found the solution to that issue in this online discussion. It seems that the Wavefront collector is configured to scrape CoreDNS on port 9153 (a port named metrics on the Pod), but the kube-dns Service does NOT have this port configured. Editing the kube-dns Service and adding the port addressed the issue. I’m not sure if this configuration, whereby the port is configured on the Pod but not on the Service, is a nuance of PKS, since I am using PKS to deploy my K8s clusters. On editing the Service, simply add the new port to the ports section of the manifest.
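The entry to add under spec.ports is just the following stanza (it simply mirrors the metrics containerPort that is already present on the CoreDNS Pod):

        - name: metrics
          port: 9153
          protocol: TCP
          targetPort: 9153

Here is the Service before the change, the ports on the CoreDNS Pod for comparison, and the Service after the edit: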
$ kubectl get svc -n kube-system kube-dns -o json | jq .spec.ports
[
  {
    "name": "dns",
    "port": 53,
    "protocol": "UDP",
    "targetPort": 53
  },
  {
    "name": "dns-tcp",
    "port": 53,
    "protocol": "TCP",
    "targetPort": 53
  }
]

$ kubectl get pods -n kube-system --selector k8s-app=kube-dns -o json | jq \
  .items[0].spec.containers[].ports
[
  {
    "containerPort": 53,
    "name": "dns",
    "protocol": "UDP"
  },
  {
    "containerPort": 53,
    "name": "dns-tcp",
    "protocol": "TCP"
  },
  {
    "containerPort": 9153,
    "name": "metrics",
    "protocol": "TCP"
  }
]

$ kubectl edit svc -n kube-system kube-dns
service/kube-dns edited

$ kubectl get svc -n kube-system kube-dns -o json | jq .spec.ports
[
  {
    "name": "dns",
    "port": 53,
    "protocol": "UDP",
    "targetPort": 53
  },
  {
    "name": "dns-tcp",
    "port": 53,
    "protocol": "TCP",
    "targetPort": 53
  },
  {
    "name": "metrics",
    "port": 9153,
    "protocol": "TCP",
    "targetPort": 9153
  }
]
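As an aside, if you would rather not edit the Service interactively, the same change can be made with a JSON patch along these lines (a hedged alternative; this is not how I applied it above):

$ kubectl -n kube-system patch svc kube-dns --type=json \
  -p '[{"op":"add","path":"/spec/ports/-","value":{"name":"metrics","port":9153,"protocol":"TCP","targetPort":9153}}]'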
Now all scrapes are working, according to the logs:
I0812 09:30:05.000218 1 manager.go:91] Scraping metrics start: 2019-08-12 09:29:00 +0000 UTC, end: 2019-08-12 09:30:00 +0000 UTC
I0812 09:30:05.000282 1 manager.go:96] Scraping sources from provider: internal_stats_provider
I0812 09:30:05.000289 1 manager.go:96] Scraping sources from provider: prometheus_metrics_provider: kube-system-service-kube-dns
I0812 09:30:05.000317 1 manager.go:96] Scraping sources from provider: prometheus_metrics_provider: kube-system-service-kube-state-metrics
I0812 09:30:05.000324 1 manager.go:96] Scraping sources from provider: prometheus_metrics_provider: pod-velero-7d97d7ff65-drl5c
I0812 09:30:05.000329 1 manager.go:96] Scraping sources from provider: kubernetes_summary_provider
I0812 09:30:05.000364 1 summary.go:452] nodeInfo: [nodeName:fd8f9036-189f-447c-bbac-71a9fea519c0 hostname:192.168.192.3 hostID: ip:192.168.192.3]
I0812 09:30:05.000374 1 summary.go:452] nodeInfo: [nodeName:ebbb4c31-375b-4b17-840d-db0586dd948b hostname:192.168.192.4 hostID: ip:192.168.192.4]
I0812 09:30:05.000409 1 summary.go:452] nodeInfo: [nodeName:140ab5aa-0159-4612-b68c-df39dbea2245 hostname:192.168.192.5 hostID: ip:192.168.192.5]
I0812 09:30:05.006776 1 manager.go:120] Querying source: internal_stats_source
I0812 09:30:05.007593 1 manager.go:120] Querying source: prometheus_source: http://172.16.6.2:8085/metrics
I0812 09:30:05.010829 1 manager.go:120] Querying source: kubelet_summary:192.168.192.3:10250
I0812 09:30:05.011518 1 manager.go:120] Querying source: prometheus_source: http://10.100.200.187:8080/metrics
I0812 09:30:05.034885 1 manager.go:120] Querying source: kubelet_summary:192.168.192.4:10250
I0812 09:30:05.037789 1 manager.go:120] Querying source: prometheus_source: http://10.100.200.2:9153/metrics
I0812 09:30:05.053807 1 manager.go:120] Querying source: kubelet_summary:192.168.192.5:10250
I0812 09:30:05.308996 1 manager.go:179] ScrapeMetrics: time: 308.554053ms size: 83
I0812 09:30:05.311586 1 manager.go:92] Pushing data to: Wavefront Sink
I0812 09:30:05.311602 1 manager.go:95] Data push completed: Wavefront Sink
I0812 09:30:05.311611 1 wavefront.go:122] received metric points: 28894
I0812 09:30:05.947893 1 wavefront.go:133] received metric sets: 91
Now when I log on to Wavefront, I am able to see my Kubernetes cluster in the additional dashboards where it did not previously appear. Select Integrations > Kubernetes > Dashboards, and choose Kubernetes Metrics as the dashboard to view. From the drop-down, select the name of the cluster. My cluster (cork8s-cluster-01) is now available; previously it was not in the list of clusters.
The other dashboard where my cluster was not visible was the Kubernetes Metrics by Namespace dashboard. Now I can see my cluster here as well. In this dashboard, select the namespace that you are interested in monitoring.
And that completes the post. I now have my K8s cluster sending all of its metrics back to Wavefront for monitoring. I do want to add that this is how I resolved these issues in my own lab. For production-related issues, I would certainly speak to the Wavefront team and verify that there are no gotchas before implementing these workarounds.