Announcing vSphere CSI driver v2.5 metrics for Prometheus monitoring

This post will look at another new feature that has been added to the vSphere CSI driver v2.5. This feature enables the exposing of CSI metrics so that they can be collected by Prometheus and stored as time series data. Using the information captured in Prometheus, we can build Grafana dashboards which makes is easy to monitor the health and stability of the CSI driver. Kudos to one of our vSphere CSI driver engineers, Liping Xue, who did a great write-up on how to test this feature, and who’s content I relied on heavily to create this post. In the…

Prometheus & Grafana Monitoring Stack on TKGS workload cluster in vSphere with Tanzu

In this post, we are going to build on the work already done when we deployed Carvel packages on a Tanzu Kubernetes workload cluster created by the TKG Service in vSphere with Tanzu. We saw in that post what the requirements are, how to use the tanzu command line to set context to a workload cluster, add the TKG v1.4 package repository. We also saw how to use the tanzu CLI to deploy our first package, which was cert manager. We will now continue with the deployment of a number of other packages, such as Contour (for Ingress), External-DNS (to…

Deploying a monitoring stack (Prometheus and Grafana) on TKG v1.4 with External-DNS

Many customers who have deployed Tanzu Kubernetes would like to monitor activity on the cluster. In TKG v1.4, VMware provides all of the packages one would required to setup a full monitoring stack using Prometheus and Grafana. Prometheus records real-time metrics and Grafana provides charts, graphs, and alerts when connected to a supported data source, such as Prometheus. Prometheus has a dependency on an Ingress, which we will provide through the Contour controller package (which includes an Envoy Ingress). In fact, Prometheus leverages a special kind of Ingress called a HTTPProxy which is provided with Contour. We are also going…

TKG v1.4 Prometheus + Grafana Package Deployment: package reconciliation failed

I was recently running through the exercise of deploying Cert Manager, Contour (+ Envoy Ingress), Prometheus and Grafana packages available with TKG v1.4, just to see what steps were involved in setting up a full monitoring stack for my TKG cluster. This was a TKG deployment to vSphere, using the NSX Advanced Load Balancer for Load Balancer functionality. You can read about the new enhancements around the NSX ALB and TKG v1.4 here.  Honestly, it is pretty straight-forward, with some detailed documentation on the topic available here. Everything was plain sailing until I tried to deploy the Grafana package with,…