Regular readers will know that I have been spending quite a bit of time recently looking at Kubernetes running on vSphere. I’ve written a number of posts on the storage side of things, which you can read here as part of my 101 series. I also posted about how you can set up vRealize Operations Manager 7.5 and add the Management Pack for Container Monitoring. This provides some really good dashboards for examining the state of your K8s clusters, as well as detailed breakdowns of K8s node VM health and performance (CPU, Memory, Disk IO). Not only that, but you can then look at your container environment: which container is in which Pod, and which Node that Pod is scheduled on. You can then drill into the individual Pods to get key metrics such as CPU and Memory usage, as well as the number of containers in the Pod. Very useful information for sure.
The next thing I wanted to monitor was my networking. Since my infrastructure includes NSX-T, I wanted a way to get visibility into what was happening at the network level. NSX-T provides me with Node addresses, Pod addresses, as well as Load Balancing IPs for my services. I had already added the NSX-T adapter to vRealize Operations Manager, and this was once again giving me some good insights into the environment. However, I wanted to delve deeper into the Kubernetes networking, figure out which components were talking to each other, and so on. I then stumbled across this really interesting post from Matt Just where he shows how to use vRealize Network Insight to get Kubernetes networking details. This sounded like exactly what I needed, so I decided to deploy it. This post includes some details about how to get started with vRNI 4.1.1, and integrate it with your physical switch, NSX-T, PKS (Pivotal Container Service), and Kubernetes.
vRNI comprises (at least) 2 distinct components: the platform itself, and the collector/proxy component. You will need to have 2 static IP addresses set aside on your network to deploy it. Both components come as OVAs, so start by deploying the platform component, then the proxy (you need info from the platform to deploy the proxy). The deployment is almost identical for both appliances, and in each case you need to log in to the console to populate the required details, such as password information and network configuration. Here is a snippet taken from the console of my deployment:
Once the platform appliance has been deployed and configured, you are prompted to point a browser to it to complete the setup (adding and activating your license key). At this point, you have the ability to generate a secret key for your collector/proxy appliances. This secret key is supplied during the OVA deployment of the collector, so you will need this information before you deploy any collectors. Once the collector comes online, the “Not yet detected” status will go away, and you can ‘Finish’ the deployment.
Adding Data Sources
Once logged in as administrator, it is simply a matter of adding your data sources. In my case, the data sources that I added were my vCenter Server, NSX-T Manager, PKS, Kubernetes cluster and my physical switch. Now, in the case of both the vCenter Server and NSX-T data sources, you are prompted to enable IPFIX. IPFIX is the IP Flow Information Export protocol, and it provides much richer information about what is going on from a networking perspective. In the case of NSX-T, you are asked to enable DFW IPFIX (DFW is the NSX-T Distributed Firewall). Flows are then exported by the DFW module installed on the hosts.
Note however that this feature can only be enabled for vCenter if you are using vSphere Distributed Switches (DVS). It is not available if you are using Standard vSwitches (VSS), because a VSS has no IPFIX option. Therefore, if you are currently utilizing Standard vSwitches, you will need to migrate from VSS to DVS to get this flow information. This DVS requirement applies only to the vCenter data source.
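To give a feel for what an IPFIX export actually looks like on the wire, here is a minimal sketch that parses the fixed 16-byte message header defined by the IPFIX protocol specification (RFC 7011). This is purely illustrative and is not how vRNI processes flows internally; the names `IpfixHeader` and `parse_ipfix_header` are my own, hypothetical ones.

```python
import struct
from datetime import datetime, timezone
from typing import NamedTuple

IPFIX_VERSION = 10  # IPFIX messages always carry version number 10

# Header layout (network byte order): version (16 bits), message length
# (16 bits), export time (32 bits), sequence number (32 bits),
# observation domain ID (32 bits). That adds up to 16 bytes.
_HEADER = struct.Struct("!HHIII")


class IpfixHeader(NamedTuple):
    version: int
    length: int                 # total message length in bytes, header included
    export_time: datetime       # when the exporter sent the message (UTC)
    sequence_number: int
    observation_domain_id: int


def parse_ipfix_header(datagram: bytes) -> IpfixHeader:
    """Parse only the fixed message header; sets and records are not decoded."""
    if len(datagram) < _HEADER.size:
        raise ValueError("datagram is shorter than the 16-byte IPFIX header")
    version, length, export_secs, seq, domain = _HEADER.unpack_from(datagram)
    if version != IPFIX_VERSION:
        raise ValueError(f"not an IPFIX message (version {version})")
    export_time = datetime.fromtimestamp(export_secs, tz=timezone.utc)
    return IpfixHeader(version, length, export_time, seq, domain)
```

The flow records themselves follow this header as template and data sets, which is where the richer per-flow detail lives. Decoding those is left out of the sketch.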
Here is an example of a configured Data Source, in this case, my NSX-T manager:
And just for completeness, here is my full list of configured data sources in my vRNI deployment.
[Update] For the situation where Kubernetes master and worker nodes are deployed as VMs on vSphere, and are using a VM portgroup rather than an NSX-T overlay/VLAN, you need to enable IPFIX on the DVS to get visibility of VM/Node management network traffic (flows). Kubernetes Pod flow visibility will come from the NSX-T DFW irrespective of the DVS IPFIX state.
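The IPFIX rules above can be summarised as a small decision function. This is only a sketch of my understanding, not anything that vRNI exposes: the `Environment` class, its field names and the network labels are all hypothetical. The `nsxt-segment` branch in particular is my assumption that node traffic on an NSX-T overlay/VLAN is covered by DFW IPFIX.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Environment:
    # Hypothetical labels: "dvs-portgroup", "vss-portgroup" or "nsxt-segment"
    node_network: str
    dvs_ipfix_enabled: bool
    dfw_ipfix_enabled: bool


def flow_visibility(env: Environment) -> Dict[str, bool]:
    """Return which kinds of flows should be visible for this setup."""
    # Pod flows come from the NSX-T DFW, irrespective of the DVS IPFIX state.
    pod_flows = env.dfw_ipfix_enabled

    if env.node_network == "dvs-portgroup":
        # Nodes on a VM portgroup need IPFIX enabled on the DVS.
        node_flows = env.dvs_ipfix_enabled
    elif env.node_network == "vss-portgroup":
        # A Standard vSwitch has no IPFIX option at all.
        node_flows = False
    elif env.node_network == "nsxt-segment":
        # Assumption: nodes on an NSX-T overlay/VLAN are covered by DFW IPFIX.
        node_flows = env.dfw_ipfix_enabled
    else:
        raise ValueError(f"unknown node network type: {env.node_network!r}")

    return {"pod_flows": pod_flows, "node_flows": node_flows}
```

For example, nodes on a DVS portgroup with only DFW IPFIX enabled would show Pod flows but no Node management flows, which is exactly the situation the update above describes.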
A first look at some of the dashboards
With vRNI now gathering networking data, I was able to get excellent insight into the networking configuration of my infrastructure. First, a quick example of my NSX-T deployment. This is the view from the Entities > VMware NSX Manager:
As you can see, there is a lot of detailed information about the T0 and T1 routers, as well as flow information and events. Of course, the thing I was most interested in was my Kubernetes environment. Let’s look at that next. Here is the default Entities > Kubernetes dashboard view:
If I look at the “New K8s Pods discovered (in last 24 hours)” metrics and click on the nfs namespace, I get even further details about what is happening in that namespace (not an awful lot in my case, as I only deployed it in the past hour or so). However, hopefully this gives you an idea of how powerful vRNI can be when it comes to monitoring your K8s networking.
I’m not going to add much more to this post. I’m still learning some of the cool functionality that appears to be available in vRNI for K8s networking. If you want to learn more, I’d suggest following the blog series that Matt is creating on the topic. Here is a link to the first post in the series once again. I’ll certainly be following along, trying to learn more about how to get the most out of vRNI for PKS and Kubernetes.