Using a Kubernetes Operator to query vSphere Resources

As many regular readers will be aware, I’ve spent a bit of time in the past looking at how vSphere resources are consumed by Kubernetes objects when Kubernetes is deployed as a set of virtual machines on top of vSphere infrastructure. While much of this is visible in the vSphere client, I’m focused on how to see this vSphere resource consumption from within Kubernetes. If I am working in Kubernetes, I’d rather not context-switch out to the vSphere client just to see how much storage is left on a datastore, or how much CPU and memory is left on an ESXi host.

Some time back, I started work on vTopology, which allows me to plug a shell/PowerShell script into a mechanism called krew and run it from kubectl to get some information. However, it is a little cumbersome to get all the pieces in place. So I began looking at alternate ways in which I could achieve the same thing without requiring any external dependencies. Kubernetes Custom Resource Definitions (CRDs) and Operators seem to be universally recognized as the de facto way to extend Kubernetes. Thus, I started to look at how I might be able to create a CRD and Operator to represent vSphere objects such as HostInfo, DiskInfo or VMInfo, and use these to query the underlying vSphere resources from kubectl.

As a proof-of-concept, I built a very simple CRD and Operator which returns the TotalCPU and FreeCPU of an ESXi host. I learnt so much from this exercise that I decided to write up the steps on GitHub and share them with you. If you are looking to learn more about Kubernetes CRDs and Operators, and are interested in how to get them to interact with vSphere through govmomi, VMware’s Go library for the vSphere API, you might like to check it out. It is a long way off from providing all of the detail of the underlying infrastructure that vTopology provides today, but maybe over time I’ll be able to add more features.
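
To give a flavour of the arithmetic behind those two values, here is a minimal sketch in Go. It assumes that the total is the per-core clock speed multiplied by the core count, and that free CPU is the total minus current usage, all in MHz. The hostCPUSummary struct is a hypothetical stand-in that I made up for illustration; it is modelled loosely on the host summary data that vSphere exposes, but it is not a govmomi type, and the operator in the repository may compute things differently.

```go
package main

import "fmt"

// hostCPUSummary is a hypothetical, trimmed-down stand-in for the host
// summary data returned by vSphere. It is NOT a govmomi type.
type hostCPUSummary struct {
	CPUMHz          int32 // clock speed of a single core, in MHz
	NumCPUCores     int16 // number of physical cores
	OverallCPUUsage int32 // aggregate CPU currently in use, in MHz
}

// totalCPU returns the aggregate CPU capacity of the host in MHz.
func totalCPU(s hostCPUSummary) int64 {
	return int64(s.CPUMHz) * int64(s.NumCPUCores)
}

// freeCPU returns the capacity minus the current usage, in MHz.
func freeCPU(s hostCPUSummary) int64 {
	return totalCPU(s) - int64(s.OverallCPUUsage)
}

func main() {
	// Made-up numbers, for illustration only.
	s := hostCPUSummary{CPUMHz: 2000, NumCPUCores: 16, OverallCPUUsage: 1500}
	fmt.Printf("totalCPU=%d freeCPU=%d\n", totalCPU(s), freeCPU(s))
}
```

With the made-up inputs above, this prints totalCPU=32000 freeCPU=30500. In an operator, values like these would be written to the status of the custom resource on each reconcile.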

The complete code for the operator and CRD can be found in my GitHub repository: https://github.com/cormachogan/hostinfo-operator, along with step-by-step instructions on how to deploy it on your own Kubernetes cluster. I hope you find it useful. The repository was updated on 18th Jan 2021 to move the vSphere login out of the Reconciler code and into main.go, so that the operator no longer logs in to vSphere on every reconcile. Here is a sample output which contains CPU usage information in the status:

$ kubectl get hi -o yaml
apiVersion: v1
items:
- apiVersion: topology.corinternal.com/v1
  kind: HostInfo
  metadata:
    creationTimestamp: "2021-01-18T14:15:07Z"
    generation: 1
    managedFields:
    - apiVersion: topology.corinternal.com/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          .: {}
          f:hostname: {}
      manager: kubectl
      operation: Update
      time: "2021-01-18T14:15:07Z"
    - apiVersion: topology.corinternal.com/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          .: {}
          f:freeCPU: {}
          f:totalCPU: {}
      manager: manager
      operation: Update
      time: "2021-01-18T14:31:00Z"
    name: hostinfo-host-e
    namespace: default
    resourceVersion: "28883011"
    selfLink: /apis/topology.corinternal.com/v1/namespaces/default/hostinfoes/hostinfo-host-e
    uid: 720a91bb-8929-4120-8ba9-d652c884f9ed
  spec:
    hostname: esxi-dell-e.rainpole.com
  status:
    freeCPU: 41238
    totalCPU: 43980
kind: List
metadata:
  resourceVersion: ""
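
The change of moving the vSphere login into main.go follows a common pattern: create the expensive session once at start-up and hand it to the reconciler, rather than creating it inside each reconcile. The sketch below shows the idea with the standard library only. Everything in it (vSphereSession, login, hostInfoReconciler, the vCenter URL) is a hypothetical stand-in for illustration, not code from the repository or from govmomi.

```go
package main

import (
	"errors"
	"fmt"
)

// vSphereSession is a hypothetical stand-in for an authenticated
// vSphere client. A real operator would hold a govmomi client here.
type vSphereSession struct {
	url string
}

// loginCount records how many times login has been called.
var loginCount int

// login simulates an expensive, authenticated connection to vCenter.
func login(url string) (*vSphereSession, error) {
	if url == "" {
		return nil, errors.New("vCenter URL is empty")
	}
	loginCount++
	return &vSphereSession{url: url}, nil
}

// hostInfoReconciler carries the shared session, so that reconcile
// never needs to log in itself.
type hostInfoReconciler struct {
	session *vSphereSession
}

// reconcile stands in for the controller's Reconcile method. It reuses
// the session that was injected when the reconciler was created.
func (r *hostInfoReconciler) reconcile(hostname string) string {
	return fmt.Sprintf("queried %s via %s", hostname, r.session.url)
}

func main() {
	// Log in once, at start-up (the equivalent of main.go).
	session, err := login("https://vcenter.example.com/sdk")
	if err != nil {
		panic(err)
	}
	r := &hostInfoReconciler{session: session}

	// Many reconciles, still a single login.
	for i := 0; i < 3; i++ {
		fmt.Println(r.reconcile("esxi-dell-e.rainpole.com"))
	}
	fmt.Println("logins:", loginCount)
}
```

Running this reports a single login for three reconciles. The trade-off is that a long-lived session can expire, so a production operator would also need some way to detect that and log in again.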

I also created an operator to retrieve virtual machine information. You can also find it on GitHub, here: https://github.com/cormachogan/vminfo-operator. Again, you can see the sort of VM information that we can pull via the operator in the status fields.

$ kubectl get vminfo -o yaml
apiVersion: v1
items:
- apiVersion: topology.corinternal.com/v1
  kind: VMInfo
  metadata:
    creationTimestamp: "2021-01-18T12:20:45Z"
    generation: 1
    managedFields:
    - apiVersion: topology.corinternal.com/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          .: {}
          f:nodename: {}
      manager: kubectl
      operation: Update
      time: "2021-01-18T12:20:45Z"
    - apiVersion: topology.corinternal.com/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          .: {}
          f:guestId: {}
          f:hwVersion: {}
          f:ipAddress: {}
          f:pathToVM: {}
          f:powerState: {}
          f:resvdCPU: {}
          f:resvdMem: {}
          f:totalCPU: {}
          f:totalMem: {}
      manager: manager
      operation: Update
      time: "2021-01-18T12:20:46Z"
    name: tkg-worker-1
    namespace: default
    resourceVersion: "28841720"
    selfLink: /apis/topology.corinternal.com/v1/namespaces/default/vminfoes/tkg-worker-1
    uid: 2c60b273-a866-4344-baf5-0b3b924b65a5
  spec:
    nodename: tkg-cluster-1-18-5b-workers-kc5xn-dd68c4685-5v298
  status:
    guestId: vmwarePhoton64Guest
    hwVersion: vmx-17
    ipAddress: 10.27.62.45
    pathToVM: '[vsanDatastore] 4d56b55f-11db-8822-6463-246e962f4914/tkg-cluster-1-18-5b-workers-kc5xn-dd68c4685-5v298.vmx'
    powerState: poweredOn
    resvdCPU: 0
    resvdMem: 0
    totalCPU: 2
    totalMem: 4096
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

My final exercise was to create a tutorial on how to get FCD information. FCDs, short for First Class Disks, are used to back Kubernetes Persistent Volumes when these are deployed on vSphere Storage using the vSphere CSI driver. The operator is here: https://github.com/cormachogan/fcdinfo-operator. Here is some of the information we can get for the PV / FCD, such as the path to the file, and the provisioning type (thick/thin):

$ kubectl get fcd -o yaml
apiVersion: v1
items:
- apiVersion: topology.corinternal.com/v1
  kind: FCDInfo
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"topology.corinternal.com/v1","kind":"FCDInfo","metadata":{"annotations":{},"name":"fcdinfo-sample","namespace":"default"},"spec":{"pvId":"pvc-e3f6dd59-cbc0-49a7-97c8-d92a26732c43"}}
    creationTimestamp: "2021-01-26T10:43:21Z"
    generation: 1
    managedFields:
    - apiVersion: topology.corinternal.com/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:kubectl.kubernetes.io/last-applied-configuration: {}
        f:spec:
          .: {}
          f:pvId: {}
      manager: kubectl
      operation: Update
      time: "2021-01-26T10:43:21Z"
    - apiVersion: topology.corinternal.com/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          .: {}
          f:filePath: {}
          f:provisioningType: {}
          f:sizeMB: {}
      manager: manager
      operation: Update
      time: "2021-01-26T10:43:22Z"
    name: fcdinfo-sample
    namespace: default
    resourceVersion: "32818807"
    selfLink: /apis/topology.corinternal.com/v1/namespaces/default/fcdinfoes/fcdinfo-sample
    uid: 5d51788d-fc1b-441f-be11-723d02c87b4b
  spec:
    pvId: pvc-e3f6dd59-cbc0-49a7-97c8-d92a26732c43
  status:
    filePath: '[vsanDatastore] 038f6b5f-8122-d3af-eabe-246e962c240c/b39bcacc6ff143439f9cd6b7454999e4.vmdk'
    provisioningType: thin
    sizeMB: 1024
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
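
The filePath in the status uses the usual vSphere datastore path notation: the datastore name in square brackets, followed by the path relative to that datastore. If you want to work with the two parts separately, something like the following sketch would do it. The splitDatastorePath helper is a hypothetical function written for this post with the standard library only; it is not part of the operator, and in the operator itself a library helper would likely be preferable.

```go
package main

import (
	"fmt"
	"strings"
)

// splitDatastorePath splits a path of the form "[datastore] dir/file"
// into the datastore name and the path relative to it. It returns
// ok=false if the input does not start with a bracketed name.
func splitDatastorePath(p string) (datastore, file string, ok bool) {
	if !strings.HasPrefix(p, "[") {
		return "", "", false
	}
	end := strings.Index(p, "]")
	if end < 2 { // no closing bracket, or an empty name "[]"
		return "", "", false
	}
	datastore = p[1:end]
	file = strings.TrimSpace(p[end+1:])
	return datastore, file, true
}

func main() {
	// The filePath from the sample output above.
	p := "[vsanDatastore] 038f6b5f-8122-d3af-eabe-246e962c240c/b39bcacc6ff143439f9cd6b7454999e4.vmdk"
	ds, file, ok := splitDatastorePath(p)
	fmt.Println(ds, file, ok)
}
```

For the sample path this yields vsanDatastore as the datastore, with the folder and VMDK file name as the relative path.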

I learnt loads from building these operators. I hope you find the tutorials useful, both from an operator and a govmomi perspective.