Validating Kubernetes cluster conformance with Sonobuoy

Another product added to the VMware portfolio with the acquisition of Heptio is Sonobuoy. In a nutshell, Sonobuoy validates the state of your Kubernetes cluster by running a suite of non-destructive tests against it. The end-to-end (e2e) tests that Sonobuoy runs include a subset of conformance tests, which cover things like best practices and interoperability. This ensures that your Kubernetes cluster (whether it is an upstream version or a third-party packaged version) supports all of the necessary Kubernetes APIs. You can read more about conformance here.

To make things slightly more complicated, I decided to implement the air gap solution, since my K8s nodes did not have access to the internet. Thus, I needed to pull down all of the images required for Sonobuoy, and then push them to my internal Harbor repository. Once that step was completed, I then built manifest files which would direct Sonobuoy to pull the necessary images from my Harbor repo and run the test suite. Fortunately Sonobuoy includes a set of steps to make that whole setup fairly seamless. This set of steps details how I deployed and ran Sonobuoy in an air-gap configuration.

1. Initial install

Simply download Sonobuoy, or deploy it by running the go get command outlined here.

 

2. Pull the test images from external repos

I estimate that you will require around 8GB of disk space to pull down all of the necessary test suite images. The following command will pull down the images for you.

$ sudo sonobuoy images pull
INFO[0000] Pulling image: gcr.io/google-samples/gb-redisslave:v3 ...
INFO[0005] Pulling image: gcr.io/kubernetes-e2e-test-images/ipc-utils:1.0 ...
INFO[0008] Pulling image: gcr.io/kubernetes-e2e-test-images/fakegitserver:1.0 ...
INFO[0010] Pulling image: gcr.io/kubernetes-e2e-test-images/echoserver:2.2 ...
INFO[0014] Pulling image: gcr.io/kubernetes-e2e-test-images/jessie-dnsutils:1.0 ...
.
.
.
INFO[0243] Pulling image: gcr.io/kubernetes-e2e-test-images/liveness:1.1 ...
INFO[0245] Pulling image: docker.io/library/nginx:1.15-alpine ...
INFO[0248] Pulling image: gcr.io/kubernetes-e2e-test-images/resource-consumer/controller:1.0 ...
INFO[0250] Pulling image: docker.io/library/busybox:1.29 ...

 

3. Create ‘mapping’ file for test images repo location

In order to push the test images to a local or internal repo such as Harbor, we need to create a manifest YAML file which will be used to indicate where Sonobuoy can find the test images. I called the file custom-repos.yaml.

$ cat custom-repos.yaml
dockerLibraryRegistry: harbor.rainpole.com/library
e2eRegistry: harbor.rainpole.com/library/kubernetes-e2e-test-images
gcRegistry: harbor.rainpole.com/library
etcdRegistry: harbor.rainpole.com/library/coreos
privateRegistry: harbor.rainpole.com/library/k8s-authenticated-test
sampleRegistry: harbor.rainpole.com/library/google-samples
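
If your registry lives somewhere else, the same six keys can be generated from a single prefix. The following is a minimal sketch; the function name gen_repo_map and the REGISTRY variable are my own illustration, and the sub-paths simply mirror the layout of the file shown above.

```shell
#!/bin/sh
# Assumption: REGISTRY is the registry host plus project, as used in this post.
REGISTRY="${REGISTRY:-harbor.rainpole.com/library}"

# Print a repo mapping for the prefix given as $1.
# The key names and sub-paths are copied from custom-repos.yaml above.
gen_repo_map() {
  cat <<EOF
dockerLibraryRegistry: $1
e2eRegistry: $1/kubernetes-e2e-test-images
gcRegistry: $1
etcdRegistry: $1/coreos
privateRegistry: $1/k8s-authenticated-test
sampleRegistry: $1/google-samples
EOF
}

gen_repo_map "$REGISTRY"
```

Redirecting the output to custom-repos.yaml gives a file equivalent to the one above.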

 

4. Push the test images to an internal repo

We will use the same manifest file to push the images to our local Harbor repo.

$ sudo sonobuoy images push --e2e-repo-config custom-repos.yaml
INFO[0000] Tagging image: gcr.io/kubernetes-e2e-test-images/cuda-vector-add:2.0 as harbor.rainpole.com/library/kubernetes-e2e-test-images/cuda-vector-add:2.0 ...
INFO[0000] Pushing image: harbor.rainpole.com/library/kubernetes-e2e-test-images/cuda-vector-add:2.0 ...
INFO[0000] Tagging image: gcr.io/kubernetes-e2e-test-images/netexec:1.1 as harbor.rainpole.com/library/kubernetes-e2e-test-images/netexec:1.1 ...
.
.
.
INFO[0219] Tagging image: gcr.io/kubernetes-e2e-test-images/nettest:1.0 as harbor.rainpole.com/library/kubernetes-e2e-test-images/nettest:1.0 ...
INFO[0219] Pushing image: harbor.rainpole.com/library/kubernetes-e2e-test-images/nettest:1.0 ...
$

At this point, all of the necessary test suite images are held locally in my Harbor repo.

 

5. Build the Sonobuoy manifest files

At this point, we need to build a bespoke set of Sonobuoy manifest files which will contain references to our local Harbor repo. This will redirect Sonobuoy to use local images rather than pulling them from external repos.

$ sonobuoy gen --e2e-repo-config custom-repos.yaml > config.yaml

$ ls -ltr
total 28
-rw-rw-r-- 1 cormac cormac  335 Jul 23 10:52 custom-repos.yaml
-rw-rw-r-- 1 cormac cormac 5917 Jul 23 12:19 config.yaml

On examination, you will notice that config.yaml contains a ConfigMap holding the list of repositories we created earlier, i.e. the local Harbor repo:

---
apiVersion: v1
data:
  repo-list.yaml: |
    dockerLibraryRegistry: harbor.rainpole.com/library
    e2eRegistry: harbor.rainpole.com/library/kubernetes-e2e-test-images
    gcRegistry: harbor.rainpole.com/library
    etcdRegistry: harbor.rainpole.com/library/coreos
    privateRegistry: harbor.rainpole.com/library/k8s-authenticated-test
    sampleRegistry: harbor.rainpole.com/library/google-samples
kind: ConfigMap
metadata:
  name: repolist-cm
  namespace: heptio-sonobuoy
---

 

6. Placing the Sonobuoy components into the local repo

At this point, all of the test suite components that Sonobuoy needs to reference are in the local Harbor repo, so there is no need to go looking for those externally. However, the main Sonobuoy components (Sonobuoy, conformance and systemd-logs plugin) are still fetching their images from an external repo in the newly created config.yaml manifest.

 image: gcr.io/google-containers/conformance:v1.14.3
 image: gcr.io/heptio-images/sonobuoy-plugin-systemd-logs:latest
 image: gcr.io/heptio-images/sonobuoy:v0.15.0
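
One way to spot such references is to list every image in the manifest that does not start with the local registry host. This is only a sketch: the function name external_images is hypothetical, and it assumes the image values are unquoted, as they are in the lines shown above.

```shell
#!/bin/sh
# Assumption: the local registry host used throughout this post.
REGISTRY_HOST="harbor.rainpole.com"

# Read a manifest on stdin and print each distinct image reference
# that does not begin with the local registry host.
external_images() {
  sed -n 's/^[[:space:]-]*image:[[:space:]]*//p' \
    | grep -v "^$REGISTRY_HOST/" \
    | sort -u
}

# Example (needs the generated manifest):
# external_images < config.yaml
```

An empty result suggests that every image reference already points at the local repo.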

You will now need to pull these images manually, tag them for the local Harbor repo, and push them there. For an example of how to do this, please take a look at this Harbor in action post. Finally, modify the manifest to ensure that these images are sourced from the local Harbor repo. Once this has been completed, everything should be sourced from the local Harbor repo, and your config.yaml image references should look something like this.

 image: harbor.rainpole.com/library/conformance:v1.14.3
 image: harbor.rainpole.com/library/sonobuoy-plugin-systemd-logs:latest
 image: harbor.rainpole.com/library/sonobuoy:v0.15.0
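
The pull, tag, push and manifest edit described above could be scripted roughly as follows. Treat this as a sketch under assumptions rather than the exact procedure I followed: the helper names (local_name, mirror_cmds, rewrite_manifest) are made up for illustration, and the script only prints the docker commands so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Assumption: every image lands directly under this project, as in the post.
REGISTRY="harbor.rainpole.com/library"

# Map an upstream reference to its local name by keeping only "name:tag".
local_name() {
  printf '%s/%s\n' "$REGISTRY" "${1##*/}"
}

# Print (do not run) the docker commands that mirror each image argument.
mirror_cmds() {
  for img in "$@"; do
    dst=$(local_name "$img")
    printf 'docker pull %s\ndocker tag %s %s\ndocker push %s\n' \
      "$img" "$img" "$dst" "$dst"
  done
}

# Rewrite the two upstream prefixes in a manifest read from stdin.
rewrite_manifest() {
  sed -e "s#gcr.io/google-containers/#$REGISTRY/#g" \
      -e "s#gcr.io/heptio-images/#$REGISTRY/#g"
}

mirror_cmds \
  gcr.io/google-containers/conformance:v1.14.3 \
  gcr.io/heptio-images/sonobuoy-plugin-systemd-logs:latest \
  gcr.io/heptio-images/sonobuoy:v0.15.0
```

Once the printed commands look right, they can be piped to sh. The manifest can then be rewritten with something like rewrite_manifest < config.yaml > config-local.yaml, where config-local.yaml is just an example output name.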

 

7. Verify supported versions

According to the documentation, Sonobuoy supports 3 Kubernetes minor versions: the current release and 2 minor versions before. It is worth ensuring that you have a supported setup before going any further. First, check the K8s versions as follows:

$ kubectl get nodes -o wide
NAME            STATUS   ROLES    AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
csi-master-01   Ready    master   39d   v1.14.3   10.27.51.49   10.27.51.49   Ubuntu 16.04.6 LTS   4.4.0-150-generic   docker://18.6.2
csi-worker-02   Ready    <none>   39d   v1.14.3   10.27.51.61   10.27.51.61   Ubuntu 16.04.6 LTS   4.4.0-150-generic   docker://18.6.2

You can also use Sonobuoy to check versions. If you include the path to the kubeconfig file, it will also check the API version. Looks like we are good to go with our versions.

$ sonobuoy version --kubeconfig ~/.kube/config
Sonobuoy Version: v0.15.0
MinimumKubeVersion: 1.13.0
MaximumKubeVersion: 1.15.99
GitSHA:
API Version:  v1.14.3
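
If you want to automate this check, the comparison can be sketched in plain shell. The helper names major_minor and supported are hypothetical; the bounds passed in are the MinimumKubeVersion and MaximumKubeVersion values printed above, and only the major and minor numbers are compared.

```shell
#!/bin/sh
# Print "major minor" for a version such as v1.14.3 or 1.15.99.
major_minor() {
  v=${1#v}            # drop an optional leading "v"
  major=${v%%.*}      # text before the first dot
  rest=${v#*.}        # text after the first dot
  minor=${rest%%.*}   # text before the next dot
  echo "$major $minor"
}

# Succeed if version $1 lies within [$2, $3], ignoring patch levels.
supported() {
  set -- $(major_minor "$1") $(major_minor "$2") $(major_minor "$3")
  cur=$(( $1 * 1000 + $2 ))
  min=$(( $3 * 1000 + $4 ))
  max=$(( $5 * 1000 + $6 ))
  [ "$cur" -ge "$min" ] && [ "$cur" -le "$max" ]
}

# Values taken from the `sonobuoy version` output above.
if supported v1.14.3 1.13.0 1.15.99; then
  echo "supported"
else
  echo "not supported"
fi
```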

 
8. Begin testing

With everything in place, we can now begin the testing. However, since we have had to make changes to the configurations to support the air gap/local repo mechanism, we cannot use the sonobuoy command directly. Instead, we have to use kubectl apply on the manifest built and modified earlier to start the testing.

$ kubectl apply -f config.yaml
namespace/heptio-sonobuoy created
serviceaccount/sonobuoy-serviceaccount created
clusterrolebinding.rbac.authorization.k8s.io/sonobuoy-serviceaccount-heptio-sonobuoy created
clusterrole.rbac.authorization.k8s.io/sonobuoy-serviceaccount created
configmap/sonobuoy-config-cm created
configmap/sonobuoy-plugins-cm created
pod/sonobuoy created
configmap/repolist-cm created
service/sonobuoy-master created

 
9. Monitor tests

By default, two sets of tests are run: the systemd-logs tests and the e2e tests. You can check their progress with the following status command:

$ sonobuoy status
PLUGIN          STATUS  COUNT
e2e             running 1
systemd-logs    running 2

Sonobuoy is still running. Runs can take up to 60 minutes.

$ sonobuoy status
PLUGIN          STATUS          COUNT
e2e             running         1
systemd-logs    complete        2

Sonobuoy is still running. Runs can take up to 60 minutes.
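
Rather than re-running the status command by hand, the table above can be checked in a loop. The sketch below assumes the three-column layout shown in this post and treats only "complete" as finished; the function name all_complete is my own, and a plugin that ends in some other state would keep the loop waiting.

```shell
#!/bin/sh
# Read `sonobuoy status` output on stdin. Succeed only when at least one
# plugin row is present and every plugin row reports "complete".
all_complete() {
  awk 'NF == 3 && $3 ~ /^[0-9]+$/ { rows++; if ($2 != "complete") pending++ }
       END { exit !(rows > 0 && pending == 0) }'
}

# Hypothetical polling loop, commented out because it needs a live cluster:
# until sonobuoy status | all_complete; do sleep 60; done
```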

The other useful option for monitoring progress is the ‘logs’ option. This provides very verbose output, and can be followed if the -f option is used. Here is the output from the very beginning of the tests, showing how many tests will be run:

$ sonobuoy logs -f
namespace="heptio-sonobuoy" pod="sonobuoy-systemd-logs-daemon-set-02274f1712d5490e-bnmkv" container="sonobuoy-worker"
time="2019-07-23T16:17:20Z" level=info msg="Waiting for waitfile" waitfile=/tmp/results/done
time="2019-07-23T16:17:21Z" level=info msg="Detected done file, transmitting result file" resultFile=/tmp/results/systemd_logs
namespace="heptio-sonobuoy" pod="sonobuoy-systemd-logs-daemon-set-02274f1712d5490e-lg2cc" container="sonobuoy-worker"
time="2019-07-23T16:17:22Z" level=info msg="Waiting for waitfile" waitfile=/tmp/results/done
time="2019-07-23T16:17:23Z" level=info msg="Detected done file, transmitting result file" resultFile=/tmp/results/systemd_logs
namespace="heptio-sonobuoy" pod="sonobuoy-e2e-job-53542f69c61f4dae" container="sonobuoy-worker"
time="2019-07-23T16:17:23Z" level=info msg="Waiting for waitfile" waitfile=/tmp/results/done
namespace="heptio-sonobuoy" pod="sonobuoy" container="kube-sonobuoy"
time="2019-07-23T16:17:21Z" level=info msg="Scanning plugins in ./plugins.d (pwd: /)"
time="2019-07-23T16:17:21Z" level=info msg="Scanning plugins in /etc/sonobuoy/plugins.d (pwd: /)"
time="2019-07-23T16:17:21Z" level=info msg="Directory (/etc/sonobuoy/plugins.d) does not exist"
time="2019-07-23T16:17:21Z" level=info msg="Scanning plugins in ~/sonobuoy/plugins.d (pwd: /)"
time="2019-07-23T16:17:21Z" level=info msg="Directory (~/sonobuoy/plugins.d) does not exist"
time="2019-07-23T16:17:21Z" level=info msg="Filtering namespaces based on the following regex:.*|heptio-sonobuoy"
time="2019-07-23T16:17:21Z" level=info msg="Namespace csi-demo Matched=true"
time="2019-07-23T16:17:21Z" level=info msg="Namespace default Matched=true"
time="2019-07-23T16:17:21Z" level=info msg="Namespace demo Matched=true"
time="2019-07-23T16:17:21Z" level=info msg="Namespace heptio-sonobuoy Matched=true"
time="2019-07-23T16:17:21Z" level=info msg="Namespace kube-node-lease Matched=true"
time="2019-07-23T16:17:21Z" level=info msg="Namespace kube-public Matched=true"
time="2019-07-23T16:17:21Z" level=info msg="Namespace kube-system Matched=true"
time="2019-07-23T16:17:21Z" level=info msg="Namespace svc-demo Matched=true"
time="2019-07-23T16:17:21Z" level=info msg="Namespace velero Matched=true"
time="2019-07-23T16:17:21Z" level=info msg="Starting server Expected Results: [{ e2e} {csi-master-01 systemd-logs} {csi-worker-02 systemd-logs}]"
time="2019-07-23T16:17:21Z" level=info msg="Starting annotation update routine"
time="2019-07-23T16:17:21Z" level=info msg="Starting aggregation server" address=0.0.0.0 port=8080
time="2019-07-23T16:17:21Z" level=info msg="Running plugin" plugin=e2e
time="2019-07-23T16:17:21Z" level=info msg="Running plugin" plugin=systemd-logs
time="2019-07-23T16:17:23Z" level=info msg="received aggregator request" client_cert=systemd-logs node=csi-worker-02 plugin_name=systemd-logs
time="2019-07-23T16:17:24Z" level=info msg="received aggregator request" client_cert=systemd-logs node=csi-master-01 plugin_name=systemd-logs
namespace="heptio-sonobuoy" pod="sonobuoy-e2e-job-53542f69c61f4dae" container="e2e"
+ set +x
+ /usr/local/bin/ginkgo '--focus=\[Conformance\]' '--skip=Alpha|\[(Disruptive|Feature:[^\]]+|Flaky)\]' --noColor=true /usr/local/bin/e2e.test -- --disable-log-dump --repo-root=/kubernetes --provider=local --report-dir=/tmp/results --kubeconfig=
+ tee /tmp/results/e2e.log
I0723 16:17:23.756655      15 test_context.go:405] Using a temporary kubeconfig file from in-cluster config : /tmp/kubeconfig-709992315
I0723 16:17:23.756820      15 e2e.go:240] Starting e2e run "5a22a21f-ad65-11e9-bb39-9288ee65ecf1" on Ginkgo node 1
Running Suite: Kubernetes e2e suite
===================================
Random Seed: 1563898642 - Will randomize all specs
Will run 204 of 3585 specs

Here is the output from the very end of a suite of tests, showing how many tests were run, how many passed and how many (if any) failed.

$ sonobuoy logs -f
• [SLOW TEST:8.240 seconds]
[sig-storage] Projected downwardAPI
/workspace/anago-v1.14.3-beta.0.37+5e53fd6bc17c0d/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/common/projected_downwardapi.go:33
  should provide container's cpu limit [NodeConformance] [Conformance]
  /workspace/anago-v1.14.3-beta.0.37+5e53fd6bc17c0d/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:692
------------------------------
SSSSSSSSSSSJul 23 17:40:44.454: INFO: Running AfterSuite actions on all nodes
Jul 23 17:40:44.454: INFO: Running AfterSuite actions on node 1
Jul 23 17:40:44.454: INFO: Skipping dumping logs from cluster

Ran 204 of 3585 Specs in 5000.512 seconds
SUCCESS! -- 204 Passed | 0 Failed | 0 Pending | 3381 Skipped PASS

Ginkgo ran 1 suite in 1h23m21.836284415s
Test Suite Passed

namespace="heptio-sonobuoy" pod="sonobuoy-e2e-job-53542f69c61f4dae" container="sonobuoy-worker"
time="2019-07-23T17:40:45Z" level=info msg="Detected done file, transmitting result file" resultFile=/tmp/results/e2e.tar.gz
namespace="heptio-sonobuoy" pod="sonobuoy" container="kube-sonobuoy"
time="2019-07-23T17:40:45Z" level=info msg="received aggregator request" client_cert=e2e plugin_name=e2e
time="2019-07-23T17:40:45Z" level=info msg="Last update to annotations on exit"
time="2019-07-23T17:40:45Z" level=info msg="csidrivers not specified in non-nil Resources. Skipping csidrivers query."
time="2019-07-23T17:40:45Z" level=info msg="csinodes not specified in non-nil Resources. Skipping csinodes query."
time="2019-07-23T17:40:45Z" level=info msg="runtimeclasses not specified in non-nil Resources. Skipping runtimeclasses query."
time="2019-07-23T17:40:45Z" level=info msg="secrets not specified in non-nil Resources. Skipping secrets query."
time="2019-07-23T17:40:45Z" level=info msg="events not specified in non-nil Resources. Skipping events query."
time="2019-07-23T17:40:45Z" level=info msg="horizontalpodautoscalers not specified in non-nil Resources. Skipping horizontalpodautoscalers query."
time="2019-07-23T17:40:45Z" level=info msg="resticrepositories not specified in non-nil Resources. Skipping resticrepositories query."
time="2019-07-23T17:40:45Z" level=info msg="backupstoragelocations not specified in non-nil Resources. Skipping backupstoragelocations query."
time="2019-07-23T17:40:45Z" level=info msg="backups not specified in non-nil Resources. Skipping backups query."
time="2019-07-23T17:40:45Z" level=info msg="downloadrequests not specified in non-nil Resources. Skipping downloadrequests query."
time="2019-07-23T17:40:45Z" level=info msg="podvolumerestores not specified in non-nil Resources. Skipping podvolumerestores query."
time="2019-07-23T17:40:45Z" level=info msg="restores not specified in non-nil Resources. Skipping restores query."
time="2019-07-23T17:40:45Z" level=info msg="deletebackuprequests not specified in non-nil Resources. Skipping deletebackuprequests query."
time="2019-07-23T17:40:45Z" level=info msg="podvolumebackups not specified in non-nil Resources. Skipping podvolumebackups query."
time="2019-07-23T17:40:45Z" level=info msg="serverstatusrequests not specified in non-nil Resources. Skipping serverstatusrequests query."
time="2019-07-23T17:40:45Z" level=info msg="volumesnapshotlocations not specified in non-nil Resources. Skipping volumesnapshotlocations query."
time="2019-07-23T17:40:45Z" level=info msg="schedules not specified in non-nil Resources. Skipping schedules query."
time="2019-07-23T17:40:45Z" level=info msg="Collecting Node Configuration and Health..."
time="2019-07-23T17:40:45Z" level=info msg="Creating host results for csi-master-01 under /tmp/sonobuoy/17fc51fa-6ba7-4f14-bf9d-47dc22e702bc/hosts/csi-master-01\n"
time="2019-07-23T17:40:45Z" level=info msg="Creating host results for csi-worker-02 under /tmp/sonobuoy/17fc51fa-6ba7-4f14-bf9d-47dc22e702bc/hosts/csi-worker-02\n"
time="2019-07-23T17:40:45Z" level=info msg="Running cluster queries"
time="2019-07-23T17:40:45Z" level=info msg="Running ns query (csi-demo)"
time="2019-07-23T17:40:45Z" level=info msg="Collecting Pod Logs (csi-demo)"
time="2019-07-23T17:40:45Z" level=info msg="Running ns query (default)"
time="2019-07-23T17:40:45Z" level=info msg="Collecting Pod Logs (default)"
time="2019-07-23T17:40:45Z" level=info msg="Running ns query (demo)"
time="2019-07-23T17:40:46Z" level=info msg="Collecting Pod Logs (demo)"
time="2019-07-23T17:40:46Z" level=info msg="Running ns query (heptio-sonobuoy)"
time="2019-07-23T17:40:47Z" level=info msg="Collecting Pod Logs (heptio-sonobuoy)"
time="2019-07-23T17:40:47Z" level=info msg="Running ns query (kube-node-lease)"
time="2019-07-23T17:40:47Z" level=info msg="Collecting Pod Logs (kube-node-lease)"
time="2019-07-23T17:40:47Z" level=info msg="Running ns query (kube-public)"
time="2019-07-23T17:40:48Z" level=info msg="Collecting Pod Logs (kube-public)"
time="2019-07-23T17:40:48Z" level=info msg="Running ns query (kube-system)"
time="2019-07-23T17:40:49Z" level=info msg="Collecting Pod Logs (kube-system)"
time="2019-07-23T17:40:57Z" level=info msg="Running ns query (svc-demo)"
time="2019-07-23T17:40:57Z" level=info msg="Collecting Pod Logs (svc-demo)"
time="2019-07-23T17:40:57Z" level=info msg="Running ns query (velero)"
time="2019-07-23T17:40:57Z" level=info msg="Collecting Pod Logs (velero)"
time="2019-07-23T17:40:59Z" level=info msg="Results available at /tmp/sonobuoy/201907231617_sonobuoy_17fc51fa-6ba7-4f14-bf9d-47dc22e702bc.tar.gz"
time="2019-07-23T17:40:59Z" level=info msg="no-exit was specified, sonobuoy is now blocking"
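
Since this output is so verbose, it can help to pull just the failure count out of the Ginkgo summary line. This is a small sketch that relies on the "N Passed | N Failed | N Pending" format seen above; the function name failed_specs is hypothetical.

```shell
#!/bin/sh
# Read log text on stdin and print the failure count from the last
# Ginkgo summary line, e.g. "... 204 Passed | 0 Failed | 0 Pending ...".
failed_specs() {
  sed -n 's/.*| \([0-9][0-9]*\) Failed |.*/\1/p' | tail -n 1
}

# Example against a saved copy of the log (e2e.log is a placeholder name):
# failed_specs < e2e.log
```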

 
10. Querying Results

After the test suite completes, the results can be queried using a variety of commands. First, verify that everything has completed using the status command.

$ sonobuoy status
PLUGIN          STATUS          COUNT
e2e             complete        1
systemd-logs    complete        2

Sonobuoy has completed. Use `sonobuoy retrieve` to get results.

As the previous command suggested, we can now retrieve the results of the run, and examine them for failures. Here is a way to do just that.

$ results=$(sonobuoy retrieve)
$ sonobuoy e2e $results
failed tests: 0

 
11. Clean Up

To remove Sonobuoy from your cluster, simply run the following command:

$ sonobuoy delete --wait
INFO[0000] deleted kind=namespace namespace=heptio-sonobuoy
INFO[0000] deleted kind=clusterrolebindings
INFO[0000] deleted kind=clusterroles
$

And that completes our introduction to Sonobuoy. Hopefully you can see how useful this product can be when wishing to verify the integrity of your Kubernetes cluster. Read more about Sonobuoy here. One other nice feature is the ability to write your own custom plugins to extend the test suite. This could be very useful if you wanted to validate some bespoke applications or features, alongside the end-to-end conformance that Sonobuoy already tests.

3 Replies to “Validating Kubernetes cluster conformance with Sonobuoy”

  1. Great writeup; I love hearing about people’s experience with Sonobuoy.

    I wanted to just point out an option in case you wanted to use it in your future workflows.

    If you don’t really _need_ the systemd logs plugin to run then you can get away from having to edit the yaml and using kubectl directly.

    Instead you can set the `sonobuoy-image` and `kube-conformance-image` flags from the `sonobuoy` cli to point to your Harbor repo and run only the e2e plugin by adding the flag `--plugin e2e`. This would allow you to utilize the `--wait` flag to run it synchronously if you want. The systemd logs plugin image is the only one that can’t (easily) be edited.

    You’d still be able to check the status/logs from other terminals, but I find the `--wait` flag preferable to only pinging the status for results.

    Thanks again for the read.

    1. Thanks for the guidance John. That also works. Since all my images are in my harbor.rainpole.com repo, my command is something like this (for other readers):


      $ sonobuoy run --sonobuoy-image "harbor.rainpole.com/library/sonobuoy:v0.15.0" --kube-conformance-image "harbor.rainpole.com/library/conformance:v1.14.3" --plugin e2e --wait
      INFO[0000] created object name=heptio-sonobuoy namespace= resource=namespaces
      INFO[0000] created object name=sonobuoy-serviceaccount namespace=heptio-sonobuoy resource=serviceaccounts
      INFO[0000] created object name=sonobuoy-serviceaccount-heptio-sonobuoy namespace= resource=clusterrolebindings
      INFO[0000] created object name=sonobuoy-serviceaccount namespace= resource=clusterroles
      INFO[0000] created object name=sonobuoy-config-cm namespace=heptio-sonobuoy resource=configmaps
      INFO[0000] created object name=sonobuoy-plugins-cm namespace=heptio-sonobuoy resource=configmaps
      INFO[0000] created object name=sonobuoy namespace=heptio-sonobuoy resource=pods
      INFO[0000] created object name=sonobuoy-master namespace=heptio-sonobuoy resource=services

      And I can query the status from another terminal:


      $ sonobuoy status
      PLUGIN STATUS COUNT
      e2e running 1

      Sonobuoy is still running. Runs can take up to 60 minutes.

      Definitely easier than editing the YAML files. Good info!
