Validating Kubernetes cluster conformance with Sonobuoy
Another product added to the VMware portfolio with the acquisition of Heptio is Sonobuoy. In a nutshell, Sonobuoy validates the state of your Kubernetes cluster by running a suite of non-destructive tests against it. The end-to-end (e2e) tests that Sonobuoy runs include a subset of conformance tests, which cover things like best practices and interoperability. These ensure that your Kubernetes cluster (whether it is an upstream version or a third-party packaged version) supports all of the necessary Kubernetes APIs. You can read more about conformance here.
To make things slightly more complicated, I decided to implement the air gap solution, since my K8s nodes did not have access to the internet. Thus, I needed to pull down all of the images required for Sonobuoy, and then push them to my internal Harbor repository. Once that step was completed, I then built manifest files which would direct Sonobuoy to pull the necessary images from my Harbor repo and run the test suite. Fortunately Sonobuoy includes a set of steps to make that whole setup fairly seamless. This set of steps details how I deployed and ran Sonobuoy in an air-gap configuration.
1. Initial install
Simply download Sonobuoy, or deploy it by running the go get command outlined here.
2. Pull the test images from external repos
I estimate that you will require around 8GB of disk space to pull down all of the necessary test suite images. The following command will pull down the images for you.
$ sudo sonobuoy images pull
INFO[0000] Pulling image: gcr.io/google-samples/gb-redisslave:v3 ...
INFO[0005] Pulling image: gcr.io/kubernetes-e2e-test-images/ipc-utils:1.0 ...
INFO[0008] Pulling image: gcr.io/kubernetes-e2e-test-images/fakegitserver:1.0 ...
INFO[0010] Pulling image: gcr.io/kubernetes-e2e-test-images/echoserver:2.2 ...
INFO[0014] Pulling image: gcr.io/kubernetes-e2e-test-images/jessie-dnsutils:1.0 ...
. . .
INFO[0243] Pulling image: gcr.io/kubernetes-e2e-test-images/liveness:1.1 ...
INFO[0245] Pulling image: docker.io/library/nginx:1.15-alpine ...
INFO[0248] Pulling image: gcr.io/kubernetes-e2e-test-images/resource-consumer/controller:1.0 ...
INFO[0250] Pulling image: docker.io/library/busybox:1.29 ...
3. Create ‘mapping’ file for test images repo location
In order to push the test images to a local or internal repo such as Harbor, we need to create a manifest YAML file which will be used to indicate where Sonobuoy can find the test images. I called the file custom-repos.yaml.
$ cat custom-repos.yaml
dockerLibraryRegistry: harbor.rainpole.com/library
e2eRegistry: harbor.rainpole.com/library/kubernetes-e2e-test-images
gcRegistry: harbor.rainpole.com/library
etcdRegistry: harbor.rainpole.com/library/coreos
privateRegistry: harbor.rainpole.com/library/k8s-authenticated-test
sampleRegistry: harbor.rainpole.com/library/google-samples
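If you expect to repeat this against a different registry, the mapping file can be generated rather than typed by hand. The following is only a minimal sketch: the function name write_repo_map and the REGISTRY environment variable are hypothetical, and the keys and sub-paths are simply copied from the custom-repos.yaml shown above.

```shell
#!/bin/sh
# Sketch: print a registry mapping file for a given registry prefix.
# write_repo_map and REGISTRY are illustrative names, not part of Sonobuoy.
set -eu

write_repo_map() {
  registry="$1"
  # Keys and sub-paths mirror the custom-repos.yaml used in this post.
  cat <<EOF
dockerLibraryRegistry: ${registry}
e2eRegistry: ${registry}/kubernetes-e2e-test-images
gcRegistry: ${registry}
etcdRegistry: ${registry}/coreos
privateRegistry: ${registry}/k8s-authenticated-test
sampleRegistry: ${registry}/google-samples
EOF
}

# Defaults to the Harbor project used in this post; override with REGISTRY=...
write_repo_map "${REGISTRY:-harbor.rainpole.com/library}"
```

Saved as, say, gen-repos.sh, this could be run as `sh gen-repos.sh > custom-repos.yaml` to produce the same file as above.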
4. Push the test images to an internal repo
We will use the same manifest file to push the images to our local Harbor repo.
$ sudo sonobuoy images push --e2e-repo-config custom-repos.yaml
INFO[0000] Tagging image: gcr.io/kubernetes-e2e-test-images/cuda-vector-add:2.0 as harbor.rainpole.com/library/kubernetes-e2e-test-images/cuda-vector-add:2.0 ...
INFO[0000] Pushing image: harbor.rainpole.com/library/kubernetes-e2e-test-images/cuda-vector-add:2.0 ...
INFO[0000] Tagging image: gcr.io/kubernetes-e2e-test-images/netexec:1.1 as harbor.rainpole.com/library/kubernetes-e2e-test-images/netexec:1.1 ...
. . .
INFO[0219] Tagging image: gcr.io/kubernetes-e2e-test-images/nettest:1.0 as harbor.rainpole.com/library/kubernetes-e2e-test-images/nettest:1.0 ...
INFO[0219] Pushing image: harbor.rainpole.com/library/kubernetes-e2e-test-images/nettest:1.0 ...
$
At this point, all of the necessary test suite images are held locally in my Harbor repo.
5. Build the Sonobuoy manifest files
At this point, we need to build a bespoke set of Sonobuoy manifest files which will contain references to our local Harbor repo. This will redirect Sonobuoy to use local images rather than pulling them from external repos.
$ sonobuoy gen --e2e-repo-config custom-repos.yaml > config.yaml
$ ls -ltr
total 28
-rw-rw-r-- 1 cormac cormac 335 Jul 23 10:52 custom-repos.yaml
-rw-rw-r-- 1 cormac cormac 5917 Jul 23 12:19 config.yaml
On examination, you will notice that config.yaml contains a ConfigMap holding the list of repositories we created earlier, i.e. the local Harbor repo:
---
apiVersion: v1
data:
  repo-list.yaml: |
    dockerLibraryRegistry: harbor.rainpole.com/library
    e2eRegistry: harbor.rainpole.com/library/kubernetes-e2e-test-images
    gcRegistry: harbor.rainpole.com/library
    etcdRegistry: harbor.rainpole.com/library/coreos
    privateRegistry: harbor.rainpole.com/library/k8s-authenticated-test
    sampleRegistry: harbor.rainpole.com/library/google-samples
kind: ConfigMap
metadata:
  name: repolist-cm
  namespace: heptio-sonobuoy
---
6. Placing the Sonobuoy components into the local repo
At this point, all of the test suite components that Sonobuoy needs to reference are in the local Harbor repo, so there is no need to go looking for those externally. However, the main Sonobuoy components (Sonobuoy, conformance and systemd-logs plugin) are still fetching their images from an external repo in the newly created config.yaml manifest.
image: gcr.io/google-containers/conformance:v1.14.3
image: gcr.io/heptio-images/sonobuoy-plugin-systemd-logs:latest
image: gcr.io/heptio-images/sonobuoy:v0.15.0
You will now need to pull these images manually, tag them with the local Harbor repo, and push them to the local Harbor repo. For an example of how to do this, please take a look at this Harbor in action post. Finally, modify the manifest to ensure that these images are sourced from the local Harbor repo. Once this has been completed, everything should be sourced from the local Harbor repo, and your config.yaml image references should look something like this.
image: harbor.rainpole.com/library/conformance:v1.14.3
image: harbor.rainpole.com/library/sonobuoy-plugin-systemd-logs:latest
image: harbor.rainpole.com/library/sonobuoy:v0.15.0
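The pull, tag, push and manifest edit can be scripted. What follows is a rough sketch rather than the exact procedure I used: it assumes the Docker CLI is the client, the helper names (harbor_ref, print_mirror_commands, rewrite_manifest) are made up for illustration, and the image list and Harbor path are taken from the listings above. It only prints the docker commands so that they can be reviewed first.

```shell
#!/bin/sh
# Sketch: mirror the three core Sonobuoy images into Harbor (dry run)
# and rewrite the matching "image:" references in a manifest.
# Assumes the Docker CLI; helper names are illustrative only.
set -eu

HARBOR="harbor.rainpole.com/library"   # Harbor project used in this post

# External images referenced by the generated config.yaml.
IMAGES="gcr.io/google-containers/conformance:v1.14.3
gcr.io/heptio-images/sonobuoy-plugin-systemd-logs:latest
gcr.io/heptio-images/sonobuoy:v0.15.0"

# Keep only the trailing "name:tag" and prefix it with the Harbor path.
harbor_ref() {
  echo "${HARBOR}/${1##*/}"
}

# Print (not execute) the docker commands needed for each image.
print_mirror_commands() {
  for src in $IMAGES; do
    dst="$(harbor_ref "$src")"
    echo "docker pull ${src}"
    echo "docker tag ${src} ${dst}"
    echo "docker push ${dst}"
  done
}

# Read a manifest on stdin and point the two external registries at Harbor.
rewrite_manifest() {
  sed -E "s#(image:[[:space:]]*)gcr\.io/(google-containers|heptio-images)/#\1${HARBOR}/#"
}

print_mirror_commands
```

Once the printed commands look right they can be piped to a shell. Calling `rewrite_manifest < config.yaml > config-harbor.yaml` from the same script, followed by `grep 'image:' config-harbor.yaml`, is a quick way to confirm that no external references remain.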
7. Verify supported versions
According to the documentation, Sonobuoy supports 3 Kubernetes minor versions: the current release and 2 minor versions before. It is worth ensuring that you have a supported setup before going any further. First, check the K8s versions as follows:
$ kubectl get nodes -o wide
NAME            STATUS   ROLES    AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
csi-master-01   Ready    master   39d   v1.14.3   10.27.51.49   10.27.51.49   Ubuntu 16.04.6 LTS   4.4.0-150-generic   docker://18.6.2
csi-worker-02   Ready    <none>   39d   v1.14.3   10.27.51.61   10.27.51.61   Ubuntu 16.04.6 LTS   4.4.0-150-generic   docker://18.6.2
You can also use Sonobuoy to check versions. If you include the path to the kubeconfig file, it will also check the API version. Looks like we are good to go with our versions.
$ sonobuoy version --kubeconfig ~/.kube/config
Sonobuoy Version: v0.15.0
MinimumKubeVersion: 1.13.0
MaximumKubeVersion: 1.15.99
GitSHA:
API Version: v1.14.3
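If you want to make this check part of a script, the comparison against MinimumKubeVersion and MaximumKubeVersion can be automated. This is a small sketch under a couple of assumptions: version_in_range is a hypothetical helper, and it relies on `sort -V` (version sort), which GNU coreutils provides but which may not exist on every platform. The values are the ones reported above.

```shell
#!/bin/sh
# Sketch: check that a Kubernetes version lies within a supported range.
# version_in_range is an illustrative helper; it depends on `sort -V`.
set -eu

# version_in_range VERSION MIN MAX
# Succeeds when MIN <= VERSION <= MAX; a leading "v" on VERSION is ignored.
version_in_range() {
  ver="${1#v}"
  min="$2"
  max="$3"
  # The smaller of (min, ver) must be min, and the larger of (ver, max) must be max.
  lowest="$(printf '%s\n' "$min" "$ver" | sort -V | sed -n '1p')"
  highest="$(printf '%s\n' "$ver" "$max" | sort -V | sed -n '$p')"
  [ "$lowest" = "$min" ] && [ "$highest" = "$max" ]
}

# Values taken from the `sonobuoy version` output in this post.
if version_in_range "v1.14.3" "1.13.0" "1.15.99"; then
  echo "supported"
else
  echo "unsupported"
fi
```

Comparing the two ends separately keeps the logic readable and avoids hand-parsing the major, minor and patch numbers.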
8. Begin testing
With everything in place, we can now begin the testing. However, since we have had to make changes to the configurations to support the air gap/local repo mechanism, we cannot use the sonobuoy command directly. Instead, we have to use kubectl apply on the manifest built and modified earlier to start the testing.
$ kubectl apply -f config.yaml
namespace/heptio-sonobuoy created
serviceaccount/sonobuoy-serviceaccount created
clusterrolebinding.rbac.authorization.k8s.io/sonobuoy-serviceaccount-heptio-sonobuoy created
clusterrole.rbac.authorization.k8s.io/sonobuoy-serviceaccount created
configmap/sonobuoy-config-cm created
configmap/sonobuoy-plugins-cm created
pod/sonobuoy created
configmap/repolist-cm created
service/sonobuoy-master created
9. Monitor tests
By default, two sets of tests are run: the systemd-logs tests and the e2e tests. You can check their progress with the following status command:
$ sonobuoy status
PLUGIN         STATUS     COUNT
e2e            running    1
systemd-logs   running    2
Sonobuoy is still running. Runs can take up to 60 minutes.
$ sonobuoy status
PLUGIN         STATUS     COUNT
e2e            running    1
systemd-logs   complete   2
Sonobuoy is still running. Runs can take up to 60 minutes.
The other useful option for monitoring progress is the ‘logs’ option. This provides very verbose output, and can be followed if the -f option is used. Here is the output from the very beginning of the tests, showing how many tests will be run:
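Rather than re-running the status command by hand, you could poll it from a script. This is only a sketch: run_finished and wait_for_run are hypothetical helpers, and the check keys off the "Sonobuoy has completed" message that the status command prints when a run ends, which is an assumption based on the output seen with this version and may differ in others.

```shell
#!/bin/sh
# Sketch: poll `sonobuoy status` until the run reports completion.
# Helper names are illustrative; the completion text is taken from the
# status output seen in this post and may differ in other versions.
set -eu

# Reads status output on stdin; succeeds once the completion message appears.
run_finished() {
  grep -q 'Sonobuoy has completed'
}

# Polls every INTERVAL seconds (default 60) and prints the final status.
wait_for_run() {
  interval="${1:-60}"
  while true; do
    status="$(sonobuoy status)"
    if printf '%s\n' "$status" | run_finished; then
      break
    fi
    sleep "$interval"
  done
  printf '%s\n' "$status"
}

# Only start polling when explicitly asked, e.g. `sh wait.sh --wait 30`.
if [ "${1:-}" = "--wait" ]; then
  wait_for_run "${2:-60}"
fi
```

Because of `set -e`, the loop stops with an error if `sonobuoy status` itself fails, for example when the namespace does not exist yet.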
$ sonobuoy logs -f
namespace="heptio-sonobuoy" pod="sonobuoy-systemd-logs-daemon-set-02274f1712d5490e-bnmkv" container="sonobuoy-worker"
time="2019-07-23T16:17:20Z" level=info msg="Waiting for waitfile" waitfile=/tmp/results/done
time="2019-07-23T16:17:21Z" level=info msg="Detected done file, transmitting result file" resultFile=/tmp/results/systemd_logs
namespace="heptio-sonobuoy" pod="sonobuoy-systemd-logs-daemon-set-02274f1712d5490e-lg2cc" container="sonobuoy-worker"
time="2019-07-23T16:17:22Z" level=info msg="Waiting for waitfile" waitfile=/tmp/results/done
time="2019-07-23T16:17:23Z" level=info msg="Detected done file, transmitting result file" resultFile=/tmp/results/systemd_logs
namespace="heptio-sonobuoy" pod="sonobuoy-e2e-job-53542f69c61f4dae" container="sonobuoy-worker"
time="2019-07-23T16:17:23Z" level=info msg="Waiting for waitfile" waitfile=/tmp/results/done
namespace="heptio-sonobuoy" pod="sonobuoy" container="kube-sonobuoy"
time="2019-07-23T16:17:21Z" level=info msg="Scanning plugins in ./plugins.d (pwd: /)"
time="2019-07-23T16:17:21Z" level=info msg="Scanning plugins in /etc/sonobuoy/plugins.d (pwd: /)"
time="2019-07-23T16:17:21Z" level=info msg="Directory (/etc/sonobuoy/plugins.d) does not exist"
time="2019-07-23T16:17:21Z" level=info msg="Scanning plugins in ~/sonobuoy/plugins.d (pwd: /)"
time="2019-07-23T16:17:21Z" level=info msg="Directory (~/sonobuoy/plugins.d) does not exist"
time="2019-07-23T16:17:21Z" level=info msg="Filtering namespaces based on the following regex:.*|heptio-sonobuoy"
time="2019-07-23T16:17:21Z" level=info msg="Namespace csi-demo Matched=true"
time="2019-07-23T16:17:21Z" level=info msg="Namespace default Matched=true"
time="2019-07-23T16:17:21Z" level=info msg="Namespace demo Matched=true"
time="2019-07-23T16:17:21Z" level=info msg="Namespace heptio-sonobuoy Matched=true"
time="2019-07-23T16:17:21Z" level=info msg="Namespace kube-node-lease Matched=true"
time="2019-07-23T16:17:21Z" level=info msg="Namespace kube-public Matched=true"
time="2019-07-23T16:17:21Z" level=info msg="Namespace kube-system Matched=true"
time="2019-07-23T16:17:21Z" level=info msg="Namespace svc-demo Matched=true"
time="2019-07-23T16:17:21Z" level=info msg="Namespace velero Matched=true"
time="2019-07-23T16:17:21Z" level=info msg="Starting server Expected Results: [{ e2e} {csi-master-01 systemd-logs} {csi-worker-02 systemd-logs}]"
time="2019-07-23T16:17:21Z" level=info msg="Starting annotation update routine"
time="2019-07-23T16:17:21Z" level=info msg="Starting aggregation server" address=0.0.0.0 port=8080
time="2019-07-23T16:17:21Z" level=info msg="Running plugin" plugin=e2e
time="2019-07-23T16:17:21Z" level=info msg="Running plugin" plugin=systemd-logs
time="2019-07-23T16:17:23Z" level=info msg="received aggregator request" client_cert=systemd-logs node=csi-worker-02 plugin_name=systemd-logs
time="2019-07-23T16:17:24Z" level=info msg="received aggregator request" client_cert=systemd-logs node=csi-master-01 plugin_name=systemd-logs
namespace="heptio-sonobuoy" pod="sonobuoy-e2e-job-53542f69c61f4dae" container="e2e"
+ set +x
+ /usr/local/bin/ginkgo '--focus=\[Conformance\]' '--skip=Alpha|\[(Disruptive|Feature:[^\]]+|Flaky)\]' --noColor=true /usr/local/bin/e2e.test -- --disable-log-dump --repo-root=/kubernetes --provider=local --report-dir=/tmp/results --kubeconfig=
+ tee /tmp/results/e2e.log
I0723 16:17:23.756655 15 test_context.go:405] Using a temporary kubeconfig file from in-cluster config : /tmp/kubeconfig-709992315
I0723 16:17:23.756820 15 e2e.go:240] Starting e2e run "5a22a21f-ad65-11e9-bb39-9288ee65ecf1" on Ginkgo node 1
Running Suite: Kubernetes e2e suite
===================================
Random Seed: 1563898642 - Will randomize all specs
Will run 204 of 3585 specs
Here is the output from the very end of a suite of tests, showing how many tests were run, which ones passed, and which ones (if any) failed.
$ sonobuoy logs -f
• [SLOW TEST:8.240 seconds]
[sig-storage] Projected downwardAPI
/workspace/anago-v1.14.3-beta.0.37+5e53fd6bc17c0d/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/common/projected_downwardapi.go:33
should provide container's cpu limit [NodeConformance] [Conformance]
/workspace/anago-v1.14.3-beta.0.37+5e53fd6bc17c0d/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:692
------------------------------
SSSSSSSSSSSJul 23 17:40:44.454: INFO: Running AfterSuite actions on all nodes
Jul 23 17:40:44.454: INFO: Running AfterSuite actions on node 1
Jul 23 17:40:44.454: INFO: Skipping dumping logs from cluster
Ran 204 of 3585 Specs in 5000.512 seconds
SUCCESS! -- 204 Passed | 0 Failed | 0 Pending | 3381 Skipped PASS
Ginkgo ran 1 suite in 1h23m21.836284415s
Test Suite Passed
namespace="heptio-sonobuoy" pod="sonobuoy-e2e-job-53542f69c61f4dae" container="sonobuoy-worker"
time="2019-07-23T17:40:45Z" level=info msg="Detected done file, transmitting result file" resultFile=/tmp/results/e2e.tar.gz
namespace="heptio-sonobuoy" pod="sonobuoy" container="kube-sonobuoy"
time="2019-07-23T17:40:45Z" level=info msg="received aggregator request" client_cert=e2e plugin_name=e2e
time="2019-07-23T17:40:45Z" level=info msg="Last update to annotations on exit"
time="2019-07-23T17:40:45Z" level=info msg="csidrivers not specified in non-nil Resources. Skipping csidrivers query."
time="2019-07-23T17:40:45Z" level=info msg="csinodes not specified in non-nil Resources. Skipping csinodes query."
time="2019-07-23T17:40:45Z" level=info msg="runtimeclasses not specified in non-nil Resources. Skipping runtimeclasses query."
time="2019-07-23T17:40:45Z" level=info msg="secrets not specified in non-nil Resources. Skipping secrets query."
time="2019-07-23T17:40:45Z" level=info msg="events not specified in non-nil Resources. Skipping events query."
time="2019-07-23T17:40:45Z" level=info msg="horizontalpodautoscalers not specified in non-nil Resources. Skipping horizontalpodautoscalers query."
time="2019-07-23T17:40:45Z" level=info msg="resticrepositories not specified in non-nil Resources. Skipping resticrepositories query."
time="2019-07-23T17:40:45Z" level=info msg="backupstoragelocations not specified in non-nil Resources. Skipping backupstoragelocations query."
time="2019-07-23T17:40:45Z" level=info msg="backups not specified in non-nil Resources. Skipping backups query."
time="2019-07-23T17:40:45Z" level=info msg="downloadrequests not specified in non-nil Resources. Skipping downloadrequests query."
time="2019-07-23T17:40:45Z" level=info msg="podvolumerestores not specified in non-nil Resources. Skipping podvolumerestores query."
time="2019-07-23T17:40:45Z" level=info msg="restores not specified in non-nil Resources. Skipping restores query."
time="2019-07-23T17:40:45Z" level=info msg="deletebackuprequests not specified in non-nil Resources. Skipping deletebackuprequests query."
time="2019-07-23T17:40:45Z" level=info msg="podvolumebackups not specified in non-nil Resources. Skipping podvolumebackups query."
time="2019-07-23T17:40:45Z" level=info msg="serverstatusrequests not specified in non-nil Resources. Skipping serverstatusrequests query."
time="2019-07-23T17:40:45Z" level=info msg="volumesnapshotlocations not specified in non-nil Resources. Skipping volumesnapshotlocations query."
time="2019-07-23T17:40:45Z" level=info msg="schedules not specified in non-nil Resources. Skipping schedules query."
time="2019-07-23T17:40:45Z" level=info msg="Collecting Node Configuration and Health..."
time="2019-07-23T17:40:45Z" level=info msg="Creating host results for csi-master-01 under /tmp/sonobuoy/17fc51fa-6ba7-4f14-bf9d-47dc22e702bc/hosts/csi-master-01\n"
time="2019-07-23T17:40:45Z" level=info msg="Creating host results for csi-worker-02 under /tmp/sonobuoy/17fc51fa-6ba7-4f14-bf9d-47dc22e702bc/hosts/csi-worker-02\n"
time="2019-07-23T17:40:45Z" level=info msg="Running cluster queries"
time="2019-07-23T17:40:45Z" level=info msg="Running ns query (csi-demo)"
time="2019-07-23T17:40:45Z" level=info msg="Collecting Pod Logs (csi-demo)"
time="2019-07-23T17:40:45Z" level=info msg="Running ns query (default)"
time="2019-07-23T17:40:45Z" level=info msg="Collecting Pod Logs (default)"
time="2019-07-23T17:40:45Z" level=info msg="Running ns query (demo)"
time="2019-07-23T17:40:46Z" level=info msg="Collecting Pod Logs (demo)"
time="2019-07-23T17:40:46Z" level=info msg="Running ns query (heptio-sonobuoy)"
time="2019-07-23T17:40:47Z" level=info msg="Collecting Pod Logs (heptio-sonobuoy)"
time="2019-07-23T17:40:47Z" level=info msg="Running ns query (kube-node-lease)"
time="2019-07-23T17:40:47Z" level=info msg="Collecting Pod Logs (kube-node-lease)"
time="2019-07-23T17:40:47Z" level=info msg="Running ns query (kube-public)"
time="2019-07-23T17:40:48Z" level=info msg="Collecting Pod Logs (kube-public)"
time="2019-07-23T17:40:48Z" level=info msg="Running ns query (kube-system)"
time="2019-07-23T17:40:49Z" level=info msg="Collecting Pod Logs (kube-system)"
time="2019-07-23T17:40:57Z" level=info msg="Running ns query (svc-demo)"
time="2019-07-23T17:40:57Z" level=info msg="Collecting Pod Logs (svc-demo)"
time="2019-07-23T17:40:57Z" level=info msg="Running ns query (velero)"
time="2019-07-23T17:40:57Z" level=info msg="Collecting Pod Logs (velero)"
time="2019-07-23T17:40:59Z" level=info msg="Results available at /tmp/sonobuoy/201907231617_sonobuoy_17fc51fa-6ba7-4f14-bf9d-47dc22e702bc.tar.gz"
time="2019-07-23T17:40:59Z" level=info msg="no-exit was specified, sonobuoy is now blocking"
10. Querying Results
After completing the test suite, the results can also be queried using a variety of commands. First, verify that everything has completed using the status command.
$ sonobuoy status
PLUGIN         STATUS     COUNT
e2e            complete   1
systemd-logs   complete   2
Sonobuoy has completed. Use `sonobuoy retrieve` to get results.
As the previous command suggested, we can now retrieve the results of the run, and examine them for failures. Here is a way to do just that.
$ results=$(sonobuoy retrieve)
$ sonobuoy e2e $results
failed tests: 0
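If the run is part of an automated pipeline, the "failed tests" line can be turned into an exit status. The sketch below is illustrative only: failed_count and check_results are hypothetical helpers, and the parsing assumes the `sonobuoy e2e` output keeps the "failed tests: N" format shown above.

```shell
#!/bin/sh
# Sketch: fail a script when `sonobuoy e2e` reports any failed tests.
# Helper names are illustrative; the "failed tests: N" format is taken
# from the output in this post.
set -eu

# Reads `sonobuoy e2e <tarball>` output on stdin and prints N.
failed_count() {
  sed -n 's/^failed tests:[[:space:]]*\([0-9][0-9]*\).*$/\1/p' | sed -n '1p'
}

# check_results TARBALL: succeeds only when zero tests failed.
check_results() {
  count="$(sonobuoy e2e "$1" | failed_count)"
  echo "failed tests: ${count}"
  [ "$count" -eq 0 ]
}

# Usage: sh check.sh "$(sonobuoy retrieve)"
if [ "$#" -ge 1 ]; then
  check_results "$1"
fi
```

If the count cannot be parsed at all, the numeric comparison fails and the script exits non-zero, which is the safer outcome for a gate.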
11. Clean Up
To remove Sonobuoy from your cluster, simply run the following command:
$ sonobuoy delete --wait
INFO[0000] deleted kind=namespace namespace=heptio-sonobuoy
INFO[0000] deleted kind=clusterrolebindings
INFO[0000] deleted kind=clusterroles
$
And that completes our introduction to Sonobuoy. Hopefully you can see how useful this product can be when wishing to verify the integrity of your Kubernetes cluster. Read more about Sonobuoy here. One other nice feature is the ability to write your own custom plugins to extend the test suite. This could be very useful if you wanted to validate some bespoke applications or features, alongside the end-to-end conformance that Sonobuoy already tests.
Great writeup; I love hearing about people’s experience with Sonobuoy.
I wanted to just point out an option in case you wanted to use it in your future workflows.
If you don’t really _need_ the systemd logs plugin to run then you can get away from having to edit the yaml and using kubectl directly.
Instead you can set the `--sonobuoy-image` and `--kube-conformance-image` flags from the `sonobuoy` cli to point to your Harbor repo and run only the e2e plugin by adding the flag `--plugin e2e`. This would allow you to utilize the `--wait` flag to run it synchronously if you want. The systemd logs plugin image is the only one that can’t (easily) be edited.
You’d still be able to check the status/logs from other terminals, but I find the `--wait` flag preferable to only pinging the status for results.
Thanks again for the read.
Thanks for the guidance John. That also works. Since all my images are in my harbor.rainpole.com repo, my command is something like this (for other readers):
$ sonobuoy run --sonobuoy-image "harbor.rainpole.com/library/sonobuoy:v0.15.0" --kube-conformance-image "harbor.rainpole.com/library/conformance:v1.14.3" --plugin e2e --wait
INFO[0000] created object name=heptio-sonobuoy namespace= resource=namespaces
INFO[0000] created object name=sonobuoy-serviceaccount namespace=heptio-sonobuoy resource=serviceaccounts
INFO[0000] created object name=sonobuoy-serviceaccount-heptio-sonobuoy namespace= resource=clusterrolebindings
INFO[0000] created object name=sonobuoy-serviceaccount namespace= resource=clusterroles
INFO[0000] created object name=sonobuoy-config-cm namespace=heptio-sonobuoy resource=configmaps
INFO[0000] created object name=sonobuoy-plugins-cm namespace=heptio-sonobuoy resource=configmaps
INFO[0000] created object name=sonobuoy namespace=heptio-sonobuoy resource=pods
INFO[0000] created object name=sonobuoy-master namespace=heptio-sonobuoy resource=services
And I can query the status from another terminal:
$ sonobuoy status
PLUGIN STATUS COUNT
e2e running 1
Sonobuoy is still running. Runs can take up to 60 minutes.
Definitely easier than editing the YAML files. Good info!
Great article !!!