
My highlights from KubeCon and CloudNativeCon, Europe 2018

This week I attended KubeCon and CloudNativeCon 2018 in Copenhagen. I had two primary goals during this visit: (a) find out what was happening with storage in the world of Kubernetes (K8s), and (b) look at how people were doing day 2 operations – monitoring, logging, and so on – as well as the challenges one might encounter running K8s in production.

Let’s start with what is happening in storage. The first storage-related session I went to was on Rook, presented by Jared Watts. According to Jared, Rook aims to avoid vendor lock-in with storage and also to address the issue of portability. If I understood correctly, Rook is about deploying, then provisioning, distributed network storage such as Ceph and making it available to applications running on your K8s cluster. However, Rook only does the provisioning and management of storage for K8s – it is not in the data path itself.

It seems that one of the key features of Rook is the fact that it is implemented with K8s operators. This was part of the day #2 keynote, entitled Stateful Application Operators. What this basically means is that the Kubernetes API – and therefore kubectl, the command line interface for running commands against K8s clusters – can be extended through bespoke CRDs (Custom Resource Definitions) to do specific application-related stuff. So, through Rook, when kubectl is asked to create a cluster/pool/storage object, the Rook operator is watching for these sorts of events. On receipt of such an event, Rook communicates with the storage layer to instantiate the necessary storage components, and talks to kubelet (the agent that runs on each K8s node) to make sure the necessary persistent volume is created/accessible/mounted via the Rook volume plugin. If the underlying storage is Ceph, the idea is to use kubectl to create any file stores or block stores and then consume them within K8s. If I caught the drift of Jared’s session, once the operator is up and running in the K8s cluster, we can even get it to create the Ceph cluster in the first place.
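To make the operator idea concrete, here is a minimal sketch of the watch-and-reconcile pattern Jared described. This is not Rook’s actual code – the event shapes and resource names are entirely illustrative – but it shows the core loop: the operator observes events on custom resources and converges actual state towards desired state.

```python
# Minimal sketch of the operator reconcile pattern (hypothetical, not Rook's code).
# A real operator would watch the Kubernetes API for custom-resource events;
# here the "watch stream" is simply a list of dicts.

def reconcile(event, provisioned):
    """React to a custom-resource event, converging actual state to desired state."""
    name = event["metadata"]["name"]
    if event["type"] == "ADDED" and name not in provisioned:
        # In Rook's case this is where the operator would talk to the storage
        # layer (e.g. create a Ceph pool) and ensure volumes are mountable.
        provisioned[name] = event["spec"]
    elif event["type"] == "DELETED":
        provisioned.pop(name, None)
    return provisioned

events = [
    {"type": "ADDED", "metadata": {"name": "replicapool"}, "spec": {"replicas": 3}},
    {"type": "ADDED", "metadata": {"name": "ecpool"}, "spec": {"replicas": 2}},
    {"type": "DELETED", "metadata": {"name": "ecpool"}, "spec": {}},
]

state = {}
for ev in events:
    state = reconcile(ev, state)

print(state)  # only "replicapool" remains provisioned
```

The value of the pattern is that the reconcile function is idempotent: replaying the same events leaves the cluster in the same state.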

Currently Rook supports Ceph, but Jared also mentioned that integrations with CockroachDB, Minio and Nexenta are in the works, and there is a framework for other storage providers who want to integrate into K8s. Rook is currently in alpha state and is an inception-level project at the Cloud Native Computing Foundation (CNCF).

My second storage-related session was the Storage SIG or Special Interest Group, to give it its full title, presented by Saad Ali @ Google. This session primarily focused on the Container Storage Interface (CSI) effort. Part of this project is focused on taking the 3rd party volume plugins out of the Kubernetes tree and creating a separate volume plugin system instead. There are a number of reasons for this: many 3rd parties in the storage space do not want to release their code as open source, nor do they want to be tied to K8s release cycles. Saad said that they will not deprecate the current in-tree volume plugins, but I guess this is something you will need to keep in mind as you move towards later versions of Kubernetes this year, if you are already using one of those plugins. A number of other storage projects were discussed, such as the ability to migrate and share data between K8s clusters, data gravity (don’t move the data to the pods, but place pods on the same node/host as the data), as well as how to do volume snapshots and how to convert those snapshots to stand-alone volumes later on. Some of these projects are planned for later this year. A question was asked about the Dell EMC initiative called REX-Ray, and how it compares to the CSI initiative. REX-Ray has now pivoted, according to Saad, to being a framework in which storage vendors can develop their own CSI plugins with minimal code. If you’d like to be involved in the K8s Storage SIG, you can find the details here.
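To give a feel for the shape of what CSI standardises: the real interface is a gRPC specification, but the controller-side calls a plugin implements can be loosely sketched in Python. The method names below (CreateVolume, DeleteVolume) follow the spec; the in-memory logic around them is purely illustrative.

```python
# Loose sketch of the controller-side shape of a CSI plugin.
# The real interface is gRPC; method names follow the CSI spec,
# while the in-memory bookkeeping here is illustrative only.

class InMemoryControllerPlugin:
    def __init__(self):
        self.volumes = {}  # name -> volume record

    def CreateVolume(self, name, capacity_bytes):
        # CSI requires CreateVolume to be idempotent by name: calling it
        # twice with the same name returns the same volume.
        if name not in self.volumes:
            self.volumes[name] = {"id": f"vol-{name}", "capacity": capacity_bytes}
        return self.volumes[name]

    def DeleteVolume(self, volume_id):
        self.volumes = {n: v for n, v in self.volumes.items()
                        if v["id"] != volume_id}

plugin = InMemoryControllerPlugin()
vol = plugin.CreateVolume("pvc-1234", 10 * 1024**3)   # 10 GiB
print(vol["id"])        # vol-pvc-1234
plugin.DeleteVolume(vol["id"])
print(len(plugin.volumes))  # 0
```

Because the interface lives outside the Kubernetes tree, a vendor can ship and version a plugin like this independently of K8s release cycles – which is exactly the motivation Saad described.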

To finish on the storage aspect of the conference, we had a walk around the solutions exchange to see which storage vendors had a presence. We met the guys from Portworx, whom I also met at DockerCon ’17 in Austin and wrote about here. When I asked what was new, they told me they now have the ability to snapshot the volumes belonging to an application that spans multiple containers. It seems that they can also encrypt and replicate at a container volume level. So, some nice enhancements since we last spoke. What I omitted to ask is whether they will need to change anything to align with the new CSI approach.

We also caught up with the StorageOS guys. They have both CSI and in-tree drivers for storage, and are following along with the CSI designs. One thing they are waiting on is how CSI will decide to handle snapshots; once that is understood, they plan to implement it. Good conversations all round.

Now it was the turn of monitoring. Basically, Prometheus is king of all things metric-related in the world of Kubernetes, and there were a bunch of different sessions dedicated to it. It seems that all applications (at least that is how it appeared to me) export their metrics in a format that Prometheus can understand. There was even a session by Matt Layher @ DigitalOcean who explained how to export metrics from your app in a way that Prometheus could consume them.
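Part of why "everything exports to Prometheus" is that the exposition format is just plain text served over HTTP. A stdlib-only sketch of rendering a counter in that format follows – a real app would normally use a client library such as prometheus_client, and the metric name here is made up for illustration.

```python
# Stdlib-only sketch of rendering a counter in the Prometheus text
# exposition format. Real apps would typically use a client library
# (e.g. prometheus_client); the metric name is illustrative.

requests_total = {"GET": 0, "POST": 0}

def render_metrics():
    lines = [
        "# HELP myapp_http_requests_total Total HTTP requests served.",
        "# TYPE myapp_http_requests_total counter",
    ]
    for method, count in sorted(requests_total.items()):
        # One sample per label combination, e.g.
        # myapp_http_requests_total{method="GET"} 3
        lines.append(f'myapp_http_requests_total{{method="{method}"}} {count}')
    return "\n".join(lines) + "\n"

requests_total["GET"] += 3
requests_total["POST"] += 1
print(render_metrics())
```

Prometheus simply scrapes an endpoint serving this text on a schedule, which is why instrumenting an app is mostly a matter of keeping counters and formatting them on request.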

We met a number of companies in the solutions exchange who were focused on monitoring K8s. We had some good conversations with both LightStep and DataDog, and Google themselves had a session on OpenCensus which, if I understood correctly, is a single set of libraries that allows metrics and traces to be captured from any application. When you are trying to track a request across multiple systems and/or multiple micro-services, this becomes quite important. Morgan McLean of Google stated that they are working on integrations with different exporters for these metrics and traces, such as Zipkin, Jaeger, SignalFx and, of course, Prometheus.

One interesting session that I attended was by Eduardo Silva @ Treasure Data. He talked us through how Docker containers and Kubernetes both generate separate log streams, which you really need to unify to get the full picture of what is happening in your cluster. Eduardo introduced us to the fluentd data collector, which is run as a DaemonSet on the cluster (a DaemonSet ensures a copy of a pod runs on every node in the cluster). It pulls in the container logs (available from the file system / journald) and the K8s logs from the master node. Although we were pressed for time, we were also introduced to Fluent Bit, a less memory-intensive version of fluentd which also does log processing and forwarding. It has various application parsers, can exclude certain pods from logging, has enterprise connectors to the likes of Splunk and Kafka, and can redirect its output to, you guessed it, Prometheus.
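The kind of unification Eduardo described can be sketched quite simply: Docker writes one JSON object per log line, and a collector enriches each record with Kubernetes metadata before forwarding it, roughly what fluentd’s Kubernetes metadata filter does. The field names below are illustrative, not fluentd’s actual schema.

```python
import json

# Sketch of log-stream unification: take a raw Docker JSON log line and
# attach Kubernetes metadata, roughly what fluentd's Kubernetes metadata
# filter does. Field names are illustrative, not fluentd's actual schema.

def enrich(raw_line, pod_name, namespace):
    record = json.loads(raw_line)  # Docker writes one JSON object per line
    record["kubernetes"] = {"pod_name": pod_name, "namespace": namespace}
    return record

raw = '{"log": "GET /healthz 200\\n", "stream": "stdout", "time": "2018-05-04T10:00:00Z"}'
event = enrich(raw, pod_name="web-6f7d9", namespace="prod")
print(event["kubernetes"]["namespace"])  # prod
```

With every record carrying pod and namespace labels, the container stream and the cluster stream can be searched and correlated as one.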

Having seen what people were doing in the metrics, tracing and monitoring space, it was also good to see some real-life examples highlighting why this is so important. There were a number of sessions describing what could happen when things went wrong with K8s. During the closing keynote on day #1, Oliver Beattie of Monzo Bank in the UK described how a single API change between K8s 1.6 and 1.7, in the handling of a null reference for replicas, led to an outage of over an hour at the bank. It was interesting to hear about the domino effect one minor change could have. On day #2, we heard from the guys at Oath, the digital content division of Verizon that includes Yahoo and AOL. They discussed various issues they have had with K8s in production. I guess you could summarize the session like this: K8s has a lot of moving parts, and running slightly mismatched versions of its different components can lead to some serious problems. Bugs are also an issue, as is human error. And of course, they shared how they are preventing these issues from happening again through the various guard-rails they are putting in place.

Among the other notable announcements, gVisor was one that caught my attention. It was announced by Aparna Sinha of Google, who mentioned that one of the problems with containers is that they do not contain very well. To address this, Google has developed gVisor, a very lightweight kernel that runs in the user space of an OS. This allows you to have sandboxed containers isolated by gVisor while still getting the benefits of containers (resource sharing, quick start). The idea is that this isolation will prevent a container impacting the underlying node/host.

Something else that caught my eye was Kata Containers, tagged as the “speed of containers with the security of a VM”. This is essentially containers running as lightweight virtual machines on KVM. Although the project is managed by the OpenStack Foundation, they made it clear that it is merely hosted there and that, beyond that, there is no connection to OpenStack. To me, Kata Containers did appear to have some similarities to vSphere Integrated Containers from VMware.

Both of these projects (gVisor and Kata Containers) would suggest that there are still many benefits to be gained from running containers in VMs, which can provide advantages such as security and sandboxing over a bare-metal approach.

Lastly, it would be remiss of me not to mention the VMware/Kubernetes SIG (Special Interest Group), led by Steve Wong and Fabio Rapposelli. This is our forum for discussing items like best practices for running K8s on VMware, for outlining our feature roadmap and planned integrations, and for gathering input and feedback. It was emphasized very strongly that this is not just for vSphere – if you are running K8s on Fusion or Workstation, you are also most welcome to join.

Lots happening in this space, and lots of things to think about for sure as Kubernetes gains popularity. Thanks for reading.
