Bryan first introduced David Xia from Spotify. David shared with us his story of how Spotify deleted not one, but two of their three Kubernetes clusters during a migration. It was a pretty engaging presentation, and my takeaways were the lessons learnt, namely that (1) you need to plan for failure by having redundancy and the ability to roll back, (2) you should migrate large, complex infrastructure gradually, and (3) you should have a culture of learning, not blame. I like this last lesson a lot.
There were further keynotes from Oracle Cloud, where Bob Quillin talked about some of their reference customers. Then we had Katie Gamanji from Conde Nast, a large digital publishing house whose household names include Vogue, GQ and Wired. Katie shared with us their journey to Kubernetes once they decided to roll out a unified, cloud native platform across all their different regions. It was interesting to hear about the challenges, and how certain things, such as upgrades, are still an issue for them.
The final keynote was from Saad Ali, and it was all about dispelling the myth that storage is hard in Kubernetes. Saad broke his keynote into 4 parts.
When it comes to storage in K8s, you first need to (1) spend some time understanding the needs of your application and then select the correct storage. That means considering whether you want a data service, like an S3 bucket or a NoSQL DB, or whether you in fact need file or block storage.
Once that decision is made, you move on to (2), which is the deployment of the storage. As Saad stated, this can be managed or un-managed storage, depending on whether you are on-prem or using a cloud provider. Saad said that you could also deploy storage into K8s itself, where it is just another stateful app. While some of these storage solutions might be complex to deploy, many operators are starting to appear that make this much easier (Rook with Ceph was the example he shared). Of course, when dealing with cloud providers, you can choose to go with a managed/cloud storage option, and if you are on-prem, there are a host of block and file storage solutions available as well. Lots of choice for customers here.
Next it was on to (3), how to make the storage available to the cluster. Saad mentioned the CSI driver initiative that is currently underway, which I mentioned in yesterday’s post. This enables K8s to integrate with pretty much any block or file storage, simplifying the task of making storage available to your cluster.
Finally, he talked about (4), how a stateful app can consume the storage. He mentioned the concept of PVs and PVCs as an example of how this has been very much simplified as well.
Very good session, and the take-away for me is that stateful apps are very much at the forefront of people’s minds in the Kubernetes community.
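To make part (4) a little more concrete, here is a minimal sketch of my own (not something shown in the keynote) of how an app might claim storage with a PVC and then mount it in a pod. It builds the two manifests as Python dictionaries and prints them as JSON, which Kubernetes accepts as well as YAML. The object names, the image, the mount path and the storage class `example-csi-sc` are all made up for illustration; a real cluster would use whatever storage class its CSI driver exposes.

```python
import json


def pvc_manifest(name, size, storage_class):
    """Build a PersistentVolumeClaim asking for `size` of storage."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            # Hypothetical storage class; it would map to a CSI driver.
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


def pod_manifest(name, image, claim_name, mount_path):
    """Build a Pod that mounts the volume bound to the given claim."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "volumeMounts": [{"name": "data", "mountPath": mount_path}],
                }
            ],
            "volumes": [
                # The pod refers only to the claim, never to the backing storage.
                {"name": "data", "persistentVolumeClaim": {"claimName": claim_name}}
            ],
        },
    }


if __name__ == "__main__":
    claim = pvc_manifest("demo-data", "10Gi", "example-csi-sc")
    pod = pod_manifest("demo-app", "nginx", "demo-data", "/var/lib/demo")
    print(json.dumps(claim, indent=2))
    print(json.dumps(pod, indent=2))
```

The point of the split is that the pod only names the claim, so the same app definition can run against very different storage back ends.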
After the keynotes, I headed over to the CNCF Storage Working Group session, hosted by Alex Chircop of StorageOS. This was a high-level overview of what the working group does, and some of the projects that they are involved in. One of the projects to come out of the group was the CNCF Storage Whitepaper, which gives details about storage implementations and features. If you are new to storage, this paper offers a great primer on all the different storage types and technologies that are out there.
After lunch, I attended Saad Ali’s “The Magic of Kubernetes Self-Healing Capabilities”. Saad told us how this magic is made up of two things; the first is how
Saad also shared an example of what happens under the covers when a node dies or fails, and how the controllers behave when such an event takes place. Every node reports back an “I’m alive” heartbeat every 10 seconds. The node controller monitors the time of the last heartbeat. If it notices, by talking to the API server, that a node has had no heartbeat update for 5 minutes, it evicts all the pod objects from that node. The replica controller then notices that a pod object is missing and automatically creates a new one. The scheduler on the master now sees a new pod object in the API server, figures out where to place it, and updates the API server with the placement details. Finally, the chosen node takes over the running of the pod. That node continuously monitors whether the pod is running and restarts it if it crashes. From this point on, the node is responsible for keeping the pod running.
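The chain of events in the node failure example can be sketched as a toy simulation. This is my own simplification in Python, not code from the talk and not how the real controllers are written: the `Cluster` class stands in for the state held in the API server, the class and function names are invented, and the scheduler's "fewest pods" placement rule is an assumption made to keep the example small. The 10 second heartbeat and 5 minute timeout are the figures quoted in the session. The node restarting a crashed pod is not modelled.

```python
from dataclasses import dataclass, field
from typing import Optional

HEARTBEAT_INTERVAL = 10    # seconds between heartbeats, as quoted in the talk
EVICTION_TIMEOUT = 5 * 60  # seconds without a heartbeat before pods are evicted


@dataclass
class Pod:
    name: str
    node: Optional[str] = None  # None means "not yet scheduled"


@dataclass
class Cluster:
    """Toy stand-in for the state stored in the API server."""
    desired_replicas: int
    last_heartbeat: dict = field(default_factory=dict)  # node name -> timestamp
    pods: list = field(default_factory=list)
    created: int = 0


def node_controller(c, now):
    """Evict pod objects from nodes whose last heartbeat is too old."""
    dead = {n for n, t in c.last_heartbeat.items() if now - t >= EVICTION_TIMEOUT}
    c.pods = [p for p in c.pods if p.node not in dead]
    return dead


def replica_controller(c):
    """Create new, unscheduled pod objects until the desired count is met."""
    while len(c.pods) < c.desired_replicas:
        c.created += 1
        c.pods.append(Pod(name=f"web-{c.created}"))


def scheduler(c, now):
    """Bind each unscheduled pod to the healthy node with the fewest pods."""
    healthy = [n for n, t in c.last_heartbeat.items() if now - t < EVICTION_TIMEOUT]
    for pod in c.pods:
        if pod.node is None and healthy:
            pod.node = min(healthy, key=lambda n: sum(p.node == n for p in c.pods))


def reconcile(c, now):
    """Run one pass of each control loop, in the order described above."""
    node_controller(c, now)
    replica_controller(c)
    scheduler(c, now)


def simulate():
    c = Cluster(desired_replicas=2, last_heartbeat={"node-a": 0.0, "node-b": 0.0})
    reconcile(c, now=0.0)  # initial placement: one pod per node
    # node-b dies right after t=0; node-a keeps sending heartbeats.
    for now in range(HEARTBEAT_INTERVAL, EVICTION_TIMEOUT + 1, HEARTBEAT_INTERVAL):
        c.last_heartbeat["node-a"] = float(now)
        reconcile(c, now=float(now))
    return [(p.name, p.node) for p in c.pods]


if __name__ == "__main__":
    print(simulate())
```

Each loop only looks at the shared state and fixes the one thing it is responsible for, which is why no component needs to tell another one that a node has failed.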
He shared a few other scenarios as well, and overall this was a very enjoyable session. It gave me a good insight into how certain things work in K8s.
After this session, I really wanted to get along to a session entitled “Improving Availability for Stateful Applications in Kubernetes” by Michelle Au. The session was standing room only, and unfortunately I had a prior commitment and had to leave early. From the part that I did manage to see, Michelle was doing a great job of explaining all of the different storage types and how to achieve different levels of availability with each. I plan to watch the recording, as it looked really good.
So that’s a wrap for day 2. One thing is very clear: storage has become a huge topic of conversation in the Kubernetes community. Tomorrow I hope to spend a bit more time in the expo. I did get some time today to catch up with OpenEBS and Kasten. Kasten’s K10 platform looks very interesting, as does OpenEBS’s announcement of local PV support.
Thanks for reading this far.