Kubernetes Storage on vSphere 101 – Deployments and ReplicaSets
In my previous 101 posts on Kubernetes Storage on vSphere, we saw how to create “static” persistent volumes (PVs) by mapping an existing virtual machine disk (VMDK) directly into a PV manifest YAML file. We also saw that we could dynamically provision PVs through the use of a StorageClass, how a StorageClass can be used to apply features of the underlying vSphere storage, such as a storage policy, to a PV, and how Pods can consume both static and dynamic PVs through persistent volume claims (PVCs). However, in both of those exercises we spent a considerable amount of time building out Pod and PVC manifest YAML files for every PV. This does not scale. If we have a large scale-out application made up of many Pods and PVs, and the application has built-in replication features that allow it to replicate itself across Pods and PVs, do we really want to have to write new Pod YAML files and new PVC YAML files for every instance of the application? The obvious answer is no, we do not.
The other question is how we make our applications highly available in Kubernetes. In other words, if a Pod fails, we don’t want our application to be impacted. Nor do we want to be involved in restarting any Pods that might have failed. We want K8s to “supervise” the application in some way and, if a Pod fails, recreate and restart it. In some cases there may be no need to replicate any data – perhaps the application just provides some front-end functionality, like a web server, and we simply want K8s to maintain a desired number of Pods for this app. This is where the objects called Deployments (and their ReplicaSets) come in. These allow a desired number of Pods to be created, scaled in and out, and recreated/restarted in the event of a failure.
Deployments and ReplicaSets go hand in hand. You can think of a Deployment managing ReplicaSets, and ReplicaSets managing Pods. You might ask why these levels of abstraction exist. Well, the ReplicaSet ensures that the correct number of Pods is created and kept running, as per the replicas entry in the Deployment YAML. The Deployment object then manages how ReplicaSets behave. For instance, in the case of an application upgrade, the Deployment creates a new ReplicaSet to roll out the updated application, and when that has completed it takes care of terminating and removing the older ReplicaSet along with the Pods running the previous version of the application. As we will see later, if the whole ReplicaSet is deleted, it is recreated, and it in turn recreates the required number of replica Pods.
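As a quick, hedged illustration of that rolling-update behaviour (these commands are not part of the storage demo below, and the busybox image tag shown is purely hypothetical), changing the Pod template of the demo-deployment we are about to create is what triggers a new ReplicaSet, and the rollout can be watched and reviewed with kubectl:

$ kubectl set image deployment/demo-deployment demo=k8s.gcr.io/busybox:1.27   # hypothetical new image tag
$ kubectl rollout status deployment/demo-deployment                           # watch the new ReplicaSet roll out
$ kubectl rollout history deployment/demo-deployment                          # revisions recorded by the Deployment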
You may have also come across StatefulSets and wondered what the difference is between them and Deployments. There are a number of differences. Primarily, StatefulSets allow for the ordered startup and shutdown of Pods, and make it easy to identify the primary node of an application through the sequential numbering of Pods. Deployments start and stop Pods in no particular order, which is not ideal for certain distributed applications. Another significant difference is that a StatefulSet can “supervise” both Pods and storage, requesting dynamic PVs on behalf of Pods, scaling out both Pods and PVs, and recreating either object should it fail. Deployments and ReplicaSets are really for supervising Pods only. While this series of posts is about K8s storage on vSphere, I thought it important to show an example of both a Deployment and a StatefulSet so that you can contrast the differences later on. A post on StatefulSets will follow shortly.
OK – that was a lot of information to digest, so why don’t we go ahead and build our first Deployment? We will continue to use our trusty busybox as the only container running in the Pod – the same container that we used in previous demo examples. Here is a very simple manifest YAML for our Deployment, which will initially start with a single Pod in the ReplicaSet:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-deployment
  labels:
    app: demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
      - name: demo
        image: "k8s.gcr.io/busybox"
        command: [ "sleep", "1000000" ]
The main items to highlight here are the replicas and selector fields. As mentioned, we are going to start with a single replica and then scale it out. The selector field is how we tell the Deployment which Pods it needs to manage: with selector.matchLabels.app set to demo, any Pods carrying the label app=demo will be managed by this Deployment. Let’s go ahead and roll out our Deployment, and take a closer look at the Deployment, ReplicaSet and Pod objects after we have done that. If we use the describe option on the Deployment and ReplicaSet objects, we can get more detailed information about them.
$ kubectl create -f demo-deployment.yaml
deployment.apps/demo-deployment created

$ kubectl get deploy
NAME              DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
demo-deployment   1         1         1            1           5s

$ kubectl get rs
NAME                         DESIRED   CURRENT   READY   AGE
demo-deployment-5457d695f6   1         1         1       9s

$ kubectl get pods
NAME                               READY   STATUS    RESTARTS   AGE
demo-deployment-5457d695f6-6wbnx   1/1     Running   0          12s

$ kubectl describe deploy
Name:                   demo-deployment
Namespace:              default
CreationTimestamp:      Mon, 03 Jun 2019 14:47:53 +0100
Labels:                 app=demo
Annotations:            deployment.kubernetes.io/revision: 1
Selector:               app=demo
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:  app=demo
  Containers:
   demo:
    Image:      k8s.gcr.io/busybox
    Port:       <none>
    Host Port:  <none>
    Command:
      sleep
      1000000
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   demo-deployment-5457d695f6 (1/1 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  1s    deployment-controller  Scaled up replica set demo-deployment-5457d695f6 to 1

$ kubectl describe rs
Name:           demo-deployment-5457d695f6
Namespace:      default
Selector:       app=demo,pod-template-hash=5457d695f6
Labels:         app=demo
                pod-template-hash=5457d695f6
Annotations:    deployment.kubernetes.io/desired-replicas: 1
                deployment.kubernetes.io/max-replicas: 2
                deployment.kubernetes.io/revision: 1
Controlled By:  Deployment/demo-deployment
Replicas:       1 current / 1 desired
Pods Status:    1 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  app=demo
           pod-template-hash=5457d695f6
  Containers:
   demo:
    Image:      k8s.gcr.io/busybox
    Port:       <none>
    Host Port:  <none>
    Command:
      sleep
      1000000
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Events:
  Type    Reason            Age   From                   Message
  ----    ------            ----  ----                   -------
  Normal  SuccessfulCreate  5s    replicaset-controller  Created pod: demo-deployment-5457d695f6-6wbnx
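As an aside, since the Deployment tracks its Pods purely through that label selector, the same selector can be used with kubectl to list exactly the Pods being managed – a quick check along these lines (the output was not captured during the original run):

$ kubectl get pods -l app=demo --show-labels   # list only Pods matching the Deployment's selector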
Everything looks good. From the above, it should be clear that there is a Deployment with a single ReplicaSet, which in turn contains a single Pod running the busybox image. Note that there are no volumes or mounts. Let’s now do a scale-out test, where we will scale the Deployment to 3 replicas. Using describe, we can also see the events, and whether any of the objects had problems carrying out the request.
$ kubectl scale deploy demo-deployment --replicas=3
deployment.extensions/demo-deployment scaled

$ kubectl get deploy
NAME              DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
demo-deployment   3         3         3            3           3m4s

$ kubectl get rs
NAME                         DESIRED   CURRENT   READY   AGE
demo-deployment-5457d695f6   3         3         3       3m9s

$ kubectl get pods
NAME                               READY   STATUS    RESTARTS   AGE
demo-deployment-5457d695f6-6wbnx   1/1     Running   0          3m14s
demo-deployment-5457d695f6-fsnfh   1/1     Running   0          17s
demo-deployment-5457d695f6-z447n   1/1     Running   0          17s

$ kubectl describe deploy
Name:                   demo-deployment
Namespace:              default
CreationTimestamp:      Mon, 03 Jun 2019 14:47:53 +0100
Labels:                 app=demo
Annotations:            deployment.kubernetes.io/revision: 1
Selector:               app=demo
Replicas:               3 desired | 3 updated | 3 total | 3 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:  app=demo
  Containers:
   demo:
    Image:      k8s.gcr.io/busybox
    Port:       <none>
    Host Port:  <none>
    Command:
      sleep
      1000000
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Progressing    True    NewReplicaSetAvailable
  Available      True    MinimumReplicasAvailable
OldReplicaSets:  <none>
NewReplicaSet:   demo-deployment-5457d695f6 (3/3 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  3m5s  deployment-controller  Scaled up replica set demo-deployment-5457d695f6 to 1
  Normal  ScalingReplicaSet  8s    deployment-controller  Scaled up replica set demo-deployment-5457d695f6 to 3

$ kubectl describe rs
Name:           demo-deployment-5457d695f6
Namespace:      default
Selector:       app=demo,pod-template-hash=5457d695f6
Labels:         app=demo
                pod-template-hash=5457d695f6
Annotations:    deployment.kubernetes.io/desired-replicas: 3
                deployment.kubernetes.io/max-replicas: 4
                deployment.kubernetes.io/revision: 1
Controlled By:  Deployment/demo-deployment
Replicas:       3 current / 3 desired
Pods Status:    3 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  app=demo
           pod-template-hash=5457d695f6
  Containers:
   demo:
    Image:      k8s.gcr.io/busybox
    Port:       <none>
    Host Port:  <none>
    Command:
      sleep
      1000000
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Events:
  Type    Reason            Age    From                   Message
  ----    ------            ----   ----                   -------
  Normal  SuccessfulCreate  3m13s  replicaset-controller  Created pod: demo-deployment-5457d695f6-6wbnx
  Normal  SuccessfulCreate  16s    replicaset-controller  Created pod: demo-deployment-5457d695f6-fsnfh
  Normal  SuccessfulCreate  16s    replicaset-controller  Created pod: demo-deployment-5457d695f6-z447n
$
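Note that kubectl scale is the imperative way of making this change. The same result could be achieved declaratively – a small sketch, assuming the manifest is still in the demo-deployment.yaml file shown earlier (kubectl apply may warn that the object was originally created with kubectl create, but the change still goes through):

# edit demo-deployment.yaml and change "replicas: 1" to "replicas: 3", then:
$ kubectl apply -f demo-deployment.yaml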
The scale-out appears to have worked seamlessly. It is also very fast, since there is no storage to provision. At this point, we have seen how easy it is to scale out a Deployment and deploy additional Pods through the use of ReplicaSets. Let’s now see how ReplicaSets ensure we have the correct number of Pods running. We will delete one of the Pods and watch how a new one is started in its place.
$ kubectl get deploy
NAME              DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
demo-deployment   3         3         3            3           68m

$ kubectl get rs
NAME                         DESIRED   CURRENT   READY   AGE
demo-deployment-5457d695f6   3         3         3       68m

$ kubectl get pod -o wide
NAME                               READY   STATUS    RESTARTS   AGE     IP              NODE                                   NOMINATED NODE
demo-deployment-5457d695f6-ds2ws   1/1     Running   0          2m55s   10.200.101.10   5670630f-596b-4503-a3fa-84cf02752822   <none>
demo-deployment-5457d695f6-fsnfh   1/1     Running   0          65m     10.200.16.19    d875cb5a-d889-48f9-835c-0402bdd509a6   <none>
demo-deployment-5457d695f6-z447n   1/1     Running   0          65m     10.200.16.20    d875cb5a-d889-48f9-835c-0402bdd509a6   <none>

$ kubectl delete pod demo-deployment-5457d695f6-fsnfh
pod "demo-deployment-5457d695f6-fsnfh" deleted

$ kubectl get deploy
NAME              DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
demo-deployment   3         3         3            3           69m

$ kubectl get rs
NAME                         DESIRED   CURRENT   READY   AGE
demo-deployment-5457d695f6   3         3         3       69m

$ kubectl get pod -o wide
NAME                               READY   STATUS    RESTARTS   AGE    IP              NODE                                   NOMINATED NODE
demo-deployment-5457d695f6-cz96x   1/1     Running   0          54s    10.200.16.21    d875cb5a-d889-48f9-835c-0402bdd509a6   <none>
demo-deployment-5457d695f6-ds2ws   1/1     Running   0          4m1s   10.200.101.10   5670630f-596b-4503-a3fa-84cf02752822   <none>
demo-deployment-5457d695f6-z447n   1/1     Running   0          66m    10.200.16.20    d875cb5a-d889-48f9-835c-0402bdd509a6   <none>
$
If we do a describe on the ReplicaSet, we can see an event related to the creation of a new Pod to replace the deleted one.
Events:
  Type    Reason            Age    From                   Message
  ----    ------            ----   ----                   -------
  Normal  SuccessfulCreate  2m10s  replicaset-controller  Created pod: demo-deployment-5457d695f6-cz96x
Let’s take this a step further and delete the whole ReplicaSet. We should see the original Pods terminating and new ones being created in their place. At one point during this operation, the three original Pods can be seen terminating alongside the three newly created Pods; eventually the original Pods are removed.
$ kubectl get rs
NAME                         DESIRED   CURRENT   READY   AGE
demo-deployment-5457d695f6   3         3         3       86m

$ kubectl delete rs demo-deployment-5457d695f6
replicaset.extensions "demo-deployment-5457d695f6" deleted

$ kubectl get rs
NAME                         DESIRED   CURRENT   READY   AGE
demo-deployment-5457d695f6   3         3         3       4s

$ kubectl get pod -o wide
NAME                               READY   STATUS        RESTARTS   AGE   IP              NODE                                   NOMINATED NODE
demo-deployment-5457d695f6-6x5nx   1/1     Running       0          11s   10.200.41.8     afa28938-19e4-407f-9afa-714bb1387741   <none>
demo-deployment-5457d695f6-8ttdx   1/1     Running       0          11s   10.200.16.22    d875cb5a-d889-48f9-835c-0402bdd509a6   <none>
demo-deployment-5457d695f6-cz96x   1/1     Terminating   0          18m   10.200.16.21    d875cb5a-d889-48f9-835c-0402bdd509a6   <none>
demo-deployment-5457d695f6-ds2ws   1/1     Terminating   0          21m   10.200.101.10   5670630f-596b-4503-a3fa-84cf02752822   <none>
demo-deployment-5457d695f6-dwxtm   1/1     Running       0          11s   10.200.16.23    d875cb5a-d889-48f9-835c-0402bdd509a6   <none>
demo-deployment-5457d695f6-z447n   1/1     Terminating   0          84m   10.200.16.20    d875cb5a-d889-48f9-835c-0402bdd509a6   <none>

$ kubectl get rs
NAME                         DESIRED   CURRENT   READY   AGE
demo-deployment-5457d695f6   3         3         3       60s

$ kubectl get pod -o wide
NAME                               READY   STATUS    RESTARTS   AGE   IP              NODE                                   NOMINATED NODE
demo-deployment-5457d695f6-6x5nx   1/1     Running   0          63s   10.200.41.8     afa28938-19e4-407f-9afa-714bb1387741   <none>
demo-deployment-5457d695f6-8ttdx   1/1     Running   0          63s   10.200.16.22    d875cb5a-d889-48f9-835c-0402bdd509a6   <none>
demo-deployment-5457d695f6-dwxtm   1/1     Running   0          63s   10.200.16.23    d875cb5a-d889-48f9-835c-0402bdd509a6   <none>
We won’t look at failure handling in detail in this post (e.g. which component inside K8s is responsible for which action in the background) – we will come back to that in a later post. Suffice it to say that if a Pod fails, a new Pod is created to ensure the desired state of the application, as described in the Deployment manifest, is met. One other thing to highlight is the nomenclature – the naming convention of the Pods in the ReplicaSet does not make it clear in which order the Pods were created, nor which Pod would be removed if we were to scale the number of replicas down. This is one of the main advantages of StatefulSets, as we shall see shortly.
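For example (a step not run as part of this demo), scaling the Deployment back down leaves the choice of which Pod to terminate entirely to the ReplicaSet controller – nothing in the Pod names tells us in advance which one will go:

$ kubectl scale deploy demo-deployment --replicas=2   # one of the three Pods is terminated, but we cannot predict which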
Again, while this post did not cover storage, I felt it was important to understand the difference between Deployments and StatefulSets. As mentioned, if these Pods all had access to the same ReadWriteMany volume (e.g. an NFS file share) which is already highly available on some external storage, then a Deployment with ReplicaSets might be ideal for making this application highly available in Kubernetes. In my next post, we will look at how we can manage both compute (Pods) and storage (PVs) at the same time through the use of a StatefulSet object. We will also see how the nomenclature used by StatefulSets makes it easy to understand the order in which Pods were deployed, which Pod is using which PV, and also which Pod would be removed if the application was scaled down.
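To make the ReadWriteMany point a little more concrete, here is a hypothetical sketch (it is not one of the manifests used in this demo) of how the Pod template in a Deployment could mount a shared PVC. The PVC name shared-nfs-pvc, and the assumption that it is bound to an NFS-backed ReadWriteMany volume, are illustrative only:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-rwx-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: demo-rwx
  template:
    metadata:
      labels:
        app: demo-rwx
    spec:
      containers:
      - name: demo
        image: "k8s.gcr.io/busybox"
        command: [ "sleep", "1000000" ]
        volumeMounts:
        - name: shared-data
          mountPath: /shared              # every replica sees the same shared files
      volumes:
      - name: shared-data
        persistentVolumeClaim:
          claimName: shared-nfs-pvc       # hypothetical ReadWriteMany (NFS-backed) PVC

Because the volume already exists and supports ReadWriteMany, every replica created by the ReplicaSet can mount it simultaneously – the Deployment only has to worry about keeping the desired number of Pods running.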
Manifests used in this demo can be found on my vsphere-storage-101 github repo.