Kubernetes Storage on vSphere 101 – Deployments and ReplicaSets

In my previous 101 posts on Kubernetes Storage on vSphere, we saw how to create “static” persistent volumes (PVs) by mapping an existing virtual machine disk (VMDK) directly into a PV manifest YAML file. We also saw how to dynamically provision PVs through the use of a StorageClass, how a StorageClass can be used to apply features of the underlying vSphere storage, such as a storage policy, to a PV, and how Pods can consume both static and dynamic PVs through the use of persistent volume claims (PVCs). However, in both previous exercises we spent a considerable amount of time building out PVC manifest YAML files for every PV. This does not scale. If we have a large scale-out application made up of many Pods and PVs, and that application has built-in replication features that allow it to replicate itself across Pods and PVs, do we really want to be in a position where we have to build new Pod YAML files and new PVC YAML files for every instance of the application? The obvious answer is no, we do not.

The other question is how we make our applications highly available in Kubernetes. In other words, if a Pod fails, we don’t want our application to be impacted. Do we want to be involved in restarting any Pods that might have failed? Again, the answer is no. We want K8s to “supervise” the application in some way, and if a Pod fails, have K8s recreate and restart it. In some cases there may be no need to replicate the data – perhaps the application just provides some front-end functionality, like a web server, and we simply want K8s to maintain a desired number of Pods for this app. This is where the resources/objects called Deployments (with ReplicaSets) come in. They allow a desired number of Pods to be created, scaled in and out, and recreated/restarted in the event of a failure.

Deployments and ReplicaSets go hand in hand. You can think of Deployments managing ReplicaSets, and ReplicaSets managing Pods. You might ask why there are these levels of abstraction. Well, the ReplicaSet ensures that the correct number of Pods is created and kept running, as per the replicas entry in the Deployment YAML. A Deployment object then manages how ReplicaSets behave. For instance, in the case of an upgrade of the application, the Deployment will create a new ReplicaSet to roll out the updated application and, when that has completed, take care of scaling down the older ReplicaSet so that the Pods running the previous version of the application are terminated. As we will see later, if the whole ReplicaSet is deleted, it is recreated, and it in turn recreates the required number of replica Pods.
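
As a quick illustration of that rollout behaviour (we won’t actually run an upgrade in this post), changing anything in the Pod template – for example the container image – triggers a new revision. Taking the demo Deployment we build below as an example, the commands would look something like this; the image tag is purely hypothetical, the point is simply how a rolling update is triggered and then observed:

$ kubectl set image deployment/demo-deployment demo=k8s.gcr.io/busybox:new-tag   # hypothetical image tag
$ kubectl rollout status deployment/demo-deployment
$ kubectl rollout history deployment/demo-deployment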

You may have also come across StatefulSets and wondered what the difference is between them and Deployments. There are a number of differences. Primarily, StatefulSets allow for the ordered start and shutdown of Pods, and make it easy to identify the primary node of an application through the sequential numbering of Pods. Deployments start and stop Pods in no particular order, which is not ideal for certain distributed applications. Another significant difference is that StatefulSets can “supervise” both Pods and storage: they can request dynamic PVs on behalf of Pods, scale out both Pods and PVs, and recreate either object should it fail. Deployments and ReplicaSets are really for supervising Pods only. While this series of posts is about K8s storage on vSphere, I thought it important to show an example of both a Deployment and a StatefulSet so that you can contrast the differences later on. A post on StatefulSets will follow shortly.
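
To give a flavour of that difference ahead of the StatefulSet post, here is a purely illustrative sketch of the two pieces of a StatefulSet spec that have no equivalent in a Deployment: a serviceName, which gives the Pods stable, ordered identities (demo-statefulset-0, -1, -2), and volumeClaimTemplates, which dynamically request a PV for each replica. The headless Service and StorageClass names here are assumptions for the sketch:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: demo-statefulset
spec:
  serviceName: demo                 # hypothetical headless Service providing stable network identities
  replicas: 3
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
      - name: demo
        image: "k8s.gcr.io/busybox"
        command: [ "sleep", "1000000" ]
        volumeMounts:
        - name: demo-disk
          mountPath: /demo
  volumeClaimTemplates:             # one PVC (and thus one PV) is created per replica
  - metadata:
      name: demo-disk
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: demo-sc     # hypothetical StorageClass, e.g. from the earlier dynamic provisioning post
      resources:
        requests:
          storage: 1Gi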

OK – that was a lot of information to digest. Why don’t we go ahead and build out our first Deployment? We will continue to use our trusty busybox as the only container running in the Pod – the same container that we used in previous demo examples. Here is a very simple manifest YAML for our Deployment, which will initially start with a single Pod in the ReplicaSet:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-deployment
  labels:
    app: demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
      - name: demo
        image: "k8s.gcr.io/busybox"
        command: [ "sleep", "1000000" ]

The main items to highlight here are the replicas and selector fields. As mentioned, we are going to start with a single replica, and then scale it out. The selector field is how we tell the Deployment which Pods it needs to manage: by setting selector.matchLabels.app to demo, any Pods that carry a matching app: demo label will be managed. Let’s go ahead and roll out our deployment, and take a closer look at the Deployment, ReplicaSet and Pod objects after we have done that. If we use the describe option on the Deployment and ReplicaSet objects, we can get more detailed information about them.

$ kubectl create -f demo-deployment.yaml
deployment.apps/demo-deployment created

$ kubectl get deploy
NAME              DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
demo-deployment   1         1         1            1           5s

$ kubectl get rs
NAME                         DESIRED   CURRENT   READY   AGE
demo-deployment-5457d695f6   1         1         1       9s

$ kubectl get pods
NAME                               READY   STATUS    RESTARTS   AGE
demo-deployment-5457d695f6-6wbnx   1/1     Running   0          12s
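
Since the Deployment manages any Pod carrying the app: demo label, we can also list its Pods by selector rather than by name (output not shown here):

$ kubectl get pods -l app=demo --show-labels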

$ kubectl describe deploy
Name:                   demo-deployment
Namespace:              default
CreationTimestamp:      Mon, 03 Jun 2019 14:47:53 +0100
Labels:                 app=demo
Annotations:            deployment.kubernetes.io/revision: 1
Selector:               app=demo
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:  app=demo
  Containers:
   demo:
    Image:      k8s.gcr.io/busybox
    Port:       <none>
    Host Port:  <none>
    Command:
      sleep
      1000000
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  <none>
NewReplicaSet:   demo-deployment-5457d695f6 (1/1 replicas created)

Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  1s    deployment-controller  Scaled up replica set demo-deployment-5457d695f6 to 1

$ kubectl describe rs
Name:           demo-deployment-5457d695f6
Namespace:      default
Selector:       app=demo,pod-template-hash=5457d695f6
Labels:         app=demo
                pod-template-hash=5457d695f6
Annotations:    deployment.kubernetes.io/desired-replicas: 1
                deployment.kubernetes.io/max-replicas: 2
                deployment.kubernetes.io/revision: 1
Controlled By:  Deployment/demo-deployment
Replicas:       1 current / 1 desired
Pods Status:    1 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  app=demo
           pod-template-hash=5457d695f6
  Containers:
   demo:
    Image:      k8s.gcr.io/busybox
    Port:       <none>
    Host Port:  <none>
    Command:
      sleep
      1000000
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Events:
  Type    Reason            Age   From                   Message
  ----    ------            ----  ----                   -------
  Normal  SuccessfulCreate  5s    replicaset-controller  Created pod: demo-deployment-5457d695f6-6wbnx

Everything looks good. From the above, it should be clear that there is a Deployment with a single ReplicaSet that contains a single Pod running the busybox image. Note that there are no volumes or mounts. The describe output also shows us the events, and whether any of the objects had problems carrying out the request. Let’s now do a scale-out test, where we will scale the Deployment to 3 replicas.
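
We will use the imperative kubectl scale command below. The same change could also be made declaratively by editing replicas: 1 to replicas: 3 in demo-deployment.yaml and re-applying the manifest – a sketch, assuming you manage the object with kubectl apply:

$ kubectl apply -f demo-deployment.yaml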

$ kubectl scale deploy demo-deployment --replicas=3
deployment.extensions/demo-deployment scaled

$ kubectl get deploy
NAME              DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
demo-deployment   3         3         3            3           3m4s

$ kubectl get rs
NAME                         DESIRED   CURRENT   READY   AGE
demo-deployment-5457d695f6   3         3         3       3m9s

$ kubectl get pods
NAME                               READY   STATUS    RESTARTS   AGE
demo-deployment-5457d695f6-6wbnx   1/1     Running   0          3m14s
demo-deployment-5457d695f6-fsnfh   1/1     Running   0          17s
demo-deployment-5457d695f6-z447n   1/1     Running   0          17s

$ kubectl describe deploy
Name:                   demo-deployment
Namespace:              default
CreationTimestamp:      Mon, 03 Jun 2019 14:47:53 +0100
Labels:                 app=demo
Annotations:            deployment.kubernetes.io/revision: 1
Selector:               app=demo
Replicas:               3 desired | 3 updated | 3 total | 3 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:  app=demo
  Containers:
   demo:
    Image:      k8s.gcr.io/busybox
    Port:       <none>
    Host Port:  <none>
    Command:
      sleep
      1000000
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Progressing    True    NewReplicaSetAvailable
  Available      True    MinimumReplicasAvailable
OldReplicaSets:  <none>
NewReplicaSet:   demo-deployment-5457d695f6 (3/3 replicas created)

Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  3m5s  deployment-controller  Scaled up replica set demo-deployment-5457d695f6 to 1
  Normal  ScalingReplicaSet  8s    deployment-controller  Scaled up replica set demo-deployment-5457d695f6 to 3

$ kubectl describe rs
Name:           demo-deployment-5457d695f6
Namespace:      default
Selector:       app=demo,pod-template-hash=5457d695f6
Labels:         app=demo
                pod-template-hash=5457d695f6
Annotations:    deployment.kubernetes.io/desired-replicas: 3
                deployment.kubernetes.io/max-replicas: 4
                deployment.kubernetes.io/revision: 1
Controlled By:  Deployment/demo-deployment
Replicas:       3 current / 3 desired
Pods Status:    3 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  app=demo
           pod-template-hash=5457d695f6
  Containers:
   demo:
    Image:      k8s.gcr.io/busybox
    Port:       <none>
    Host Port:  <none>
    Command:
      sleep
      1000000
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Events:
  Type    Reason            Age    From                   Message
  ----    ------            ----   ----                   -------
  Normal  SuccessfulCreate  3m13s  replicaset-controller  Created pod: demo-deployment-5457d695f6-6wbnx
  Normal  SuccessfulCreate  16s    replicaset-controller  Created pod: demo-deployment-5457d695f6-fsnfh
  Normal  SuccessfulCreate  16s    replicaset-controller  Created pod: demo-deployment-5457d695f6-z447n
$

That appears to have worked seamlessly. It is also very fast, since there is no storage to provision. At this point, we have seen how easily a Deployment can be scaled out to deploy additional Pods through the use of ReplicaSets. Let’s now see how ReplicaSets ensure we have the correct number of Pods running. Let’s delete one of the Pods, and watch how a new one is started in its place.
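
If you would like to watch the replacement Pod appear in real time, the -w (watch) flag can be added to kubectl get pods; it streams changes until you stop it with Ctrl-C (output not shown):

$ kubectl get pods -w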

$ kubectl get deploy
NAME              DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
demo-deployment   3         3         3            3           68m

$ kubectl get rs
NAME                         DESIRED   CURRENT   READY   AGE
demo-deployment-5457d695f6   3         3         3       68m

$ kubectl get pod -o wide
NAME                               READY   STATUS    RESTARTS   AGE     IP              NODE                                   NOMINATED NODE
demo-deployment-5457d695f6-ds2ws   1/1     Running   0          2m55s   10.200.101.10   5670630f-596b-4503-a3fa-84cf02752822   <none>
demo-deployment-5457d695f6-fsnfh   1/1     Running   0          65m     10.200.16.19    d875cb5a-d889-48f9-835c-0402bdd509a6   <none>
demo-deployment-5457d695f6-z447n   1/1     Running   0          65m     10.200.16.20    d875cb5a-d889-48f9-835c-0402bdd509a6   <none>

$ kubectl delete pod demo-deployment-5457d695f6-fsnfh
pod "demo-deployment-5457d695f6-fsnfh" deleted

$ kubectl get deploy
NAME              DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
demo-deployment   3         3         3            3           69m

$ kubectl get rs
NAME                         DESIRED   CURRENT   READY   AGE
demo-deployment-5457d695f6   3         3         3       69m

$ kubectl get pod -o wide
NAME                               READY   STATUS    RESTARTS   AGE    IP              NODE                                   NOMINATED NODE
demo-deployment-5457d695f6-cz96x   1/1     Running   0          54s    10.200.16.21    d875cb5a-d889-48f9-835c-0402bdd509a6   <none>
demo-deployment-5457d695f6-ds2ws   1/1     Running   0          4m1s   10.200.101.10   5670630f-596b-4503-a3fa-84cf02752822   <none>
demo-deployment-5457d695f6-z447n   1/1     Running   0          66m    10.200.16.20    d875cb5a-d889-48f9-835c-0402bdd509a6   <none>
$

If we do a describe on the ReplicaSet, we can see an event related to the creation of a new Pod to replace the deleted one.

Events:
  Type    Reason            Age    From                   Message
  ----    ------            ----   ----                   -------
  Normal  SuccessfulCreate  2m10s  replicaset-controller  Created pod: demo-deployment-5457d695f6-cz96x

Let’s take this a step further and delete the whole ReplicaSet. We should see the original Pods terminating and new ones being created in their place. At one point during this event we can see the three original Pods terminating alongside the three newly created Pods; eventually the original Pods are removed.

$ kubectl get rs
NAME                         DESIRED   CURRENT   READY   AGE
demo-deployment-5457d695f6   3         3         3       86m

$ kubectl delete rs demo-deployment-5457d695f6
replicaset.extensions "demo-deployment-5457d695f6" deleted

$ kubectl get rs
NAME                         DESIRED   CURRENT   READY   AGE
demo-deployment-5457d695f6   3         3         3       4s
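
The ReplicaSet reappears with the same pod-template-hash – and therefore the same name – because the Deployment still exists and its controller immediately creates a replacement to satisfy the desired state. That ownership is visible in the Controlled By: Deployment/demo-deployment field of the earlier describe output, or it can be pulled straight from the ReplicaSet’s owner references, for example:

$ kubectl get rs demo-deployment-5457d695f6 -o jsonpath='{.metadata.ownerReferences[0].kind}/{.metadata.ownerReferences[0].name}'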

$ kubectl get pod -o wide
NAME                               READY   STATUS        RESTARTS   AGE   IP              NODE                                   NOMINATED NODE
demo-deployment-5457d695f6-6x5nx   1/1     Running       0          11s   10.200.41.8     afa28938-19e4-407f-9afa-714bb1387741   <none>
demo-deployment-5457d695f6-8ttdx   1/1     Running       0          11s   10.200.16.22    d875cb5a-d889-48f9-835c-0402bdd509a6   <none>
demo-deployment-5457d695f6-cz96x   1/1     Terminating   0          18m   10.200.16.21    d875cb5a-d889-48f9-835c-0402bdd509a6   <none>
demo-deployment-5457d695f6-ds2ws   1/1     Terminating   0          21m   10.200.101.10   5670630f-596b-4503-a3fa-84cf02752822   <none>
demo-deployment-5457d695f6-dwxtm   1/1     Running       0          11s   10.200.16.23    d875cb5a-d889-48f9-835c-0402bdd509a6   <none>
demo-deployment-5457d695f6-z447n   1/1     Terminating   0          84m   10.200.16.20    d875cb5a-d889-48f9-835c-0402bdd509a6   <none>

$ kubectl get rs
NAME                         DESIRED   CURRENT   READY   AGE
demo-deployment-5457d695f6   3         3         3       60s

$ kubectl get pod -o wide
NAME                               READY   STATUS    RESTARTS   AGE   IP             NODE                                   NOMINATED NODE
demo-deployment-5457d695f6-6x5nx   1/1     Running   0          63s   10.200.41.8    afa28938-19e4-407f-9afa-714bb1387741   <none>
demo-deployment-5457d695f6-8ttdx   1/1     Running   0          63s   10.200.16.22   d875cb5a-d889-48f9-835c-0402bdd509a6   <none>
demo-deployment-5457d695f6-dwxtm   1/1     Running   0          63s   10.200.16.23   d875cb5a-d889-48f9-835c-0402bdd509a6   <none>

We won’t look at failure handling in detail in this post (e.g. which component inside K8s is responsible for which action in the background) – we will come back to that in a later post – but suffice to say that if a Pod fails, a new Pod is created to ensure the desired state of the application, as described in the Deployment manifest, is met. One other thing to highlight is the nomenclature: the naming convention of the Pods in a ReplicaSet does not make it clear in which order the Pods were created, nor which Pod would be removed if we were to scale the number of replicas down. This is one of the main advantages of StatefulSets, as we shall see shortly.
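
To illustrate the scale-down point: if we were to scale the Deployment back down, the ReplicaSet controller decides which Pod to terminate, and nothing in the randomly generated Pod names tells us in advance which one it will pick:

$ kubectl scale deploy demo-deployment --replicas=2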

Again, while this post did not cover storage, I felt it important to highlight the difference between Deployments and StatefulSets. As mentioned, if these Pods all had access to the same ReadWriteMany volume (e.g. an NFS file share) which is already highly available on some external storage, then a Deployment with ReplicaSets might be ideal for making this application highly available in Kubernetes. In my next post, we will look at how we can manage both compute (Pods) and storage (PVs) at the same time through the use of a StatefulSet object. We will also see how the nomenclature used by StatefulSets makes it easy to understand the order in which Pods were deployed, which Pod is using which PV, and which Pod would be removed if the application was scaled down.
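
To make that scenario a little more concrete, here is a sketch of how the Pod template in our demo Deployment could mount a pre-existing ReadWriteMany PVC so that every replica shares the same volume. Only the Pod template’s spec section is shown (the rest of the Deployment is unchanged), and the claim name and mount path are hypothetical – the PVC itself would be created separately against an NFS-backed PV or StorageClass:

    spec:
      containers:
      - name: demo
        image: "k8s.gcr.io/busybox"
        command: [ "sleep", "1000000" ]
        volumeMounts:
        - name: shared-data
          mountPath: /shared          # hypothetical mount path inside each replica
      volumes:
      - name: shared-data
        persistentVolumeClaim:
          claimName: shared-nfs-pvc   # hypothetical ReadWriteMany PVC, e.g. backed by an NFS share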

Manifests used in this demo can be found on my vsphere-storage-101 GitHub repo.