Kubernetes Storage on vSphere 101 – StatefulSet
In my last post we looked at creating a highly available application in Kubernetes using multiple Pods managed by Deployments and ReplicaSets. However, that discussion focused only on Pods. In this post, we will look at another way of creating highly available applications: the StatefulSet. The first question you will probably have is what the difference is between a Deployment (with ReplicaSets) and a StatefulSet. At a high level, the major difference is that a Deployment maintains the desired number of Pods for an application, whereas a StatefulSet maintains the desired number of Pods as well as their storage, in the form of persistent volumes (PVs). I'm obviously simplifying for the purposes of this 101 discussion; there are some other differences which we will get to later.
The next question is when would you use one over the other? Well, let's say you had a stateless application that did not need external storage, or an application where all Pods wrote to the same ReadWriteMany shared external storage, such as an NFS file share. In these cases you would not need to manage any volumes on behalf of the application, since all Pods access the same storage (or none at all). You would only need to manage the Pods, using an object that ensures the desired number of Pods are running. For such an application, as we saw in the previous post, you could use a Deployment object with ReplicaSets. This will try to ensure that the desired number of Pods for the application are available.
Now, if a distributed application has built-in replication features, for example a NoSQL database like Cassandra, each Pod will probably require its own storage. With such an application, as you scale out the Pods, you would also want to scale out the storage, by instantiating a new and unique persistent volume (PV) for each Pod. Since these applications have their own built-in replication to make them highly available, should a Pod go down (impacting part of the application), the remaining Pods continue to run with their own copies of the replicated data, and the application remains online and available. The StatefulSet will attempt to maintain the correct number of replicas (in this case Pods + PVs) so that the application can self-heal. We will talk about failures and how storage handles such issues in another post. Suffice to say that we can simplify the difference between Deployments+ReplicaSets and StatefulSets by stating that a Deployment+ReplicaSet is used for maintaining a desired number of Pods, while a StatefulSet can be used for maintaining a desired number of both Pods and PVs.
So how does a StatefulSet create PVs and PVCs on the fly? It does this through the use of a volumeClaimTemplates entry in its manifest YAML file. This is where you add the reference to the StorageClass and the specification of the volume you wish to create. The resulting claim is then included as a volumeMount for a container within the Pod. On applying the manifest YAML for the StatefulSet, you should observe the Pods and PVCs getting created and named with an incremental numeric suffix. Obviously, the StorageClass referenced by the StatefulSet will need to exist for the PVC creation to work.
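To make that concrete before we get to Cassandra, here is a minimal, generic sketch of how the pieces hang together (the names web, www and my-sc are just placeholders, not part of the demo that follows). The claim template name is what ties the per-Pod PVC to the container's volumeMount:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: web
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx
        volumeMounts:
        - name: www                     # must match the volumeClaimTemplates name below
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      storageClassName: my-sc           # this StorageClass must already exist
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi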
In the upcoming example, I will deploy a 3-node Cassandra DB as a StatefulSet. One thing that needs to exist for this application to work is a Service that allows the different nodes to communicate with each other. In this example, I am using a headless Service. This is created by setting clusterIP to None, and it allows each of the Pods to be reached via a predictable DNS name. Services are beyond the scope of this discussion, but suffice to say that this is necessary to allow the Cassandra nodes to form their own cluster and replicate their data.
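For reference, each Pod in a StatefulSet governed by a headless Service gets a predictable DNS name of the following form (assuming the default cluster.local cluster domain). This becomes relevant later when we look at the CASSANDRA_SEEDS environment variable:

<pod-name>.<service-name>.<namespace>.svc.cluster.local

# e.g. the first Pod in this demo:
cassandra-0.cassandra.cassandra.svc.cluster.local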
Let’s start the demo by creating the StorageClass. Here is the manifest I am using.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: cass-sc
provisioner: kubernetes.io/vsphere-volume
parameters:
  storagePolicyName: raid-1
  datastore: vsanDatastore
There should be nothing very new here. Of note are the parameters where we are specifying a storage policy of “raid-1”, which means that any Persistent Volumes created using this StorageClass will instantiate a RAID-1 mirrored virtual machine disk (VMDK) on my vSAN datastore. Have a look back at the 101 StorageClass post if you need a refresher.
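If you want to double-check that the policy and datastore parameters were picked up after creating the StorageClass, something along these lines should echo them back (this is just an optional sanity check):

$ kubectl describe sc cass-sc
$ kubectl get sc cass-sc -o yaml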
The next thing we need is the headless Service, so that the nodes in the Cassandra application can communicate. Here is the manifest for this very simple headless Service. I have named the Service "cassandra".
apiVersion: v1
kind: Service
metadata:
  labels:
    app: cassandra
  name: cassandra
  namespace: cassandra
spec:
  clusterIP: None
  selector:
    app: cassandra
I am going to deploy this app in its own K8s namespace, called cassandra. Thus, in the Service and later on in the StatefulSet, there is a metadata.namespace entry pointing to that namespace. Let’s create that new namespace, and deploy both the StorageClass and Service before we start taking a look at the StatefulSet for my Cassandra application.
$ kubectl create ns cassandra
namespace/cassandra created

$ kubectl get ns
NAME          STATUS   AGE
cassandra     Active   7s
default       Active   4d23h
kube-public   Active   4d23h
kube-system   Active   4d23h
pks-system    Active   4d23h

$ kubectl create -f cassandra-sc.yaml
storageclass.storage.k8s.io/cass-sc created

$ kubectl get sc
NAME      PROVISIONER                    AGE
cass-sc   kubernetes.io/vsphere-volume   8s

$ kubectl create -f headless-cassandra-service.yaml
service/cassandra created

$ kubectl get svc
NAME        TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
cassandra   ClusterIP   None         <none>        <none>    6
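One optional convenience, since everything from here on lives in the cassandra namespace: on reasonably recent kubectl versions you can make that namespace the default for your current context, so you don't have to add -n cassandra to every command. I'll keep the explicit -n flag in the rest of this post for clarity.

$ kubectl config set-context --current --namespace=cassandra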
OK – that's the StorageClass and headless Service taken care of. Now onto the main event, the Cassandra StatefulSet. This is the most complex YAML file we have looked at so far, because it includes quite a bit of detail around resources and environment settings for the Cassandra application. Therefore, I am going to chunk it up a bit and review it in two parts. Let's first take a look at some entries that should already be somewhat familiar; we will fill in the blanks later on.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cassandra
  namespace: cassandra
  labels:
    app: cassandra
spec:
  serviceName: cassandra
  replicas: 3
  selector:
    matchLabels:
      app: cassandra
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      containers:
      - name: cassandra
        image: gcr.io/google-samples/cassandra:v11
        ports:
        - containerPort: 7000
          name: intra-node
        - containerPort: 7001
          name: tls-intra-node
        - containerPort: 7199
          name: jmx
        - containerPort: 9042
          name: cql
.
<snip>
.
        volumeMounts:
        - name: cassandra-data
          mountPath: /cassandra_data
  volumeClaimTemplates:
  - metadata:
      name: cassandra-data
      annotations:
        volume.beta.kubernetes.io/storage-class: cass-sc
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi
OK, let's talk about the above. It is a StatefulSet, and in spec.serviceName we specify the headless Service created earlier. We have asked for 3 replicas via spec.replicas; in this case, this will instantiate 3 x Pods and 3 x PVCs, using the StorageClass specified in the volumeClaimTemplates metadata annotation volume.beta.kubernetes.io/storage-class. The volumes will be ReadWriteOnce and 1GiB in size. These will then be mounted at /cassandra_data in each container, as per spec.template.spec.containers.volumeMounts, where the volumeMount name cassandra-data matches the name of the volume claim template. I am pulling the v11 Cassandra image as that has cqlsh built in, which can be used to create tables, etc. Feel free to use a later version if you wish.
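One small aside: the volume.beta.kubernetes.io/storage-class annotation still works, but on more recent Kubernetes releases the usual approach is to set storageClassName directly in the claim template spec. If you are following along on a newer cluster, the equivalent claim template would look something like this:

  volumeClaimTemplates:
  - metadata:
      name: cassandra-data
    spec:
      storageClassName: cass-sc
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi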
Now, let's take a look at the application-specific stuff, which is what I snipped out of the above manifest. Remember this is Cassandra-specific, so don't worry too much about it; it is not necessary to understand these details in order to understand the concept of a StatefulSet. This block appears immediately after the ports section above, and just before the volume mounts. Remember to keep the indentation (the spaces before each of the entries), or else the YAML file won't be parsed correctly.
        resources:
          limits:
            cpu: "500m"
            memory: 1Gi
          requests:
            cpu: "500m"
            memory: 1Gi
        securityContext:
          capabilities:
            add:
            - IPC_LOCK
        lifecycle:
          preStop:
            exec:
              command:
              - /bin/sh
              - -c
              - nodetool drain
        env:
        - name: MAX_HEAP_SIZE
          value: 512M
        - name: HEAP_NEWSIZE
          value: 100M
        - name: CASSANDRA_SEEDS
          value: "cassandra-0.cassandra.cassandra.svc.cluster.local"
        - name: CASSANDRA_CLUSTER_NAME
          value: "K8Demo"
        - name: CASSANDRA_DC
          value: "DC1-K8Demo"
        - name: CASSANDRA_RACK
          value: "Rack1-K8Demo"
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        readinessProbe:
          exec:
            command:
            - /bin/bash
            - -c
            - /ready-probe.sh
          initialDelaySeconds: 15
          timeoutSeconds: 5
The resources section should be fairly self-explanatory: we are setting CPU and memory requests and limits for each of the Pods. The lifecycle section states that a Cassandra CLI tool called nodetool should be invoked to drain a node before it is stopped. Then there are a number of environment variables passed in the env section. The important one here is CASSANDRA_SEEDS, which is the DNS name of the first node. It reflects the first host name (cassandra-0), the Service name (cassandra) and the namespace name (again, cassandra). Other nodes connect to this first node to form a cluster, so if you use different Service or namespace names, this variable will need to be modified or the hosts won't be able to join the cluster. Finally, there is a readinessProbe which runs a script to check that the node is up before the Pod is marked Ready.
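If you want to satisfy yourself that the seed name will resolve once the headless Service and the first Pod are up, one way is a quick lookup from a throwaway Pod. This is just an optional sanity check and not part of the deployment; busybox:1.28 is commonly used for this because its nslookup behaves well:

$ kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -n cassandra \
    -- nslookup cassandra-0.cassandra.cassandra.svc.cluster.local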
If we put this YAML file together and deploy it, we should see a StatefulSet rolled out which contains 3 Pods; each Pod will have its own clearly identifiable PVC, and each PVC will dynamically request a PV to be created. Each PV will be a VMDK on the vSAN datastore with the "raid-1" policy taken from the StorageClass, meaning the volumes will be mirrored, as seen earlier. Let's give it a go, keeping in mind that we are now working in the cassandra namespace, so most kubectl commands need to reference that namespace. Some objects are cluster-scoped, such as PVs and StorageClasses, so these do not need to be queried by namespace. We will start by showing that there are no Pods, PVCs or PVs, and then deploy the StatefulSet.
$ kubectl get sc
NAME      PROVISIONER                    AGE
cass-sc   kubernetes.io/vsphere-volume   72m

$ kubectl get svc -n cassandra
NAME        TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
cassandra   ClusterIP   None         <none>        <none>    35s

$ kubectl get pods -n cassandra
No resources found.

$ kubectl get pvc -n cassandra
No resources found.

$ kubectl get pv
No resources found.

$ kubectl get sts -n cassandra
No resources found.
$ kubectl create -f cassandra-statefulset-orig.yaml
statefulset.apps/cassandra created
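StatefulSet Pods are created in order, one at a time, with each Pod waiting until the previous one is Running and Ready (assuming the default OrderedReady pod management policy). Rather than re-running kubectl get over and over, either of the following should let you watch the rollout progress:

$ kubectl get pods -n cassandra -w
$ kubectl rollout status statefulset/cassandra -n cassandra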
$ kubectl get sts -n cassandra
NAME        DESIRED   CURRENT   AGE
cassandra   3         1         26s

$ kubectl get pods -n cassandra
NAME          READY   STATUS    RESTARTS   AGE
cassandra-0   0/1     Running   0          35s

$ kubectl get pvc -n cassandra
NAME                         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
cassandra-data-cassandra-0   Bound    pvc-1b87e0e6-8798-11e9-ac8b-005056a2c144   1Gi        RWO            cass-sc        42s

$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                  STORAGECLASS   REASON   AGE
pvc-1b87e0e6-8798-11e9-ac8b-005056a2c144   1Gi        RWO            Delete           Bound    cassandra/cassandra-data-cassandra-0   cass-sc                 33
And after a few minutes in my environment, the full StatefulSet is online with all of the necessary Pods, PVCs and PVs, the latter having been instantiated on the fly as the StatefulSet required them.
$ kubectl get sts -n cassandra
NAME        DESIRED   CURRENT   AGE
cassandra   3         3         3m44s

$ kubectl get pods -n cassandra
NAME          READY   STATUS    RESTARTS   AGE
cassandra-0   1/1     Running   0          13m
cassandra-1   1/1     Running   0          12m
cassandra-2   1/1     Running   0          10m

$ kubectl get pvc -n cassandra
NAME                         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
cassandra-data-cassandra-0   Bound    pvc-1b87e0e6-8798-11e9-ac8b-005056a2c144   1Gi        RWO            cass-sc        3m51s
cassandra-data-cassandra-1   Bound    pvc-3defb27e-8798-11e9-ac8b-005056a2c144   1Gi        RWO            cass-sc        2m53s
cassandra-data-cassandra-2   Bound    pvc-811c8141-8798-11e9-ac8b-005056a2c144   1Gi        RWO            cass-sc        60s

$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                  STORAGECLASS   REASON   AGE
pvc-1b87e0e6-8798-11e9-ac8b-005056a2c144   1Gi        RWO            Delete           Bound    cassandra/cassandra-data-cassandra-0   cass-sc                 3m44s
pvc-3defb27e-8798-11e9-ac8b-005056a2c144   1Gi        RWO            Delete           Bound    cassandra/cassandra-data-cassandra-1   cass-sc                 2m44s
pvc-811c8141-8798-11e9-ac8b-005056a2c144   1Gi        RWO            Delete           Bound    cassandra/cassandra-data-cassandra-2   cass-sc                 61s
We can also check the application itself, using the nodetool CLI utility mentioned previously. You can see how some of the environment variables passed in from the YAML manifest have been picked up, e.g. Datacenter and Rack.
$ kubectl exec -it cassandra-0 -n cassandra nodetool status
Datacenter: DC1-K8Demo
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens   Owns (effective)   Host ID                                Rack
UN  10.200.40.8    76.01 KiB  32       58.4%              3d4106bb-d716-4008-b5fc-f89c1af9bed9   Rack1-K8Demo
UN  10.200.41.10   95.05 KiB  32       73.5%              94217c7c-310a-4ab6-8a09-2369d56a8691   Rack1-K8Demo
UN  10.200.16.38   104.4 KiB  32       68.1%              afa03459-bfb9-4399-b35f-a0cd57ca4ebf   Rack1-K8Demo
Now it should be quite obvious how different the StatefulSet is from the Deployment that we saw in an earlier post. The Pods are named in a consistent fashion, as are the PVCs (which use a combination of the claim template and Pod names). We can tell which Pod was started first (cassandra-0), which is important for an application like Cassandra, as it means you can tell all the other Pods where to join when they start up (remember the CASSANDRA_SEEDS setting from earlier). And unlike a Deployment, a StatefulSet will maintain the correct number of both Pods and PVs. Let's verify that by doing a scale-out test on the StatefulSet.
$ kubectl get pods -n cassandra
NAME          READY   STATUS    RESTARTS   AGE
cassandra-0   1/1     Running   0          21m
cassandra-1   1/1     Running   0          20m
cassandra-2   1/1     Running   0          18m

$ kubectl scale sts cassandra --replicas=4 -n cassandra
statefulset.apps/cassandra scaled

$ kubectl get pods -n cassandra
NAME          READY   STATUS    RESTARTS   AGE
cassandra-0   1/1     Running   0          22m
cassandra-1   1/1     Running   0          21m
cassandra-2   1/1     Running   0          19m
cassandra-3   0/1     Pending   0          3s

$ kubectl get pvc -n cassandra
NAME                         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
cassandra-data-cassandra-0   Bound    pvc-1b87e0e6-8798-11e9-ac8b-005056a2c144   1Gi        RWO            cass-sc        22m
cassandra-data-cassandra-1   Bound    pvc-3defb27e-8798-11e9-ac8b-005056a2c144   1Gi        RWO            cass-sc        21m
cassandra-data-cassandra-2   Bound    pvc-811c8141-8798-11e9-ac8b-005056a2c144   1Gi        RWO            cass-sc        19m
cassandra-data-cassandra-3   Bound    pvc-2e09ab44-879b-11e9-ac8b-005056a2c144   1Gi        RWO            cass-sc        13s

$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                  STORAGECLASS   REASON   AGE
pvc-1b87e0e6-8798-11e9-ac8b-005056a2c144   1Gi        RWO            Delete           Bound    cassandra/cassandra-data-cassandra-0   cass-sc                 22m
pvc-2e09ab44-879b-11e9-ac8b-005056a2c144   1Gi        RWO            Delete           Bound    cassandra/cassandra-data-cassandra-3   cass-sc                 10s
pvc-3defb27e-8798-11e9-ac8b-005056a2c144   1Gi        RWO            Delete           Bound    cassandra/cassandra-data-cassandra-1   cass-sc                 21m
pvc-811c8141-8798-11e9-ac8b-005056a2c144   1Gi        RWO            Delete           Bound    cassandra/cassandra-data-cassandra-2   cass-sc                 19m

$ kubectl get pods -n cassandra
NAME          READY   STATUS              RESTARTS   AGE
cassandra-0   1/1     Running             0          22m
cassandra-1   1/1     Running             0          21m
cassandra-2   1/1     Running             0          19m
cassandra-3   0/1     ContainerCreating   0          23s

$ kubectl get pods -n cassandra
NAME          READY   STATUS    RESTARTS   AGE
cassandra-0   1/1     Running   0          24m
cassandra-1   1/1     Running   0          23m
cassandra-2   1/1     Running   0          21m
cassandra-3   1/1     Running   0          2m17s
$ kubectl get sts -n cassandra
NAME        DESIRED   CURRENT   AGE
cassandra   4         4         26m
Notice also how the naming conventions are maintained, both for the Pods and the PVCs. If we scale our Cassandra deployment back down from 4 Pods to 3, 2 or even 1, the Pods that get removed are the ones with the highest ordinals. Pod 0 is the first Pod created and the last one to be removed; you can see this in the scaling demo next. The relationships between Pods, PVCs and PVs are also easy to identify. Another thing to note is that even when we scale the application back, the PVCs and PVs are not removed. This is by design, to protect your data. So if you do want to remove Persistent Volumes that are no longer used by Pods, this has to be done manually. Fortunately, due to the naming convention, it is easy to identify which PVCs (and thus PVs) to remove.
$ kubectl scale sts cassandra --replicas=2 -n cassandra
statefulset.apps/cassandra scaled

$ kubectl get sts -n cassandra
NAME        DESIRED   CURRENT   AGE
cassandra   2         3         31m

$ kubectl get pods -n cassandra
NAME          READY   STATUS        RESTARTS   AGE
cassandra-0   1/1     Running       0          31m
cassandra-1   1/1     Running       0          30m
cassandra-2   1/1     Terminating   0          28m

$ kubectl get pods -n cassandra
NAME          READY   STATUS    RESTARTS   AGE
cassandra-0   1/1     Running   0          31m
cassandra-1   1/1     Running   0          30m

$ kubectl get sts -n cassandra
NAME        DESIRED   CURRENT   AGE
cassandra   2         2         31m

$ kubectl get pvc -n cassandra
NAME                         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
cassandra-data-cassandra-0   Bound    pvc-1b87e0e6-8798-11e9-ac8b-005056a2c144   1Gi        RWO            cass-sc        31m
cassandra-data-cassandra-1   Bound    pvc-3defb27e-8798-11e9-ac8b-005056a2c144   1Gi        RWO            cass-sc        30m
cassandra-data-cassandra-2   Bound    pvc-811c8141-8798-11e9-ac8b-005056a2c144   1Gi        RWO            cass-sc        28m
cassandra-data-cassandra-3   Bound    pvc-2e09ab44-879b-11e9-ac8b-005056a2c144   1Gi        RWO            cass-sc        9m49s

$ kubectl get pv -n cassandra
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                  STORAGECLASS   REASON   AGE
pvc-1b87e0e6-8798-11e9-ac8b-005056a2c144   1Gi        RWO            Delete           Bound    cassandra/cassandra-data-cassandra-0   cass-sc                 31m
pvc-2e09ab44-879b-11e9-ac8b-005056a2c144   1Gi        RWO            Delete           Bound    cassandra/cassandra-data-cassandra-3   cass-sc                 9m50s
pvc-3defb27e-8798-11e9-ac8b-005056a2c144   1Gi        RWO            Delete           Bound    cassandra/cassandra-data-cassandra-1   cass-sc                 30m
pvc-811c8141-8798-11e9-ac8b-005056a2c144   1Gi        RWO            Delete           Bound    cassandra/cassandra-data-cassandra-2   cass-sc                 29m

$ kubectl delete pvc cassandra-data-cassandra-2 cassandra-data-cassandra-3 -n cassandra
persistentvolumeclaim "cassandra-data-cassandra-2" deleted
persistentvolumeclaim "cassandra-data-cassandra-3" deleted

$ kubectl get pvc -n cassandra
NAME                         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
cassandra-data-cassandra-0   Bound    pvc-1b87e0e6-8798-11e9-ac8b-005056a2c144   1Gi        RWO            cass-sc        33m
cassandra-data-cassandra-1   Bound    pvc-3defb27e-8798-11e9-ac8b-005056a2c144   1Gi        RWO            cass-sc        32m

$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                  STORAGECLASS   REASON   AGE
pvc-1b87e0e6-8798-11e9-ac8b-005056a2c144   1Gi        RWO            Delete           Bound    cassandra/cassandra-data-cassandra-0   cass-sc                 32m
pvc-3defb27e-8798-11e9-ac8b-005056a2c144   1Gi        RWO            Delete           Bound    cassandra/cassandra-data-cassandra-1   cass-sc                 31m
OK – we have successfully scaled the application up and back down, and we can see from the numbering of the Pods that it has worked as expected. Note that you will also need to take care of the Cassandra application itself at this point. It will report that the Cassandra hosts that were running on the removed Pods are now marked as "DN" (down). You'll have to do some cleanup with "nodetool removenode" to make Cassandra healthy once again, and you will need to do this before you scale the StatefulSet out again.
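For completeness, that cleanup looks roughly like this: run nodetool status from one of the surviving Pods, note the Host ID of any node reported as DN, and then remove it by that ID. The Host ID below is a placeholder; substitute the value from your own output.

$ kubectl exec -it cassandra-0 -n cassandra -- nodetool status
$ kubectl exec -it cassandra-0 -n cassandra -- nodetool removenode <host-id-of-DN-node>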
As a final step, let's see how 'failures' are handled by StatefulSets. At this point, I have cleaned up Cassandra after the scale tests, and have scaled back out to 3 hosts. Let's check our application first.
$ kubectl get sts -n cassandra
NAME        DESIRED   CURRENT   AGE
cassandra   3         3         19h

$ kubectl get pods -n cassandra
NAME          READY   STATUS    RESTARTS   AGE
cassandra-0   1/1     Running   0          19h
cassandra-1   1/1     Running   0          19h
cassandra-2   1/1     Running   0          18h

$ kubectl get pvc -n cassandra
NAME                         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
cassandra-data-cassandra-0   Bound    pvc-1b87e0e6-8798-11e9-ac8b-005056a2c144   1Gi        RWO            cass-sc        19h
cassandra-data-cassandra-1   Bound    pvc-3defb27e-8798-11e9-ac8b-005056a2c144   1Gi        RWO            cass-sc        19h
cassandra-data-cassandra-2   Bound    pvc-890fbbbc-879d-11e9-ac8b-005056a2c144   1Gi        RWO            cass-sc        18h

$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                  STORAGECLASS   REASON   AGE
pvc-1b87e0e6-8798-11e9-ac8b-005056a2c144   1Gi        RWO            Delete           Bound    cassandra/cassandra-data-cassandra-0   cass-sc                 19h
pvc-3defb27e-8798-11e9-ac8b-005056a2c144   1Gi        RWO            Delete           Bound    cassandra/cassandra-data-cassandra-1   cass-sc                 19h
pvc-890fbbbc-879d-11e9-ac8b-005056a2c144   1Gi        RWO            Delete           Bound    cassandra/cassandra-data-cassandra-2   cass-sc                 18
Let’s do the first test, and delete a Pod.
$ kubectl delete pod cassandra-0 -n cassandra
pod "cassandra-0" deleted

$ kubectl get pods -n cassandra
NAME          READY   STATUS              RESTARTS   AGE
cassandra-0   0/1     ContainerCreating   0          5s
cassandra-1   1/1     Running             0          19h
cassandra-2   1/1     Running             0          18h

$ kubectl get pods -n cassandra
NAME          READY   STATUS    RESTARTS   AGE
cassandra-0   0/1     Running   0          19s
cassandra-1   1/1     Running   0          19h
cassandra-2   1/1     Running   0          18h

$ kubectl get pods -n cassandra
NAME          READY   STATUS    RESTARTS   AGE
cassandra-0   1/1     Running   0          51s
cassandra-1   1/1     Running   0          19h
cassandra-2   1/1     Running   0          18h
The Pod was recreated and started after deletion, as we would expect. Since the PVC/PV were not removed, the newly created Pod simply re-attached the existing PV, which it can easily identify through the matching PVC name. OK – let's try to do the same thing with a PVC. What you should notice is that the delete command will not complete, and the PVC will be left with a status of "Terminating" indefinitely. This is because K8s knows that the PVC cassandra-data-cassandra-0 is still being used by the Pod cassandra-0, so it will not remove it.
$ kubectl delete pvc cassandra-data-cassandra-0 -n cassandra
persistentvolumeclaim "cassandra-data-cassandra-0" deleted
<-- stays here indefinitely -->

$ kubectl get pvc -n cassandra
NAME                         STATUS        VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
cassandra-data-cassandra-0   Terminating   pvc-1b87e0e6-8798-11e9-ac8b-005056a2c144   1Gi        RWO            cass-sc        19h
cassandra-data-cassandra-1   Bound         pvc-3defb27e-8798-11e9-ac8b-005056a2c144   1Gi        RWO            cass-sc        19h
cassandra-data-cassandra-2   Bound         pvc-890fbbbc-879d-11e9-ac8b-005056a2c144   1Gi        RWO            cass-sc        18h
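Under the covers this behaviour comes from the storage object in use protection feature. If you are curious, something like the following should show the kubernetes.io/pvc-protection finalizer on the claim, which is what holds the deletion back until no Pod is using it:

$ kubectl get pvc cassandra-data-cassandra-0 -n cassandra -o jsonpath='{.metadata.finalizers}'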
In order to remove the PVC, you have to remove the Pod. Once the Pod is removed, there is no longer any dependency on the PVC, so it, and the associated PV, can now be removed. However, this leaves you with a bit of an issue. When the Pod is recreated, which it will be since we have asked for 3 replicas in the StatefulSet, it can no longer be scheduled because there is no longer a PVC for it to use (we just deleted it). You will see the Pod cassandra-0 in a Pending state, with the following events associated with it if you describe the Pod:
$ kubectl get pods -n cassandra
NAME          READY   STATUS    RESTARTS   AGE
cassandra-0   0/1     Pending   0          23s
cassandra-1   1/1     Running   0          19h
cassandra-2   1/1     Running   187        18h
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  116s (x4 over 116s)  default-scheduler  persistentvolumeclaim "cassandra-data-cassandra-0" not found
So the obvious next question is how to fix this. The easiest way is to build a new PVC manifest. You can get a good idea of what the entries should be by running "kubectl get pvc cassandra-data-cassandra-1 -n cassandra -o json" against any of the other PVCs. In my demo, the replacement PVC for cassandra-0 would look something like this (just make sure it is created in the cassandra namespace, either by adding a metadata.namespace entry or by passing -n cassandra to kubectl create):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cassandra-data-cassandra-0
spec:
  storageClassName: cass-sc
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
Then I simply create the missing PVC. I should see a new PVC and a corresponding PV created. The Pod can now be scheduled since the PVC is back in place, and my StatefulSet returns to full health.
$ kubectl create -f cassandra-data-cassandra-0-pvc.yaml
persistentvolumeclaim/cassandra-data-cassandra-0 created

$ kubectl get pvc -n cassandra
NAME                         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
cassandra-data-cassandra-0   Bound    pvc-6213822a-883b-11e9-ac8b-005056a2c144   1Gi        RWO            cass-sc        13s
cassandra-data-cassandra-1   Bound    pvc-3defb27e-8798-11e9-ac8b-005056a2c144   1Gi        RWO            cass-sc        19h
cassandra-data-cassandra-2   Bound    pvc-890fbbbc-879d-11e9-ac8b-005056a2c144   1Gi        RWO            cass-sc        18h

$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                  STORAGECLASS   REASON   AGE
pvc-3defb27e-8798-11e9-ac8b-005056a2c144   1Gi        RWO            Delete           Bound    cassandra/cassandra-data-cassandra-1   cass-sc                 19h
pvc-6213822a-883b-11e9-ac8b-005056a2c144   1Gi        RWO            Delete           Bound    cassandra/cassandra-data-cassandra-0   cass-sc                 13s
pvc-890fbbbc-879d-11e9-ac8b-005056a2c144   1Gi        RWO            Delete           Bound    cassandra/cassandra-data-cassandra-2   cass-sc                 18h

$ kubectl get pods -n cassandra
NAME          READY   STATUS    RESTARTS   AGE
cassandra-0   0/1     Pending   0          14m
cassandra-1   1/1     Running   0          19h
cassandra-2   1/1     Running   0          18h

$ kubectl get pods -n cassandra
NAME          READY   STATUS    RESTARTS   AGE
cassandra-0   0/1     Running   0          15m
cassandra-1   1/1     Running   0          19h
cassandra-2   1/1     Running   0          18h

$ kubectl get pods -n cassandra
NAME          READY   STATUS    RESTARTS   AGE
cassandra-0   1/1     Running   0          15m
cassandra-1   1/1     Running   0          19h
cassandra-2   1/1     Running   0          18h

$ kubectl get sts -n cassandra
NAME        DESIRED   CURRENT   AGE
cassandra   3         3         19h
The main thing to highlight from that last exercise is that a PVC cannot be removed while a Pod is using the claim. Similarly, you cannot delete a PV which is bound to a PVC. This should stop you from doing something silly to your application. And that just about does it for the 101 series. You should now have a decent understanding of PVs, PVCs, StorageClasses, Deployments and ReplicaSets, and now StatefulSets, when using K8s on vSphere storage. The next item I want to tackle is failure events, and what is supposed to happen when something fails. That will take a little more work and much testing to figure out. Check back soon.
Manifests used in this demo can be found on my vsphere-storage-101 github repo.