After deploying the newest version of Pivotal Container Service (PKS) and rolling out my first Kubernetes cluster (read all about it here), I wanted to try something a bit more interesting than creating yet another persistent volume claim to test our vSphere Cloud Provider, since I had done that a number of times already. Thanks to some of the work I have been doing with our cloud native team, I was introduced to StatefulSets. That piqued my interest, as I had not come across them before.
Before we do anything else, we should talk about StatefulSets, which are a relatively new construct in Kubernetes. They are very similar to ReplicaSets, insofar as they define clones of an application (a set of pods). However, StatefulSets were introduced to deal with, well, stateful applications, and they differ from ReplicaSets in a few ways. While both deal with replica copies or clones of pods, StatefulSets number the replica pods incrementally: the first pod gets the suffix 0, the next copy gets 1, the one after that gets 2, and so on. ReplicaSet pod identifiers are arbitrary, so you cannot easily tell which was the initial copy and which is the newest. StatefulSets also guarantee that the first pod (pod 0) is online and healthy before any clone/replica is created. When scaling back an application, StatefulSets remove the highest numbered pod first. We shall see some of this behaviour later on. There is an excellent write-up on StatefulSets and how they relate to ReplicaSets in the free Managing Kubernetes ebook (from my new colleagues over at Heptio).
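To make the numbering scheme concrete, here is a small shell sketch that only mimics the naming and scale-down ordering described above. It does not talk to a cluster, and the function names are purely illustrative.

```shell
# Print the pod names a StatefulSet would use, in creation order.
# Pods are named <statefulset-name>-<ordinal>, with ordinals starting at 0.
statefulset_pod_names() {
  name=$1
  replicas=$2
  i=0
  while [ "$i" -lt "$replicas" ]; do
    echo "${name}-${i}"
    i=$((i + 1))
  done
}

# Print the pod that is removed first on scale-down: the highest ordinal.
first_pod_removed_on_scale_down() {
  echo "$1-$(( $2 - 1 ))"
}
```

For example, `statefulset_pod_names couchbase 3` lists couchbase-0, couchbase-1 and couchbase-2, and `first_pod_removed_on_scale_down couchbase 3` names couchbase-2.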
To see this in action, I am going to use Couchbase. Couchbase is an open-source, distributed (shared-nothing architecture) NoSQL database. And it is of course stateful, so perfect for a StatefulSet. Fortunately for me, someone has already gone to the effort of making a containerized Couchbase for K8s so kudos to them for that. The only items I need to create in K8s are the storage class YAML file, a Couchbase service YAML file so I can access the application on the network, and the StatefulSet YAML file. I was lucky once again as our team had already built these out, so there wasn’t much for me to do to get it all up and running.
Let’s take a look at the YAML files first.
The first file is the storage class. If you’ve read my previous blogs on K8s and the vSphere Cloud Provider (VCP), this should be familiar to you. The provisioner is our vSphere Cloud Provider, called kubernetes.io/vsphere-volume. Of interest here is of course the storagePolicyName parameter, which references a policy called “gold”. This is a storage policy created via SPBM, the Storage Policy Based Management framework that we have in vSphere. This policy must be created in my vSphere environment – there is no way for someone to do this from within K8s. I built this “gold” policy on my vsanDatastore to create a RAID-1 volume. The resulting VMDK is automatically placed in a folder called kubevols on that datastore. The rest of the logic around building the container volume/VMDK is taken care of by the provider.
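As a rough illustration, a storage class along these lines could be written out as follows. This is a sketch rather than the exact file used here: the provisioner, the “gold” policy, the vsanDatastore and the class name couchbasesc (which appears in the provisioning events later on) come from this post, while the apiVersion and the diskformat parameter are assumptions.

```shell
# Write a sketch of couchbase-sc.yaml (a reconstruction, not the original file).
cat > couchbase-sc.yaml <<'EOF'
kind: StorageClass
apiVersion: storage.k8s.io/v1      # assumption: may differ on older clusters
metadata:
  name: couchbasesc
provisioner: kubernetes.io/vsphere-volume
parameters:
  diskformat: thin                 # assumption
  storagePolicyName: gold          # SPBM policy that must already exist in vSphere
  datastore: vsanDatastore
EOF
```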
The next thing to look at is the Couchbase service YAML file. The service provides a networking endpoint for an application, or to be more precise, a set of one or more pods. This is core K8s stuff – if a pod dies and is replaced with a new pod, the new pod may come up with a different IP address. Through the use of a service, we don’t need to worry about the IP addresses on the pods. The service takes care of this, handling pods dying and new pods being created. A service is connected to the application/pod(s) through the use of labels. Since the type is LoadBalancer, the service will load balance the requests across all the Pods that make up the application.
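A service of this kind might look roughly like the sketch below. Only the LoadBalancer type comes from this post; the service name, the app: couchbase label and the port name are assumptions for illustration, and 8091 is Couchbase's standard web console port.

```shell
# Write a sketch of couchbase-service.yaml (a reconstruction, not the original file).
cat > couchbase-service.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: couchbase                  # assumption
  labels:
    app: couchbase                 # assumption
spec:
  type: LoadBalancer
  ports:
    - name: web-console            # assumption
      port: 8091                   # Couchbase web console
      targetPort: 8091
  selector:
    app: couchbase                 # must match the label on the pods
EOF
```

The selector is what ties the service to the pods: any pod carrying the matching label becomes a backend, regardless of its IP address.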
Last but not least, here is the StatefulSet, which initially has been configured for a single Pod deployment. You can see the number of replicas currently set to 1, as well as some specification around the size and access of the persistent volume in the spec request in the volumeClaimTemplate portion of the YAML. Note the use of the same label as seen in the service YAML. There is also a reference to the storage class. And of course, it references the containerized Couchbase application, which I have pulled down from the external repository and placed in my own Harbor repository, and which I could then scan for any anomalies. Fortunately, the scan passed with no issue.
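For reference, a StatefulSet matching that description could be sketched as below. The StatefulSet name, the image, the replica count, the couchbase-data claim template name (visible in the claim names later on) and the couchbasesc storage class come from this post; the apiVersion, label, mount path, access mode and volume size are assumptions.

```shell
# Write a sketch of couchbase-statefulset.yaml (a reconstruction, not the original file).
cat > couchbase-statefulset.yaml <<'EOF'
apiVersion: apps/v1                # assumption: may differ on older clusters
kind: StatefulSet
metadata:
  name: couchbase
spec:
  serviceName: couchbase           # assumption
  replicas: 1
  selector:
    matchLabels:
      app: couchbase               # assumption: same label as the service selector
  template:
    metadata:
      labels:
        app: couchbase
    spec:
      containers:
        - name: couchbase
          image: harbor.rainpole.com/pks_project/couchbase:k8s-petset
          ports:
            - containerPort: 8091
          volumeMounts:
            - name: couchbase-data
              mountPath: /opt/couchbase/var   # assumption
  volumeClaimTemplates:
    - metadata:
        name: couchbase-data
      spec:
        storageClassName: couchbasesc
        accessModes: ["ReadWriteOnce"]        # assumption
        resources:
          requests:
            storage: 1Gi                      # assumption
EOF
```

Each replica gets its own claim generated from the template, named <template-name>-<pod-name>, which is why claims such as couchbase-data-couchbase-1 show up once the set is scaled out.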
The deployment was pretty straightforward. I use kubectl to deploy the storage class, the service and finally the StatefulSet.
kubectl create -f couchbase-sc.yaml
kubectl create -f couchbase-service.yaml
kubectl create -f couchbase-statefulset.yaml
Now I did encounter one issue – I’m not sure why, but a directory needed for creating the persistent volumes on vSphere did not exist. The behaviour was that my persistent volumes were not being created. I found the reason when I did a kubectl describe on my persistent volume claim.
Warning ProvisioningFailed 8m (x3 over 9m) persistentvolume-controller Failed to provision volume with StorageClass "couchbasesc": folder '/CH-Datacenter/vm/pcf_vms/f74b47da-1b9d-4978-89cd-36bf7789f6bf' not found
As the event message shows, the folder was not found. I manually created the aforementioned folder, and then my persistent volume was successfully created. Next, I checked the events related to my pod by running a kubectl describe on that, and everything seemed to be working.
/var/run/secrets/kubernetes.io/serviceaccount from default-token-4prv9 (ro)
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
Type: Secret (a volume populated by a Secret)
QoS Class: BestEffort
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 17s default-scheduler Successfully assigned default/couchbase-0 to 317c87f9-8923-4630-978f-df73125d01f3
Normal Pulling 12s kubelet, 317c87f9-8923-4630-978f-df73125d01f3 pulling image "harbor.rainpole.com/pks_project/couchbase:k8s-petset"
Normal Pulled 0s kubelet, 317c87f9-8923-4630-978f-df73125d01f3 Successfully pulled image "harbor.rainpole.com/pks_project/couchbase:k8s-petset"
Normal Created 0s kubelet, 317c87f9-8923-4630-978f-df73125d01f3 Created container
Normal Started 0s kubelet, 317c87f9-8923-4630-978f-df73125d01f3 Started container
So far, so good. The kubectl describe output for the pod also includes a Node: field with an IP address. This is the K8s node on which the pod is running. In order to access the Couchbase UI, I need an IP address from one of the K8s nodes and the port to which the Couchbase container’s port has been mapped. I can get that port info by looking at the service.
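One way to pull out just the mapped port is to ask kubectl for the service's nodePort field directly. The sketch below assumes the service is called couchbase and that the UI port is the first port listed on it; neither is confirmed by the output shown in this post.

```shell
# Print the node port that the first service port has been mapped to.
# Usage: couchbase_node_port [service-name]   (defaults to "couchbase", an assumption)
couchbase_node_port() {
  kubectl get service "${1:-couchbase}" -o jsonpath='{.spec.ports[0].nodePort}'
}
```

Running `kubectl get service` without the jsonpath option shows the same mapping in the PORT(S) column.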
And now if I point my browser to that IP address and that port (in my case 18.104.22.168:32691), I should get the Couchbase UI. In fact, I should be able to connect to any of the K8s nodes, and the service should redirect me to a node that is running a Pod for this application. Once I see the login prompt, I need to provide some Couchbase login credentials (this app was built with Administrator/password credentials), and once I log in, I should see my current deployment of 1 active server, which is correct since I have only a single replica requested in the StatefulSet YAML file.
Again, so far so good. Now let’s scale out the application from a single replica to 3 replicas. How would I do that with a StatefulSet? It can all be done via kubectl. Let’s look at the current StatefulSet, and then scale it out. In the first output, you can see that the number of replicas is 1.
cormac@pks-cli:~/Stateful-Demo$ kubectl get statefulset
Normal SuccessfulCreate 17m statefulset-controller create Pod couchbase-0 in StatefulSet couchbase successful
Let’s now go ahead and increase the number of replicas to 3. Here we should not only observe the number of pods increasing (using the incremental numbering scheme mentioned in the introduction), but we should also see the number of persistent volumes begin to increment as well. Let’s look at that next. I’ll run the kubectl get commands a few times so you can see the pods and PV numbers increment gradually.
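As a sketch of the commands involved, the scale-out can be done with kubectl scale, followed by repeated kubectl get calls to watch the pods and volumes appear. The helper function names below are only for illustration; the StatefulSet name couchbase is the one shown in the events.

```shell
# Scale a StatefulSet to the requested number of replicas.
# Usage: scale_statefulset <name> <replicas>
scale_statefulset() {
  kubectl scale statefulset "$1" --replicas="$2"
}

# List pods and persistent volumes so the new ordinals can be observed.
show_pods_and_volumes() {
  kubectl get pods
  kubectl get pv
}
```

For this deployment that would be `scale_statefulset couchbase 3`, then `show_pods_and_volumes` a few times until couchbase-1 and couchbase-2 are running.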
Let’s take a look at the StatefulSet before going back to the Couchbase UI to see what has happened there. We can now see that the number of replicas has indeed increased, and the events at the end of the output show what has just happened.
cormac@pks-cli:~/Stateful-Demo$ kubectl get statefulset
Normal SuccessfulCreate 21m statefulset-controller create Pod couchbase-0 in StatefulSet couchbase successful
Normal SuccessfulCreate 2m statefulset-controller create Claim couchbase-data-couchbase-1 Pod couchbase-1 in StatefulSet couchbase success
Normal SuccessfulCreate 2m statefulset-controller create Pod couchbase-1 in StatefulSet couchbase successful
Normal SuccessfulCreate 2m statefulset-controller create Claim couchbase-data-couchbase-2 Pod couchbase-2 in StatefulSet couchbase success
Normal SuccessfulCreate 2m statefulset-controller create Pod couchbase-2 in StatefulSet couchbase successful
OK, our final step is to check the application. For that we go back to the Couchbase UI and take a look at the “servers”. The first thing we notice is that there are now 2 new servers that are Pending Rebalance, as shown in the lower right hand corner of the UI.
When we click on it, we are taken to the Server Nodes view – Pending Rebalance. Now, not only do we see an option to Rebalance, but we also have a failover warning stating that at least two servers with the data service are required to provide replication.
Let’s click on the Rebalance button next. This will kick off the Rebalance activity across all 3 nodes.
And finally, our Couchbase database should be balanced across all 3 nodes, alongside the option of Fail Over.
So that was pretty seamless, wasn’t it? Hopefully that has given you a good idea about the purpose of StatefulSets. As well as that, hopefully you can see how nicely it integrates with the vSphere Cloud Provider (VCP) to give persistent volumes on vSphere storage for Kubernetes containerized applications.
And finally, just to show you that these volumes are on the vSAN datastore (the datastore that matches the “gold” policy in the storage class), here are the 3 volumes (VMDKs) in the kubevols folder.