Kubernetes Storage on vSphere 101 – The basics: PV, PVC, POD
I’ve just returned from KubeCon 2019 in Barcelona, and was surprised to see such a keen interest in how Kubernetes consumes infrastructure-related resources, especially storage. Although I have been writing a lot about Kubernetes-related items recently, I wanted to put together a primer on some storage concepts that might be useful as a stepping stone, or even an on-boarding process, for those of you who are quite new to Kubernetes. I am going to talk about this from the point of view of vSphere and vSphere storage. Thus I will try to map vSphere storage constructs such as datastores, policies and VMDKs (virtual machine disks) to Kubernetes constructs such as Storage Classes, Persistent Volume Claims, and Persistent Volumes.
I suppose we should start with the basics, and talk about why storage is important in Kubernetes, or even simpler, for containers. For quite a long time, containers were considered to be stateless. In other words, you spin up one or more containers, do a unit or units of work and any writes are done to ephemeral on-disk files. Then you grab the result, throw the containers away – then rinse and repeat. However, people soon saw the value in being able to use containers not just for stateless workloads, but also stateful workloads. They also needed a way to persist data in case the container crashed. Thus, a mechanism to provide persistent storage for containers was needed.
Let’s now talk about this in the context of Kubernetes. First, we should describe what a Pod is. In its simplest form, a Pod is a group of one or more containers. For our purposes, we will consider a Pod as containing a single container. Now, how do we provide some “external” storage to this Pod? This is where Persistent Volumes, more commonly known as PVs, come in. Possibly the most interesting thing about PVs is that they exist outside of the lifecycle of the Pod. A Pod that uses a PV can come and go, but the PV can remain, and therefore so can your data.
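If you have never seen a Pod manifest before, here is roughly what a minimal single-container Pod looks like before any storage is involved – just a busybox container kept alive with a sleep command. The names here are purely illustrative; we will build the real Pod for this demo later in the post.

apiVersion: v1
kind: Pod
metadata:
  name: hello-pod
spec:
  containers:
  - name: busybox
    image: "k8s.gcr.io/busybox"
    command: [ "sleep", "1000000" ]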
The simplest description of a PV when created in K8s running on top of vSphere is that it is a VMDK, a virtual machine disk. In Kubernetes, it is known as a vsphereVolume. There is a 1:1 mapping between a PV and a VMDK. This may change going forward, where a PV could be mapped to a physical LUN or raw device, but using our current vSphere Cloud Provider (VCP) storage driver for Kubernetes, one can simply think of a PV mapping to a VMDK.
How do we create one of these PVs? Well, there are a number of ways to do this. One way is to create a VMDK manually on a datastore, and then build a YAML file for a PV that references that VMDK directly. Once we have a PV, we need a way for a Pod to request this storage. This is done via a Persistent Volume Claim, or PVC for short. You may have a bunch of PVs available, but the persistent volume claim abstracts this away: via a PVC, a Pod simply requests storage of a particular size and with a certain access mode.
OK – that’s enough theory to start with. Let’s take a look at how the PV, PVC and Pod interoperate in practice.
I am going to begin by creating a 2GB VMDK on my vSAN datastore. To do this, I simply log on to my ESXi host, change directory to /vmfs/volumes/vsanDatastore and run the following commands:
[root@esxi-dell-h:/vmfs/volumes/vsan:52fae366e94edb86-c6633d0af03e5aec] mkdir demo
[root@esxi-dell-h:/vmfs/volumes/vsan:52fae366e94edb86-c6633d0af03e5aec] cd demo
[root@esxi-dell-h:/vmfs/volumes/vsan:52fae366e94edb86-c6633d0af03e5aec/9cc4ef5c-370b-7eeb-876d-246e962c2408] vmkfstools -c 2G -d thin -W vsan demo.vmdk
Create: 100% done.
[root@esxi-dell-h:/vmfs/volumes/vsan:52fae366e94edb86-c6633d0af03e5aec/9cc4ef5c-370b-7eeb-876d-246e962c2408] ls
demo.vmdk
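As an aside, if you prefer not to SSH to an ESXi host, the same folder and VMDK could most likely be created remotely with the govc CLI – something along these lines. I haven’t shown the output here, and the exact flags may vary with your govc version; vsanDatastore and demo are simply the names used in this demo.

# assumes GOVC_URL/GOVC_USERNAME/GOVC_PASSWORD are already set for your vCenter
$ govc datastore.mkdir -ds vsanDatastore demo
$ govc datastore.disk.create -ds vsanDatastore -size 2G demo/demo.vmdk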
Now, back in K8s land, how do I consume that storage? First of all, I need to create a PV file to reference the VMDK I just created. Here is a simple PV YAML file to do just that.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: demo-pv
spec:
  storageClassName: demo
  capacity:
    storage: 2Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  vsphereVolume:
    volumePath: "[vsanDatastore] demo/demo.vmdk"
    fsType: ext4
OK – so a few things to highlight here. One – the indentations are critical. Pay close attention to them when building your own YAML files. The entries are quite straightforward though. We know it is a PersistentVolume, and we have given it the name demo-pv. It is 2GiB in size to match the VMDK we created manually, it is a vsphereVolume, and we have provided the volumePath to the VMDK. The access mode is set to ReadWriteOnce, meaning that it can be mounted read-write by only one node (and thus, for our purposes, one Pod) at a time. The persistentVolumeReclaimPolicy determines what should happen to the PV when it is no longer “claimed”: should it be Retained, or should it be Deleted? We’ll cover this in more detail in a future post. In this case, I have decided to Retain it. Ignore the StorageClass for the moment as well – we will revisit it later. Suffice it to say that setting storageClassName to demo here is simply a way to connect the PV to the Persistent Volume Claim (which we will cover soon).
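Incidentally, the reclaim policy is not set in stone. If you change your mind later, it can be modified on an existing PV with a patch along the lines of the following (this is the standard approach from the Kubernetes documentation – just a sketch, using the PV name from this demo):

$ kubectl patch pv demo-pv -p '{"spec":{"persistentVolumeReclaimPolicy":"Delete"}}'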
OK – let’s create our new PV. First, we verify that there is no PV. Then we create the PV with the kubectl command. Finally, we can see our PV was successfully created, and then with the describe option, we can get more details about the PV.
$ kubectl get pv
No resources found.
$ kubectl create -f demo-pv.yaml
persistentvolume/demo-pv created
$ kubectl get pv
NAME      CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
demo-pv   2Gi        RWO            Retain           Available           demo                    3s
$ kubectl describe pv demo-pv
Name:            demo-pv
Labels:          <none>
Annotations:     <none>
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    demo
Status:          Available
Claim:
Reclaim Policy:  Retain
Access Modes:    RWO
Capacity:        2Gi
Node Affinity:   <none>
Message:
Source:
    Type:               vSphereVolume (a Persistent Disk resource in vSphere)
    VolumePath:         [vsanDatastore] demo/demo.vmdk
    FSType:             ext4
    StoragePolicyName:
    VolumeID:
Events:          <none>
OK – at this point the PV exists. Now we need to make a PVC which will match this PV, so that the Pod can consume it. Here is a simple PVC that will be used on behalf of the Pod to claim this PV. Note also that the storageClassName matches the entry we used in the PV YAML. This could be anything you like, so long as they both match.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-pvc
spec:
  storageClassName: demo
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
You can see many of the same fields that we saw in the PV. At this point, you are probably asking why you need to do all of this again. Well, in this statically provisioned storage demo, you are not seeing the full power of the abstraction that PVCs provide. Later on we will look at dynamic provisioning, and then it will become clearer. Anyway, now that we have our PVC manifest, let’s go ahead and create it, and make sure that it binds to the PV we created earlier.
$ kubectl create -f demo-pvc.yaml
persistentvolumeclaim/demo-pvc created
$ kubectl get pvc
NAME       STATUS   VOLUME    CAPACITY   ACCESS MODES   STORAGECLASS   AGE
demo-pvc   Bound    demo-pv   2Gi        RWO            demo           4s
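The PVC went straight to Bound because its storageClassName, requested size and access mode all match the PV we created. If a PVC ever sits in a Pending state instead, describing it will usually tell you why no PV was a suitable match:

$ kubectl describe pvc demo-pvc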
This looks good. We can see in the VOLUME field of the PVC that it has claimed our PV, demo-pv. Great. The final step now is to create a simple Pod that uses this storage. This is done by adding the PVC details to the Pod YAML. In this example, I am going to create a very simple busybox Pod, which I will keep alive with a sleep command, but which at the same time will attach and mount the persistent volume. The volume will be mounted inside the busybox container on /demo, and the volume itself (demo-vol) will be provided via the persistent volume claim called demo-pvc. And we have already created this PVC to consume the physical storage referenced by our PV. Here is the sample YAML:
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  containers:
  - name: busybox
    image: "k8s.gcr.io/busybox"
    volumeMounts:
    - name: demo-vol
      mountPath: "/demo"
    command: [ "sleep", "1000000" ]
  volumes:
  - name: demo-vol
    persistentVolumeClaim:
      claimName: demo-pvc
Let’s deploy that Pod and see what we get. The sleep command keeps the container around while we do some looking around.
$ kubectl create -f demo-pod.yaml
pod/demo-pod created
$ kubectl get pod
NAME       READY   STATUS    RESTARTS   AGE
demo-pod   1/1     Running   0          59s
$ kubectl describe pod demo-pod
Name:               demo-pod
Namespace:          demo
Priority:           0
PriorityClassName:  <none>
Node:               0564ff11-452e-4d9c-bd4a-976408778eb1/10.27.51.190
Start Time:         Thu, 30 May 2019 14:29:02 +0100
Labels:             <none>
Annotations:        <none>
Status:             Running
IP:                 10.200.87.35
Containers:
  busybox:
    Container ID:  docker://18f3baf65a6846a92e76b61e0ca1712a2addbc137acad9f406e5773213907f96
    Image:         k8s.gcr.io/busybox
    Image ID:      docker-pullable://k8s.gcr.io/busybox@sha256:d8d3bc2c183ed2f9f10e7258f84971202325ee6011ba137112e01e30f206de67
    Port:          <none>
    Host Port:     <none>
    Command:
      sleep
      1000000
    State:          Running
      Started:      Thu, 30 May 2019 14:29:09 +0100
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /demo from demo-vol (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-pv9p8 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  demo-vol:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  demo-pvc
    ReadOnly:   false
  default-token-pv9p8:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-pv9p8
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason                  Age   From                                           Message
  ----    ------                  ----  ----                                           -------
  Normal  Scheduled               49s   default-scheduler                              Successfully assigned demo/demo-pod to 0564ff11-452e-4d9c-bd4a-976408778eb1
  Normal  SuccessfulAttachVolume  48s   attachdetach-controller                        AttachVolume.Attach succeeded for volume "demo-pv"
  Normal  Pulling                 43s   kubelet, 0564ff11-452e-4d9c-bd4a-976408778eb1  pulling image "k8s.gcr.io/busybox"
  Normal  Pulled                  42s   kubelet, 0564ff11-452e-4d9c-bd4a-976408778eb1  Successfully pulled image "k8s.gcr.io/busybox"
  Normal  Created                 42s   kubelet, 0564ff11-452e-4d9c-bd4a-976408778eb1  Created container
  Normal  Started                 42s   kubelet, 0564ff11-452e-4d9c-bd4a-976408778eb1  Started container
In the describe output, we can see the reference to the Persistent Volume Claim. We can also see the volume demo-pv getting attached to the Pod in the events at the end of the output. However, we should really log in to that container, and see if we can see a 2GiB volume mounted on /demo.
$ kubectl get pod
NAME       READY   STATUS    RESTARTS   AGE
demo-pod   1/1     Running   0          3m58s
$ kubectl exec -it demo-pod /bin/sh
/ # df -h
Filesystem                Size      Used Available Use% Mounted on
overlay                  49.1G      5.1G     41.4G  11% /
tmpfs                    64.0M         0     64.0M   0% /dev
tmpfs                     7.8G         0      7.8G   0% /sys/fs/cgroup
/dev/sdd                  1.9G      3.0M      1.9G   0% /demo
/dev/sda1                 2.9G      1.3G      1.4G  49% /dev/termination-log
/dev/sdc1                49.1G      5.1G     41.4G  11% /etc/resolv.conf
/dev/sdc1                49.1G      5.1G     41.4G  11% /etc/hostname
/dev/sda1                 2.9G      1.3G      1.4G  49% /etc/hosts
shm                      64.0M         0     64.0M   0% /dev/shm
tmpfs                     7.8G     12.0K      7.8G   0% /tmp/secrets/kubernetes.io/serviceaccount
tmpfs                     7.8G         0      7.8G   0% /proc/acpi
tmpfs                    64.0M         0     64.0M   0% /proc/kcore
tmpfs                    64.0M         0     64.0M   0% /proc/keys
tmpfs                    64.0M         0     64.0M   0% /proc/timer_list
tmpfs                    64.0M         0     64.0M   0% /proc/sched_debug
tmpfs                     7.8G         0      7.8G   0% /proc/scsi
tmpfs                     7.8G         0      7.8G   0% /sys/firmware
/ #
Yep – there it is, presented on /dev/sdd and mounted on /demo (the slight difference from 2GiB is simply filesystem overhead). Very good.
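And to prove the point made at the start – that the data outlives the Pod – you could write a file into /demo, delete the Pod, recreate it from the same YAML and check that the file is still there. Something along these lines (the file name is just an example); once the recreated Pod is Running again, the final cat should print the contents back, showing the data survived the Pod being destroyed:

/ # echo "hello from demo-pod" > /demo/data.txt
/ # exit
$ kubectl delete pod demo-pod
$ kubectl create -f demo-pod.yaml
$ kubectl exec -it demo-pod -- cat /demo/data.txt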
Now you are probably thinking that this is a very convoluted way of provisioning storage to a container, and for statically provisioned volumes like this, I would agree. However, you will see the real power of Persistent Volume Claims when we introduce the topic of StorageClasses in more detail in the next post. I will also show you how storage classes can be used to consume underlying vSphere storage policies (e.g. vSAN) in that upcoming post.
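As a small teaser for that post, a dynamic setup replaces the manually created PV with a StorageClass that references the in-tree vSphere provisioner, along the lines of the sketch below. The diskformat and datastore parameters here are just examples and would need to match your own environment.

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: demo-sc
provisioner: kubernetes.io/vsphere-volume
parameters:
  diskformat: thin
  datastore: vsanDatastore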
Manifests used in this demo can be found in my vsphere-storage-101 GitHub repo.