Do you need to provision a whole bunch of ephemeral storage for your Autopilot Pods, for example as part of a data processing pipeline? In the past with Kubernetes, you might have used emptyDir to allocate a chunk of storage (taken from the node’s boot disk) to your containers. This, however, requires that you carefully plan your nodes’ boot disks to ensure each one has enough total storage for the Pods you plan to schedule, and it can result in overprovisioning (i.e. unused storage capacity) if the Pods that end up being scheduled don’t use it all. Lots of inflexible, upfront planning combined with potential wastage 🤮.
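For reference, that older approach looks roughly like the sketch below (a minimal, hypothetical Pod; the names are illustrative, the sizeLimit is only an eviction threshold, and the space still comes out of the node’s boot disk via the container’s ephemeral-storage request):
apiVersion: v1
kind: Pod
metadata:
  name: emptydir-example   # hypothetical name, for illustration only
spec:
  containers:
  - name: busybox-container
    image: busybox
    command: [ "sleep", "1000000" ]
    resources:
      requests:
        cpu: 500m
        memory: 2Gi
        ephemeral-storage: 10Gi   # carved out of the node's boot disk
    volumeMounts:
    - mountPath: "/scratch"
      name: scratch-volume
  volumes:
  - name: scratch-volume
    emptyDir:
      sizeLimit: 10Gi   # an eviction threshold, not guaranteed capacity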
Update: this is now documented in GKE.
Autopilot, thankfully, eliminates node-based planning. But then how can you secure a huge ephemeral volume for your Pod’s scratch space if you can’t choose your node’s boot disk size? Fortunately, generic ephemeral volumes, which reached general availability in Kubernetes 1.23, make this a breeze 🎉.
Generic ephemeral volumes let us mount an attached persistent disk into the container that lasts only for the life of the Pod. Being an attached volume, it can be allocated huge amounts of storage (up to 64Ti on Google Cloud), independent of the node’s boot disk and of what other Pods on the node are doing! You can tune the storage class for this volume as well, which on Google Cloud gives you a few options, such as high-performance SSD persistent disks (“pd-ssd”) and mid-range balanced persistent disks (“pd-balanced”).
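For instance, if you wanted to trade a little performance for cost, a pd-balanced storage class would look almost identical to the pd-ssd one we create below; only the type parameter changes (the “balanced” name here is just illustrative):
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: balanced   # illustrative name
provisioner: pd.csi.storage.gke.io
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
parameters:
  type: pd-balanced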
In this example, I will demonstrate how you would mount a 1TiB SSD ephemeral volume into a Pod running on Autopilot.
In Autopilot, everything is configured automatically and ready for use. If you use GKE Standard and created your cluster on an older version (prior to 1.20), you can follow these steps to enable the Compute Engine persistent disk CSI driver; newer versions of GKE Standard also have it enabled by default.
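If you do need to enable it yourself on a Standard cluster, it’s a one-line cluster update along these lines (substitute your cluster name and location, and double-check the flag against the current GKE docs for your version):
gcloud container clusters update CLUSTER_NAME \
    --update-addons=GcePersistentDiskCsiDriver=ENABLED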
First, we need to define our storage class; this is where we specify the type of persistent disk we want. The different options are defined here; in our case, let’s use pd-ssd, which offers the highest performance.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ssd
provisioner: pd.csi.storage.gke.io
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
parameters:
  type: pd-ssd
Next, we define a Deployment whose Pod requests a generic ephemeral volume using that storage class, and mounts it at /scratch:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ephemeral-example
spec:
  replicas: 1
  selector:
    matchLabels:
      pod: example-pod
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        pod: example-pod
    spec:
      containers:
      - name: busybox-container
        image: busybox
        resources:
          requests:
            cpu: 500m
            memory: 2Gi
            ephemeral-storage: 2Gi
        volumeMounts:
        - mountPath: "/scratch"
          name: scratch-volume
        command: [ "sleep", "1000000" ]
      volumes:
      - name: scratch-volume
        ephemeral:
          volumeClaimTemplate:
            metadata:
              labels:
                type: scratch-volume
            spec:
              accessModes: [ "ReadWriteOnce" ]
              storageClassName: "ssd"
              resources:
                requests:
                  storage: 1Ti
Create both resources:
kubectl create -f https://raw.githubusercontent.com/WilliamDenniss/autopilot-examples/main/ephemeral-volume/storage-class-ssd.yaml
kubectl create -f https://raw.githubusercontent.com/WilliamDenniss/autopilot-examples/main/ephemeral-volume/ephemeral-example.yaml
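Before shelling in, you can sanity-check that the storage class registered and that a PVC was generated from the claim template. Because the class uses WaitForFirstConsumer, the PVC only binds (and the disk is only provisioned) once the Pod is scheduled onto a node, so expect it to show Bound shortly after the Pod starts:
kubectl get storageclass ssd
kubectl get pvc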
To verify that things worked, let’s shell into the Pod and run df -h. As you can see, we have a 1TiB disk mounted at /scratch.
$ kubectl exec -it deploy/ephemeral-example -- sh
/ # df -h
Filesystem Size Used Available Use% Mounted on
overlay 94.3G 3.6G 90.6G 4% /
tmpfs 64.0M 0 64.0M 0% /dev
tmpfs 1.9G 0 1.9G 0% /sys/fs/cgroup
/dev/sdb 1006.9G 28.0K 1006.8G 0% /scratch
/dev/sda1 94.3G 3.6G 90.6G 4% /etc/hosts
/dev/sda1 94.3G 3.6G 90.6G 4% /dev/termination-log
/dev/sda1 94.3G 3.6G 90.6G 4% /etc/hostname
/dev/sda1 94.3G 3.6G 90.6G 4% /etc/resolv.conf
shm 64.0M 0 64.0M 0% /dev/shm
tmpfs 2.0G 12.0K 2.0G 0% /var/run/secrets/kubernetes.io/serviceaccount
tmpfs 1.9G 0 1.9G 0% /proc/acpi
tmpfs 64.0M 0 64.0M 0% /proc/kcore
tmpfs 64.0M 0 64.0M 0% /proc/keys
tmpfs 64.0M 0 64.0M 0% /proc/timer_list
tmpfs 1.9G 0 1.9G 0% /proc/scsi
tmpfs 1.9G 0 1.9G 0% /sys/firmware
Now let’s compare the write performance. In theory, our /scratch volume should be faster since it’s backed by the SSD persistent disk. As we can see from this simple test, writing 40 x 50MB of data, we get roughly 2x faster writes on the SSD ephemeral volume. Nice!
# dd if=/dev/zero of=/tmp/test1.img bs=50M count=40 oflag=direct
40+0 records in
40+0 records out
2097152000 bytes (2.1 GB, 2.0 GiB) copied, 17.4605 s, 120 MB/s
# dd if=/dev/zero of=/scratch/test1.img bs=50M count=40 oflag=direct
40+0 records in
40+0 records out
2097152000 bytes (2.1 GB, 2.0 GiB) copied, 8.29781 s, 253 MB/s
Finally, just for fun, let’s scale it up and then delete the Deployment to watch our ephemeral resources come and go. As you can see, these disks only stick around for the duration of the Pod (which is the whole point).
$ kubectl scale deploy/ephemeral-example --replicas 10
deployment.apps/ephemeral-example scaled
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
ephemeral-example-649fbddfb8-8rq4s 1/1 Running 0 5m45s
ephemeral-example-649fbddfb8-d2f2n 1/1 Running 0 2m37s
ephemeral-example-649fbddfb8-g8xqs 1/1 Running 0 2m37s
ephemeral-example-649fbddfb8-kkvn7 1/1 Running 0 2m37s
ephemeral-example-649fbddfb8-mt4jb 1/1 Running 0 2m37s
ephemeral-example-649fbddfb8-npxn7 1/1 Running 0 2m37s
ephemeral-example-649fbddfb8-pwcsn 1/1 Running 0 2m37s
ephemeral-example-649fbddfb8-s9cv7 1/1 Running 0 2m37s
ephemeral-example-649fbddfb8-t4ppr 1/1 Running 0 2m37s
ephemeral-example-649fbddfb8-wlw8v 1/1 Running 0 2m37s
$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
ephemeral-example-5c7f74dd84-5slbb-scratch-volume Bound pvc-c2eeb65e-1608-4131-b87f-9c00303a05e0 1Ti RWO ssd 22m
ephemeral-example-5c7f74dd84-c972t-scratch-volume Bound pvc-48c2e8e4-cbcd-439a-b2dd-d2b824b5c392 1Ti RWO ssd 17m
ephemeral-example-5c7f74dd84-dr4ng-scratch-volume Bound pvc-ecf724d1-1095-4213-8d57-3067f9cea5f6 1Ti RWO ssd 14m
ephemeral-example-649fbddfb8-8rq4s-scratch-volume Bound pvc-435e3b92-e925-4bfc-99c6-be55a2665f57 1Ti RWO ssd 5m47s
ephemeral-example-649fbddfb8-d2f2n-scratch-volume Bound pvc-f359fdef-ae60-46dd-8d45-b7b578650daa 1Ti RWO ssd 2m40s
ephemeral-example-649fbddfb8-g8xqs-scratch-volume Bound pvc-888aff7d-16ba-42c5-91e6-16ffa5647e21 1Ti RWO ssd 2m40s
ephemeral-example-649fbddfb8-kkvn7-scratch-volume Bound pvc-26afd18b-f039-41b3-9927-81aa39bd7d1c 1Ti RWO ssd 2m40s
ephemeral-example-649fbddfb8-mt4jb-scratch-volume Bound pvc-893208c5-4677-4b42-9fba-d987157c0e04 1Ti RWO ssd 2m40s
ephemeral-example-649fbddfb8-npxn7-scratch-volume Bound pvc-e9fc41b3-818b-42df-9418-994222298695 1Ti RWO ssd 2m40s
ephemeral-example-649fbddfb8-pwcsn-scratch-volume Bound pvc-8487d270-beb8-4842-b107-dff8f0862d70 1Ti RWO ssd 2m40s
ephemeral-example-649fbddfb8-s9cv7-scratch-volume Bound pvc-c37dd9ce-33b3-4010-ae5f-938e4c033d13 1Ti RWO ssd 2m40s
ephemeral-example-649fbddfb8-t4ppr-scratch-volume Bound pvc-ba856fac-6ea3-4cf7-98b5-2aa599870d48 1Ti RWO ssd 2m40s
ephemeral-example-649fbddfb8-wlw8v-scratch-volume Bound pvc-98ef6aac-93b4-4495-a359-2fc717729a09 1Ti RWO ssd 2m40s
$ kubectl delete deploy/ephemeral-example
deployment.apps "ephemeral-example" deleted
$ kubectl get pvc
No resources found in default namespace.
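One last note: deleting the Deployment cleans up the Pods and their ephemeral volumes, but the storage class we created sticks around. If you’re done experimenting, you can remove it the same way it was created:
kubectl delete -f https://raw.githubusercontent.com/WilliamDenniss/autopilot-examples/main/ephemeral-volume/storage-class-ssd.yaml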