SSD Ephemeral Storage on Autopilot

Do you need to provision a whole bunch of ephemeral storage for your Autopilot Pods, for example as part of a data processing pipeline? In the past with Kubernetes, you might have used emptyDir to allocate a bunch of storage (taken from the node’s boot disk) to your containers. This, however, requires that you carefully plan your nodes’ boot disks to ensure each disk has enough total storage for the Pods you plan to schedule, and it may result in overprovisioning (i.e. unused storage capacity) if the Pods that end up being scheduled don’t use it all. Lots of inflexible, upfront planning combined with potential wastage 🤮.

Autopilot, thankfully, eliminates node-based planning. But then, how can you secure a huge ephemeral volume for your Pod’s scratch space if you can’t choose your node’s boot disk size? Fortunately, generic ephemeral volumes, which reached general availability in Kubernetes 1.23, make this a breeze 🎉.

Generic ephemeral volumes let us mount a volume backed by a persistent disk into the container, where the volume lasts only for the life of the Pod. Being an attached volume, it can be huge (up to 64 TiB on Google Cloud), independent of the node boot disk and of what other Pods on the node are doing! You can choose the storage class for this volume as well, which in Google Cloud gives you a few different options, like the high-performance SSD (“pd-ssd”) and the mid-range Balanced SSD (“pd-balanced”).

In this example, I will demonstrate how you would mount a 1TiB SSD ephemeral volume into a Pod running on Autopilot.

First, we need to define our storage class. This is where we specify the type of PD that we want (in our case pd-ssd, one of the highest-performing options). If you’re using Standard, you need to do some work to enable the driver; in Autopilot this is naturally done for you, so we can proceed directly to creating the StorageClass (the various type options are defined here).

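apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ssd
provisioner: pd.csi.storage.gke.io  # GKE Persistent Disk CSI driver
volumeBindingMode: WaitForFirstConsumer  # provision the disk once a Pod is scheduled
allowVolumeExpansion: true
parameters:
  type: pd-ssd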


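apiVersion: apps/v1
kind: Deployment
metadata:
  name: ephemeral-example
spec:
  replicas: 1
  selector:
    matchLabels:
      pod: example-pod
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        pod: example-pod
    spec:
      containers:
        - name: busybox-container
          image: busybox
          resources:
            requests:
              cpu: 500m
              memory: 2Gi
              ephemeral-storage: 2Gi  # boot-disk storage only; the scratch volume is separate
          volumeMounts:
          - mountPath: "/scratch"
            name: scratch-volume
          command: [ "sleep", "1000000" ]
      volumes:
        - name: scratch-volume
          ephemeral:  # generic ephemeral volume: one PVC per Pod
            volumeClaimTemplate:
              metadata:
                labels:
                  type: scratch-volume
              spec:
                accessModes: [ "ReadWriteOnce" ]
                storageClassName: "ssd"  # the StorageClass defined above
                resources:
                  requests:
                    storage: 1Ti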


Create both resources:
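
A minimal sketch of this step, assuming the two manifests above were saved as storage-class.yaml and deployment.yaml (both filenames are hypothetical, as is the function name):

```shell
# Sketch: apply the StorageClass first, then the Deployment that references it,
# and wait until the rollout reports the Pod as available.
# storage-class.yaml and deployment.yaml are hypothetical filenames.
create_example() {
  kubectl apply -f storage-class.yaml &&
  kubectl apply -f deployment.yaml &&
  kubectl rollout status deploy/ephemeral-example
}
```

Run create_example from the directory holding the two files. Because the StorageClass uses WaitForFirstConsumer, the disk is only provisioned once the Pod has been scheduled, so the rollout can take a little while to complete.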

To verify that things worked, let’s shell into the Pod and run df -h. As you can see, we have a 1TiB disk mounted at /scratch (df reports 1006.9G once filesystem overhead is taken out).

$ kubectl exec -it deploy/ephemeral-example -- sh
/ # df -h
Filesystem                Size      Used Available Use% Mounted on
overlay                  94.3G      3.6G     90.6G   4% /
tmpfs                    64.0M         0     64.0M   0% /dev
tmpfs                     1.9G         0      1.9G   0% /sys/fs/cgroup
/dev/sdb               1006.9G     28.0K   1006.8G   0% /scratch
/dev/sda1                94.3G      3.6G     90.6G   4% /etc/hosts
/dev/sda1                94.3G      3.6G     90.6G   4% /dev/termination-log
/dev/sda1                94.3G      3.6G     90.6G   4% /etc/hostname
/dev/sda1                94.3G      3.6G     90.6G   4% /etc/resolv.conf
shm                      64.0M         0     64.0M   0% /dev/shm
tmpfs                     2.0G     12.0K      2.0G   0% /var/run/secrets/
tmpfs                     1.9G         0      1.9G   0% /proc/acpi
tmpfs                    64.0M         0     64.0M   0% /proc/kcore
tmpfs                    64.0M         0     64.0M   0% /proc/keys
tmpfs                    64.0M         0     64.0M   0% /proc/timer_list
tmpfs                     1.9G         0      1.9G   0% /proc/scsi
tmpfs                     1.9G         0      1.9G   0% /sys/firmware

Now let’s compare the write performance against /tmp, which lives on the node’s boot disk. In theory, our /scratch volume should be faster since it’s backed by the SSD PD. As we can see from this simple test, which writes 40 blocks of 50 MiB (about 2 GiB) to each location, the SSD ephemeral volume is roughly 2x faster: 253 MB/s versus 120 MB/s. Nice!

# dd if=/dev/zero of=/tmp/test1.img bs=50M count=40 oflag=direct
40+0 records in
40+0 records out
2097152000 bytes (2.1 GB, 2.0 GiB) copied, 17.4605 s, 120 MB/s

# dd if=/dev/zero of=/scratch/test1.img bs=50M count=40 oflag=direct
40+0 records in
40+0 records out
2097152000 bytes (2.1 GB, 2.0 GiB) copied, 8.29781 s, 253 MB/s
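
To sanity-check the "roughly 2x" figure, here is a small sketch that recomputes the throughput from the byte count and elapsed times dd printed above. The variable names are made up for illustration, and it assumes an awk is available in your shell.

```shell
# Values copied from the two dd runs above.
bytes=2097152000    # 40 blocks x 50 MiB, as reported by dd
boot_secs=17.4605   # elapsed time writing to /tmp
ssd_secs=8.29781    # elapsed time writing to /scratch

# Throughput in decimal megabytes per second (the unit dd reports),
# plus the ratio between the two runs.
summary=$(awk -v b="$bytes" -v t1="$boot_secs" -v t2="$ssd_secs" 'BEGIN {
  printf "/tmp:     %.1f MB/s\n", b / t1 / 1e6
  printf "/scratch: %.1f MB/s\n", b / t2 / 1e6
  printf "speedup:  %.2fx\n", t1 / t2
}')
echo "$summary"
```

This works out to about 120.1 MB/s and 252.7 MB/s, a speedup of 2.10x. It is a single sequential-write run, so treat it as a rough indication rather than a benchmark.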

Finally, for fun, let’s scale the Deployment up and then delete it to watch the ephemeral resources come and go. As you can see, each Pod gets its own disk, and these disks only stick around for the lifetime of their Pod (which is the whole point).

$ kubectl scale deploy/ephemeral-example --replicas 10
deployment.apps/ephemeral-example scaled
$ kubectl get pods
NAME                                 READY   STATUS    RESTARTS   AGE
ephemeral-example-649fbddfb8-8rq4s   1/1     Running   0          5m45s
ephemeral-example-649fbddfb8-d2f2n   1/1     Running   0          2m37s
ephemeral-example-649fbddfb8-g8xqs   1/1     Running   0          2m37s
ephemeral-example-649fbddfb8-kkvn7   1/1     Running   0          2m37s
ephemeral-example-649fbddfb8-mt4jb   1/1     Running   0          2m37s
ephemeral-example-649fbddfb8-npxn7   1/1     Running   0          2m37s
ephemeral-example-649fbddfb8-pwcsn   1/1     Running   0          2m37s
ephemeral-example-649fbddfb8-s9cv7   1/1     Running   0          2m37s
ephemeral-example-649fbddfb8-t4ppr   1/1     Running   0          2m37s
ephemeral-example-649fbddfb8-wlw8v   1/1     Running   0          2m37s
$ kubectl get pvc
NAME                                                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
ephemeral-example-5c7f74dd84-5slbb-scratch-volume   Bound    pvc-c2eeb65e-1608-4131-b87f-9c00303a05e0   1Ti        RWO            ssd            22m
ephemeral-example-5c7f74dd84-c972t-scratch-volume   Bound    pvc-48c2e8e4-cbcd-439a-b2dd-d2b824b5c392   1Ti        RWO            ssd            17m
ephemeral-example-5c7f74dd84-dr4ng-scratch-volume   Bound    pvc-ecf724d1-1095-4213-8d57-3067f9cea5f6   1Ti        RWO            ssd            14m
ephemeral-example-649fbddfb8-8rq4s-scratch-volume   Bound    pvc-435e3b92-e925-4bfc-99c6-be55a2665f57   1Ti        RWO            ssd            5m47s
ephemeral-example-649fbddfb8-d2f2n-scratch-volume   Bound    pvc-f359fdef-ae60-46dd-8d45-b7b578650daa   1Ti        RWO            ssd            2m40s
ephemeral-example-649fbddfb8-g8xqs-scratch-volume   Bound    pvc-888aff7d-16ba-42c5-91e6-16ffa5647e21   1Ti        RWO            ssd            2m40s
ephemeral-example-649fbddfb8-kkvn7-scratch-volume   Bound    pvc-26afd18b-f039-41b3-9927-81aa39bd7d1c   1Ti        RWO            ssd            2m40s
ephemeral-example-649fbddfb8-mt4jb-scratch-volume   Bound    pvc-893208c5-4677-4b42-9fba-d987157c0e04   1Ti        RWO            ssd            2m40s
ephemeral-example-649fbddfb8-npxn7-scratch-volume   Bound    pvc-e9fc41b3-818b-42df-9418-994222298695   1Ti        RWO            ssd            2m40s
ephemeral-example-649fbddfb8-pwcsn-scratch-volume   Bound    pvc-8487d270-beb8-4842-b107-dff8f0862d70   1Ti        RWO            ssd            2m40s
ephemeral-example-649fbddfb8-s9cv7-scratch-volume   Bound    pvc-c37dd9ce-33b3-4010-ae5f-938e4c033d13   1Ti        RWO            ssd            2m40s
ephemeral-example-649fbddfb8-t4ppr-scratch-volume   Bound    pvc-ba856fac-6ea3-4cf7-98b5-2aa599870d48   1Ti        RWO            ssd            2m40s
ephemeral-example-649fbddfb8-wlw8v-scratch-volume   Bound    pvc-98ef6aac-93b4-4495-a359-2fc717729a09   1Ti        RWO            ssd            2m40s
$ kubectl delete deploy/ephemeral-example
deployment.apps "ephemeral-example" deleted
$ kubectl get pvc
No resources found in default namespace.
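
The PVCs disappear along with the Deployment because, for generic ephemeral volumes, Kubernetes makes each Pod the owner of the PVC created for it, so the claim is garbage-collected when the Pod goes away. To see that ownership for yourself, here is a small sketch (the function name is made up for illustration) to run while the Pods are still up, before the delete above:

```shell
# Sketch: list each PVC in the current namespace next to the kind and name
# of its first owner reference. For the scratch volumes, the owner should be
# the Pod that the claim was created for.
show_pvc_owners() {
  kubectl get pvc \
    -o custom-columns='PVC:.metadata.name,OWNER_KIND:.metadata.ownerReferences[0].kind,OWNER:.metadata.ownerReferences[0].name'
}
```

Calling show_pvc_owners should print one row per claim. Whether the underlying disk is removed together with the claim depends on the StorageClass reclaim policy, which defaults to Delete for dynamically provisioned volumes.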