Provisioning one-off spare capacity for GKE Autopilot

I previously documented how to add spare capacity to an Autopilot Kubernetes cluster by creating a low-priority placeholder (a.k.a. balloon) Deployment that provisions scheduling headroom. Because it's a Deployment, that headroom is maintained continuously: if you have a 2 vCPU placeholder Deployment and your workloads use that capacity, the placeholder Pods get rescheduled and the headroom is replenished. It's a useful way to add rapid scaling capability to Autopilot.

Update: GKE now has an official guide for provisioning spare capacity.

What if instead of rapid-scaling headroom, you want to provision a set amount of capacity, perhaps in anticipation of an event that will require a major scale-up?

This can be achieved with the same low-priority placeholder Pod technique, except that instead of wrapping the Pods in a Deployment, we'll use a Job to provision the capacity as a one-off.

First, we need a low-priority placeholder class and a higher-priority default class:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: placeholder-priority
value: -10
preemptionPolicy: Never
globalDefault: false
description: "Placeholder Pod priority."
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: default-priority
value: 0
preemptionPolicy: PreemptLowerPriority
globalDefault: true
description: "The global default priority. Will preempt the placeholder Pods."

priority-classes.yaml

And then our placeholder Job definition, which will provision 4x 32-core nodes of capacity once (giving you roughly 124 vCPU of allocatable capacity, after overhead) and reserve it for up to 10 hours.

apiVersion: batch/v1
kind: Job
metadata:
  name: placeholder-capacity
spec:
  parallelism: 4
  backoffLimit: 0
  template:
    spec:
      priorityClassName: placeholder-priority
      terminationGracePeriodSeconds: 0
      containers:
      - name: ubuntu-container
        image: ubuntu
        command: ["sleep"]
        args: ["36000"]
        resources:
          requests:
            cpu: "16"
      restartPolicy: Never

placeholder-capacity.yaml

To deploy:

kubectl create -f https://raw.githubusercontent.com/WilliamDenniss/autopilot-examples/main/placeholder-job/priority-classes.yaml

kubectl create -f https://raw.githubusercontent.com/WilliamDenniss/autopilot-examples/main/placeholder-job/placeholder-capacity.yaml
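
Once the Job is created, you can verify the placeholder Pods have been scheduled and watch the new nodes come online. A quick sketch, using the names from the manifests above (the job-name label is added automatically by the Job controller):

# Watch the placeholder Pods go from Pending to Running as capacity is provisioned
kubectl get pods -l job-name=placeholder-capacity -w

# List the nodes backing that capacity
kubectl get nodes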

How does it work? You'll notice the Pod spec is the same as for the placeholder Pods before, except that instead of sleeping forever, the Pod sleeps for 36000 seconds (10 hours) and then terminates. Set this duration to the length of time you need the capacity. This is a nice property, as it means you won't accidentally keep the capacity provisioned for longer than you need it.
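
And if your plans change, you don't need to wait for the sleep to expire. As a sketch (assuming the Job name from the manifest above), deleting the Job releases the capacity right away and Autopilot will scale the now-empty nodes back down:

# Relinquish the reserved capacity early
kubectl delete job placeholder-capacity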

The Job has a backoffLimit of 0, which means any preempted Pod will not be rescheduled (remember: our goal is one-off capacity), and the parallelism dictates how many placeholder Pods, and thus how much capacity, we get.

Since this is one-off capacity, it's important to set a maintenance window on your cluster; otherwise the Pods (and the associated capacity) could be evicted by an update, and the capacity won't be there when you need it. (Read more about minimizing Pod disruption.) Also note that if you scale up and use this capacity, then scale back down, the capacity will be relinquished (by design). If you are expecting multiple scale-up/down events, placeholder Pods in a Deployment may work better.
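
For example, you can configure a recurring maintenance window with gcloud. This is just a sketch; the cluster name, region, and times below are placeholders to adjust for your own setup:

gcloud container clusters update my-autopilot-cluster \
  --region us-central1 \
  --maintenance-window-start "2023-01-01T03:00:00Z" \
  --maintenance-window-end "2023-01-01T07:00:00Z" \
  --maintenance-window-recurrence "FREQ=WEEKLY;BYDAY=SA,SU"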

The other trick here is the 16 vCPU request. I mentioned that this Job results in 32-core nodes being provisioned today, but each Pod only requests 16 vCPU, so how does that work? Due to the nodes Autopilot currently provisions under the hood, this actually results in 32 cores of capacity (allocatable will be slightly less) being made available, for the price of 16. Neat! (Note that this implementation detail of Autopilot is subject to change.)
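
You can confirm how much allocatable CPU the provisioned nodes actually expose with a quick query (a sketch; the exact numbers will vary with the node shapes Autopilot chooses):

kubectl get nodes -o custom-columns='NAME:.metadata.name,ALLOCATABLE_CPU:.status.allocatable.cpu'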

This example is designed to be realistic, providing 4x 32 vCPU of capacity. If you want to run a cheaper test just to see how it works, drop the CPU request and the parallelism down, as in the sketch below.
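
Here's one way to do that, a minimal scaled-down test (the name placeholder-capacity-test is hypothetical, with 2 Pods of 500m CPU each sleeping for 10 minutes):

kubectl create -f - <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: placeholder-capacity-test
spec:
  parallelism: 2
  backoffLimit: 0
  template:
    spec:
      priorityClassName: placeholder-priority
      terminationGracePeriodSeconds: 0
      containers:
      - name: ubuntu-container
        image: ubuntu
        command: ["sleep"]
        args: ["600"]
        resources:
          requests:
            cpu: "500m"
      restartPolicy: Never
EOF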