Provisioning Capacity in GKE Autopilot

I previously documented how to add spare capacity to an Autopilot Kubernetes cluster by creating a balloon Deployment to provision some scheduling headroom. This technique constantly maintains a fixed amount of headroom: if you have a 2 vCPU balloon Deployment and something consumes that capacity, the balloon pods get rescheduled, re-provisioning the headroom. It's a useful way to add rapid scaling capabilities to Autopilot.

What if instead of rapid-scaling headroom, you want to provision a set amount of capacity, perhaps in anticipation of an event that will require a major scale-up?

This can be achieved with the same low-priority balloon pod technique, except that instead of wrapping the pods in a Deployment, we'll use a Job to provision the capacity as a one-off.

Firstly, we need a low-priority "balloon" class and a higher-priority default class:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: balloon-priority
value: -10
preemptionPolicy: Never
globalDefault: false
description: "Balloon pod priority."
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: default-priority
value: 0
preemptionPolicy: PreemptLowerPriority
globalDefault: true
description: "The global default priority. Will preempt the balloon pods."


And then our balloon Job definition, which will provision 4x 32-core nodes of capacity once (giving you roughly 124 vCPU of allocatable capacity, due to some overhead) and reserve it for up to 10 hours.

apiVersion: batch/v1
kind: Job
metadata:
  name: balloon-capacity
spec:
  parallelism: 4
  backoffLimit: 0
  template:
    spec:
      priorityClassName: balloon-priority
      terminationGracePeriodSeconds: 0
      containers:
      - name: ubuntu-container
        image: ubuntu
        command: ["sleep"]
        args: ["36000"]
        resources:
          requests:
            cpu: "16"
      restartPolicy: Never


To deploy:

kubectl create -f

kubectl create -f

How does it work? You'll notice the pod spec is the same as for the balloon pods before, except that instead of sleeping forever, the pod sleeps for 36000 seconds (10 hours) and then terminates. Set this timeout to the length of time you need the capacity. This is a nice feature, as it means you won't accidentally keep the capacity provisioned for longer than you need it. One thing to note is that the time here isn't a guarantee: be sure to set a maintenance window on your cluster, otherwise the Pod (and associated capacity) could be removed due to an update. (Read more about minimizing Pod disruption).
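For instance, to reserve the capacity for two hours instead of ten, only the container's sleep argument needs to change (the 7200-second value here is just illustrative):

```yaml
# Fragment of the Job's container spec above:
# sleep for 7200 seconds (2 hours), then release the capacity
command: ["sleep"]
args: ["7200"]
```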

Importantly, the Job has a backoffLimit of 0, which means any preempted Pod will not be rescheduled (remember: our goal is one-off capacity), while parallelism simply dictates how many replicas we get.

The other trick here is the 16 vCPU request. I mentioned that this Job results in 32-core nodes being provisioned today, but the Job is only requesting 16, so how does that work? Due to the node shapes currently being provisioned under the hood in Autopilot, this actually results in 32 cores of capacity (allocatable will be slightly less) being made available, for the price of 16. Neat! (Note that this implementation detail of Autopilot is subject to change.)

This example is designed to be a real-world example of provisioning 4x 32 vCPU of capacity. If you want to run a cheaper test just to see how it works, drop the CPU request and parallelism.
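As a sketch, a cheaper test version of the same Job might look like the following (the smaller values are just illustrative, not recommendations):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: balloon-capacity-test
spec:
  parallelism: 1            # a single balloon Pod instead of 4
  backoffLimit: 0
  template:
    spec:
      priorityClassName: balloon-priority
      terminationGracePeriodSeconds: 0
      containers:
      - name: ubuntu-container
        image: ubuntu
        command: ["sleep"]
        args: ["600"]       # hold the capacity for 10 minutes
        resources:
          requests:
            cpu: "500m"     # a fraction of a core, rather than 16
      restartPolicy: Never
```

This keeps the mechanics identical (low priority, no retries, fixed-duration sleep) while provisioning only a trivial amount of capacity.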