Using Node-based pricing on GKE Autopilot


New this year, Autopilot now has two pricing models: the original Pod-based model, and the new node-based option. The pricing page does a pretty good job of explaining the difference (at least I hope it explains it well, as I wrote it), and how best to utilize each option, but here’s a quick recap anyway:

The Pod model is great when you don’t want to think about or worry about bin-packing of nodes: you let GKE take care of everything. It uses an all-inclusive pricing model where you don’t need to be concerned with underutilized nodes or odd-shaped workloads. The node model is useful when you have specific hardware requirements (like a particular GPU or CPU), or have large workloads that bin-pack well and you just want to buy some VMs. The node model is billed by the node at Compute Engine prices, with a small added premium, and it can work out cheaper, provided you fill the nodes. Typically the best way to use Autopilot is a mix of both strategies, taking advantage of whichever pricing model works better for the workload in question.

Another new feature, Custom Compute Class, allows you to define your own compute classes, with priority rules that make it easy to configure alternatives to Autopilot’s built-in compute classes.

These two features, the new node-based pricing option and Custom Compute Class, are highly complementary: we can use them together to create our own node-based equivalents to Autopilot’s built-in Balanced and Scale-Out (and even the default/generic) compute classes. This gives us a convenient, workload-driven way to run workloads directly on T2D and N2/N2D VMs using node-based pricing, while still having everything fully managed by GKE.

Here’s how.

Node-based Scale-Out Compute Class

Here’s my take on a custom compute class that creates a node-based version of Autopilot’s Scale-Out compute class:

apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: scaleout-nodes
spec:
  priorities:
  - machineFamily: t2d
    minCores: 4
  activeMigration:
    optimizeRulePriority: false
  nodePoolAutoCreation:
    enabled: true

scaleout-nodes.yaml

The built-in Scale-Out compute class is backed by T2D (or T2A for Arm). Recreating this in the node model is, as you can see, pretty straightforward: simply specify t2d (or t2a for Arm) as the machine family in the compute class.
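For the Arm variant, only the priority rule changes (a sketch; your Arm workloads also need arm64-compatible container images):

  priorities:
  - machineFamily: t2a
    minCores: 4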

You can leave out the minCores setting, or set it as I have to require a minimum node size. How do you decide the optimal minimum? GKE nodes have overhead, which in the node-based billing model you are paying for. For this reason it’s worth avoiding the really small sizes like 1 or 2 cores, where a larger share of the node is consumed by that overhead. Here I pick 4 as a minimum, but you could go higher if you have larger workloads.

TIP: if 4 CPUs sounds like too much for the minimum, you are probably better off staying with the Pod model.

Node-based Balanced Compute Class

Now let’s build a node-based version of Autopilot’s built-in Balanced compute class:

apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: balanced-nodes
spec:
  priorities:
  - machineFamily: n2d
    minCores: 4
  - machineFamily: n2
    minCores: 4
  activeMigration:
    optimizeRulePriority: false
  nodePoolAutoCreation:
    enabled: true

balanced-nodes.yaml

Autopilot’s Balanced compute class is backed by N2 and N2D, so for this compute class we can specify both options in our priority rules. The behavior is slightly different: with priority rules, GKE will exhaust your N2D quota before falling back to N2, whereas the built-in Balanced class spreads node creation more evenly between the two. The goal of running workloads on N2/N2D is still achieved, though. If you have a preference, simply order the rules accordingly.
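For example, to prefer N2 over N2D you would just flip the rules, with the rest of the spec unchanged (a sketch):

  priorities:
  - machineFamily: n2
    minCores: 4
  - machineFamily: n2d
    minCores: 4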

There is nothing particularly special about these examples; you can create similar compute classes of your own using any machine family, including E2.
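For instance, here’s a sketch of an E2-backed class (the name e2-nodes is just an example I made up):

apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: e2-nodes
spec:
  priorities:
  - machineFamily: e2
    minCores: 4
  activeMigration:
    optimizeRulePriority: false
  nodePoolAutoCreation:
    enabled: true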

Trying it out

With our new node-based custom Compute Classes defined, we can give them a try. I’ll use the Scale-Out compute class as the example here.

NOTE: your cluster must be running GKE 1.30.3-gke.1451000 or later, which at the time of writing is still pretty new. Be sure to upgrade your cluster before trying this out.
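If you’re not sure what version you’re on, one way to check (and upgrade) is with gcloud; CLUSTER_NAME and REGION are placeholders here, and the versions available to you depend on your release channel:

# Check the control plane version of the cluster
gcloud container clusters describe CLUSTER_NAME --region REGION \
  --format="value(currentMasterVersion)"

# Upgrade the control plane to a specific version, if needed
gcloud container clusters upgrade CLUSTER_NAME --region REGION \
  --master --cluster-version=1.30.3-gke.1451000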

This deployment will create 5 Pods on our node-based scale-out custom compute class.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: timeserver-scaleout
spec:
  replicas: 5
  selector:
    matchLabels:
      pod: timeserver-scaleout-pod
  template:
    metadata:
      labels:
        pod: timeserver-scaleout-pod
    spec:
      nodeSelector:
        cloud.google.com/compute-class: scaleout-nodes
      containers:
      - name: timeserver-container
        image: docker.io/wdenniss/timeserver:1

deploy-scaleout.yaml

With our resources defined, first create the compute class:

kubectl create -f https://raw.githubusercontent.com/WilliamDenniss/autopilot-examples/main/custom-compute-class/node-classes/scaleout-nodes.yaml

And then the deployment:

kubectl create -f https://raw.githubusercontent.com/WilliamDenniss/autopilot-examples/main/custom-compute-class/node-classes/demo/deploy-scaleout.yaml

Now we can observe the results. For the nodes, we’ll add a custom column to display the instance type backing each one.

kubectl get pods -o wide
kubectl get nodes -o custom-columns="NAME:.metadata.name,INSTANCE_TYPE:.metadata.labels.node\.kubernetes\.io/instance-type"

Here’s what I got from my sample run:

NAME                                  READY   STATUS    RESTARTS   AGE     IP            NODE                                    NOMINATED NODE   READINESS GATES
timeserver-scaleout-97959c8db-6x97j   1/1     Running   0          5m31s   10.19.0.135   gk3-wave-1-nap-vxekt9qa-5cf704f6-j4r6   <none>           <none>
timeserver-scaleout-97959c8db-7vqrd   1/1     Running   0          5m31s   10.19.0.134   gk3-wave-1-nap-vxekt9qa-5cf704f6-j4r6   <none>           <none>
timeserver-scaleout-97959c8db-8fjf7   1/1     Running   0          5m31s   10.19.0.131   gk3-wave-1-nap-vxekt9qa-5cf704f6-j4r6   <none>           <none>
timeserver-scaleout-97959c8db-wlldw   1/1     Running   0          5m31s   10.19.0.133   gk3-wave-1-nap-vxekt9qa-5cf704f6-j4r6   <none>           <none>
timeserver-scaleout-97959c8db-zmgtz   1/1     Running   0          5m31s   10.19.0.132   gk3-wave-1-nap-vxekt9qa-5cf704f6-j4r6   <none>           <none>
NAME                                    INSTANCE_TYPE
gk3-wave-1-nap-rio7p80r-54e97860-htfg   e2-standard-2
gk3-wave-1-nap-vxekt9qa-5cf704f6-j4r6   t2d-standard-4

This workload will be billed at Compute Engine rates for the T2D instances (i.e. the cost of a t2d-standard-4), plus a small Autopilot premium.

Provided you are deploying significant quantities of workloads and can fill up the nodes, this pricing may work out better.

Remember to set appropriate requests and limits!

As you look to optimize your costs, be sure to correctly set your Pods’ resource requests. While not the topic of this blog post, it is the single most important thing you can do to optimize costs, however you run workloads on Kubernetes (whether it’s GKE in Autopilot mode with Pod- or node-based billing, or Standard mode with old-style node pools). Resource requests govern how resources are allocated, so the more accurate they are, the more optimized your costs will be, regardless of the billing model. My book has a whole chapter on how to set appropriate resource requests and limits for your workloads.
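As a minimal illustration using the deployment from earlier, requests are set per container (the 500m/512Mi values are placeholders, not a recommendation; measure your actual workload):

      containers:
      - name: timeserver-container
        image: docker.io/wdenniss/timeserver:1
        resources:
          requests:
            cpu: 500m      # placeholder value; size to your workload
            memory: 512Mi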

Autopilot also now supports Burstable Pods, which is another significant way to save. By setting your limits higher than your requests, you can opportunistically burst into capacity reserved by other Pods that aren’t using it. This is available since version 1.30.2-gke.1394000.
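As a sketch (again with illustrative values), a container becomes burstable when its limits exceed its requests:

        resources:
          requests:
            cpu: 250m
            memory: 256Mi
          limits:
            cpu: 1000m     # can burst up to 1 CPU when spare capacity is available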

Summary

The reason Autopilot has two pricing models is to give you the best of both worlds; neither is intended to be universally superior. Generally, if you have smaller, odd-shaped workloads that don’t warrant meticulous planning, it’s best just to leave them running on the default Pod model. But if you have well-defined, larger-scale workloads that bin-pack nicely, the node model covered in this blog post can work to your advantage. Custom Compute Class makes it easy to define your own compute classes that use the node model, which developers can then select in their deployments.