Spot Pods are a great way to save money on Autopilot, currently 70% off the regular price. The catch is two-fold:
- Your workload can be disrupted
- There may not always be spot capacity available
Workload disruption is simply a judgement call. You should only run workloads that can accept disruption (abrupt termination). If you have a batch job that would lose hours of work, it’s not a good fit. Generally, StatefulSets shouldn’t be run on spot compute, as most stateful workloads are less tolerant of disruption.
To solve the capacity issue, a common request is to prefer spot compute, but not require it. That is, to use spot when it’s available and fall back to regular capacity when it’s not.
(One word of caution: this technique is not advised for truly critical workloads, as there may be a correlation between spot capacity being reclaimed and a stockout of regular capacity in the zone. This is somewhat mitigated by the fact that Autopilot can provision nodes in any of the region’s three zones, but running critical applications this way is still not recommended.)
In Autopilot, spot compute is requested via a nodeSelector or node affinity. This opens the door to using preferredDuringSchedulingIgnoredDuringExecution to prefer spot. However, there is a catch: the Kubernetes scheduler will run your Pod on an existing non-spot node with spare capacity before Autopilot adds a spot node. To prevent this, we can combine a preferred node affinity with workload separation. With this, your Spot Pods won’t run on spare non-spot capacity in the cluster; instead, a spot node will be provisioned (and only if there is no spot capacity would a regular node be created instead).
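For reference, a plain required spot request on its own is just the nodeSelector form mentioned above (the node affinity form uses the same cloud.google.com/gke-spot label):

# Pod spec fragment: require spot capacity
nodeSelector:
  cloud.google.com/gke-spot: "true"

The Deployment below uses that same label, but in a preferred node affinity, combined with workload separation via the group nodeSelector and matching toleration: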
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spotexample
spec:
  replicas: 10
  selector:
    matchLabels:
      pod: spotexample-pod
  template:
    metadata:
      labels:
        pod: spotexample-pod
    spec:
      # Workload separation: these Pods only run on nodes Autopilot
      # provisions with the matching "group" label and taint
      tolerations:
      - key: group
        operator: Equal
        value: spot-preferred
        effect: NoSchedule
      nodeSelector:
        group: spot-preferred
      # Prefer (but don't require) spot nodes for this group
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: cloud.google.com/gke-spot
                operator: In
                values:
                - "true"
      containers:
      - name: timeserver-container
        image: docker.io/wdenniss/timeserver:1
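To see where the Pods actually landed, one way to check is to compare the Pods’ node assignments against the nodes carrying the spot label:

# Show each Pod and the node it was scheduled onto
kubectl get pods -l pod=spotexample-pod -o wide

# List only the spot nodes (labeled cloud.google.com/gke-spot=true)
kubectl get nodes -l cloud.google.com/gke-spot=true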
In the event that there are no Spot nodes available, non-Spot nodes will be created, which is the value of preferring, rather than requiring, spot. Due to the way cluster scaling works, however, these non-Spot nodes will likely stay around for the duration of the workload, even after Spot capacity becomes available again, and even following events like updates and workload scale-up/down (which may not be ideal). If you have a workload in this situation and want to get it back onto 100% Spot nodes, simply change the value of the toleration and nodeSelector to a new one, e.g. in the example above changing spot-preferred to spot-preferred2, and you’ll get a fresh set of nodes (Spot ones, if available).
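Concretely, that refresh means bumping just the two workload-separation fields in the example above (everything else in the Deployment stays the same):

# Pod spec fragment (under spec.template.spec)
tolerations:
- key: group
  operator: Equal
  value: spot-preferred2   # was: spot-preferred
  effect: NoSchedule
nodeSelector:
  group: spot-preferred2   # was: spot-preferred

Autopilot will then provision a fresh set of nodes for the new group, preferring spot as before.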