- Your workload can be disrupted
- There may not always be spot capacity available
For workload disruption, this is simply a judgement call. You should only run workloads that can accept disruption (abrupt termination). If you have a batch job that would lose hours of work, it's not a good fit. Generally, StatefulSets shouldn't be run on spot compute, as most stateful workloads are less tolerant of disruption.
To solve the capacity issue, a common request is to prefer spot compute, but not require it. That is, to use spot when it's available and fall back to regular capacity when it's not.
(One word of caution: this technique is not advised for truly critical workloads, as there may be a correlation between spot capacity being reclaimed and a stockout of regular capacity in the zone. This is somewhat mitigated by the fact that Autopilot can provision nodes in any of the 3 zones of the region, but it's still not recommended to run critical applications this way.)
In Autopilot, spot compute is requested via a nodeSelector or node affinity. This opens the door to using preferredDuringSchedulingIgnoredDuringExecution to prefer spot. There is a catch, however: the Kubernetes scheduler will run your Pod on an existing non-spot node if that node has spare capacity, before Autopilot adds a spot node. To prevent this, we can combine a preferred node affinity with workload separation. With this, your Spot Pods won't run on available non-spot capacity in the cluster; instead, a spot node will be provisioned (and only if there is no spot capacity will a regular node be created).
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spotexample
spec:
  replicas: 10
  selector:
    matchLabels:
      pod: spotexample-pod
  template:
    metadata:
      labels:
        pod: spotexample-pod
    spec:
      # Workload separation: the toleration plus matching nodeSelector keep
      # these Pods off existing general-purpose (non-Spot) nodes.
      tolerations:
      - key: group
        operator: Equal
        value: spot-preferred
        effect: NoSchedule
      nodeSelector:
        group: spot-preferred
      # Prefer, but do not require, Spot nodes.
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: cloud.google.com/gke-spot
                operator: In
                values:
                - "true"
      containers:
      - name: timeserver-container
        image: docker.io/wdenniss/timeserver:1
```
If no Spot capacity is available, non-Spot nodes will be created instead, which is the value of preferring, rather than requiring, Spot. Due to the way cluster scaling works, however, these non-Spot nodes will likely stay around for the lifetime of the workload, even after Spot capacity becomes available again and even through events like updates and workload scale-up/down (which may not be ideal). If you have a workload in this situation and want to get it back onto 100% Spot nodes, simply change the value of the toleration and nodeSelector to a new one, e.g. in the example above, change spot-preferred to spot-preferred2, and you'll get a fresh set of nodes (Spot ones, if available).
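As a sketch, these are the only fields of the Deployment's Pod template that would change when switching to a new separation value. spot-preferred2 is just the example value from the text; any new value should work, as long as the toleration value and the nodeSelector value match each other. Everything else in the manifest is assumed to stay exactly as it was.

```yaml
# Partial manifest: only the changed fields are shown; the rest of the
# Deployment (replicas, affinity, containers, etc.) is unchanged.
spec:
  template:
    spec:
      tolerations:
      - key: group
        operator: Equal
        value: spot-preferred2   # was: spot-preferred
        effect: NoSchedule
      nodeSelector:
        group: spot-preferred2   # was: spot-preferred
```

Because the updated Pods no longer select the old nodes (which carry the group=spot-preferred label), the rollout should cause Autopilot to provision new nodes for them, preferring Spot, while the old nodes are left empty and can be scaled down.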