# HA 3-zone Deployments with PodSpreadTopology on Autopilot

![](HA-Pods.png)

[PodSpreadTopology](https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/) (Pod topology spread constraints) is a way to get Kubernetes to spread your Pods across a failure domain, typically nodes or zones. Kubernetes platforms usually have some default spread built in, although it may not be as aggressive as you want (meaning it might tolerate a more imbalanced spread).

Here’s an example Deployment with a PodSpreadTopology that asks the scheduler to spread Pods evenly over all zones. `maxSkew: 1` means the number of matching Pods in any two zones should differ by at most one, and `whenUnsatisfiable: ScheduleAnyway` makes this a strong preference rather than a hard requirement. (I also have a writeup on this topic, which you can [preview in my book](https://livebook.manning.com/book/kubernetes-for-developers/chapter-8/v-11/87).)

```yaml {hl_lines=[15, 16, 17, 18, 19, 20, 21]}
apiVersion: apps/v1
kind: Deployment
metadata:
  name: timeserver
spec:
  replicas: 3
  selector:
    matchLabels:
      pod: timeserver-pod
  template:
    metadata:
      labels:
        pod: timeserver-pod
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            pod: timeserver-pod
      containers:
      - name: timeserver-container
        image: docker.io/wdenniss/timeserver:1
        resources:
          requests:
            cpu: 200m
            memory: 250Mi
          limits:
            cpu: 300m
            memory: 400Mi
```

[ha-deployment.yaml](https://github.com/WilliamDenniss/autopilot-examples/blob/master/all-zones/ha-deployment.yaml)

Because Autopilot clusters are regional and typically span 3 zones, you might expect this spread rule to place Pods in all 3 zones. That isn’t always the case. By default, Autopilot uses 2 zones for HA, and this is one of those moments where the layers of the GKE/Kubernetes system don’t cooperate. PodSpreadTopology is a Kubernetes concept. If your cluster only has nodes in 2 zones, the scheduler only sees those 2 zones and spreads Pods across them. It never uses the third zone, because it doesn’t know that zone exists.
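To see which zones the scheduler can currently use, you can count your nodes per zone. Here’s a minimal sketch. It assumes every node carries the standard `topology.kubernetes.io/zone` label, which the Autopilot nodes shown later in this post do. `zone_counts` is a hypothetical helper name, not part of any tool.

```shell
#!/bin/sh
# zone_counts (hypothetical helper): reads the output of
#   kubectl get nodes -L topology.kubernetes.io/zone --no-headers
# on stdin and prints "<zone> <node-count>", one line per zone.
# With -L, the zone label is appended as the last column, so we key on $NF.
# Assumes every node has the zone label (otherwise $NF is the VERSION column).
zone_counts() {
  awk 'NF > 0 { count[$NF]++ } END { for (z in count) print z, count[z] }' | sort
}
```

To use it, run `kubectl get nodes -L topology.kubernetes.io/zone --no-headers | zone_counts`. If only 2 zones are listed, a zone-level spread constraint can only spread your Pods across those 2 zones.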
To fix that, you simply need to ensure that you have at least 1 Pod running in each of the 3 zones in your region (see also [Using GKE Autopilot in specific zones](/k8s/autopilot-specific-zones/)). There are a couple of easy ways to do that. You can *bootstrap* the zones once, or you can run a more permanent Deployment. Bootstrapping is cheaper because it only runs once. It works just as well, provided you scale up your workload before the bootstrap Job terminates and you never scale down below the number of zones you have (i.e. you always have at least 3 replicas, which is the stated goal anyway!).

## Ensuring you are using all 3 zones in Autopilot

To bootstrap Autopilot in 3 zones so that PodSpreadTopology can do its thing, we’re going to run a “pause” Job in each of the 3 zones for a duration of 5h (18000 seconds). That gives you 5 hours to complete your deployment (change the time as needed).

This sample was built for us-west1, my own favorite region, and you can run it as-is there. Otherwise, make sure to **find and replace “us-west1” with your desired region** before use, or your Pods will not schedule. Also check that your region’s zones are actually named -a, -b and -c. Some regions, such as us-east1, use -b, -c and -d instead.
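Rather than hand-editing the manifest, you could generate it. The sketch below is a hypothetical helper, `placeholder_jobs` (my own name for it, not something from the example repo), for a POSIX shell. It prints the same placeholder Job as the manifest in this section once for each zone you pass as an argument, so you don’t have to rely on the zones being suffixed -a, -b and -c.

```shell
#!/bin/sh
# placeholder_jobs (hypothetical helper): prints one placeholder Job per zone,
# separated by "---". Each argument is a full zone name, e.g. us-west1-a.
# The Job name uses the zone's suffix, e.g. placeholder-job-zone-a.
placeholder_jobs() {
  first=1
  for zone in "$@"; do
    [ "$first" -eq 1 ] || echo "---"
    first=0
    suffix=${zone##*-}   # strip everything up to the last "-": us-west1-a -> a
    cat <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: placeholder-job-zone-${suffix}
spec:
  parallelism: 1
  backoffLimit: 0
  template:
    spec:
      nodeSelector:
        topology.kubernetes.io/zone: "${zone}"
      terminationGracePeriodSeconds: 0
      containers:
      - name: ubuntu-container
        image: ubuntu
        command: ["sleep"]
        args: ["18000"]
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
            ephemeral-storage: 10Mi
      restartPolicy: Never
EOF
  done
}
```

For example, `placeholder_jobs us-east1-b us-east1-c us-east1-d | kubectl create -f -` would create the three Jobs in us-east1.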
```yaml {hl_lines=[11, 35, 59]}
apiVersion: batch/v1
kind: Job
metadata:
  name: placeholder-job-zone-a
spec:
  parallelism: 1
  backoffLimit: 0
  template:
    spec:
      nodeSelector:
        topology.kubernetes.io/zone: "us-west1-a"
      terminationGracePeriodSeconds: 0
      containers:
      - name: ubuntu-container
        image: ubuntu
        command: ["sleep"]
        args: ["18000"]
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
            ephemeral-storage: 10Mi
      restartPolicy: Never
---
apiVersion: batch/v1
kind: Job
metadata:
  name: placeholder-job-zone-b
spec:
  parallelism: 1
  backoffLimit: 0
  template:
    spec:
      nodeSelector:
        topology.kubernetes.io/zone: "us-west1-b"
      terminationGracePeriodSeconds: 0
      containers:
      - name: ubuntu-container
        image: ubuntu
        command: ["sleep"]
        args: ["18000"]
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
            ephemeral-storage: 10Mi
      restartPolicy: Never
---
apiVersion: batch/v1
kind: Job
metadata:
  name: placeholder-job-zone-c
spec:
  parallelism: 1
  backoffLimit: 0
  template:
    spec:
      nodeSelector:
        topology.kubernetes.io/zone: "us-west1-c"
      terminationGracePeriodSeconds: 0
      containers:
      - name: ubuntu-container
        image: ubuntu
        command: ["sleep"]
        args: ["18000"]
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
            ephemeral-storage: 10Mi
      restartPolicy: Never
```

[zonal-placeholder-job.yaml](https://github.com/WilliamDenniss/autopilot-examples/blob/master/all-zones/zonal-placeholder-job.yaml)

To deploy:

```shell
kubectl create -f https://raw.githubusercontent.com/WilliamDenniss/autopilot-examples/master/all-zones/zonal-placeholder-job.yaml
```

Now that it’s deployed, we can check that it worked and that we have a Pod in each of our 3 zones.
```shell
wdenniss@cloudshell:~/autopilot-examples/all-zones (gke-autopilot-test)$ kubectl get pods -o wide
NAME                           READY   STATUS    RESTARTS   AGE   IP            NODE                                                 NOMINATED NODE   READINESS GATES
placeholder-job-zone-a-bhdw7   1/1     Running   0          27m   10.67.1.2     gk3-autopilot-cluster-1-nap-7sxf27zy-adbad4ea-zv4k
placeholder-job-zone-b-5njdw   1/1     Running   0          27m   10.67.0.139   gk3-autopilot-cluster-1-nap-7sxf27zy-84b15ccd-gjcs
placeholder-job-zone-c-z96rz   1/1     Running   0          27m   10.67.0.75    gk3-autopilot-cluster-1-nap-7sxf27zy-98e28c49-r2m9
wdenniss@cloudshell:~/autopilot-examples/all-zones (gke-autopilot-test)$ kubectl describe node gk3-autopilot-cluster-1-nap-7sxf27zy-adbad4ea-zv4k | grep zone
                    failure-domain.beta.kubernetes.io/zone=us-west1-a
                    topology.gke.io/zone=us-west1-a
                    topology.kubernetes.io/zone=us-west1-a
  default                     placeholder-job-zone-a-bhdw7    250m (12%)    250m (12%)  512Mi (8%)       512Mi (8%)     27m
wdenniss@cloudshell:~/autopilot-examples/all-zones (gke-autopilot-test)$ kubectl describe node gk3-autopilot-cluster-1-nap-7sxf27zy-84b15ccd-gjcs | grep zone
                    failure-domain.beta.kubernetes.io/zone=us-west1-b
                    topology.gke.io/zone=us-west1-b
                    topology.kubernetes.io/zone=us-west1-b
  default                     placeholder-job-zone-b-5njdw    250m (12%)    250m (12%)  512Mi (8%)       512Mi (8%)     27m
wdenniss@cloudshell:~/autopilot-examples/all-zones (gke-autopilot-test)$ kubectl describe node gk3-autopilot-cluster-1-nap-7sxf27zy-98e28c49-r2m9 | grep zone
                    failure-domain.beta.kubernetes.io/zone=us-west1-c
                    topology.gke.io/zone=us-west1-c
                    topology.kubernetes.io/zone=us-west1-c
  default                     placeholder-job-zone-c-z96rz    250m (12%)    250m (12%)  512Mi (8%)       512Mi (8%)     27m
```

The catch with using a 5-hour Job is that you need to schedule your actual workload (at least 1 replica per zone) within that time. Otherwise, you’re back to square one. For bonus marks, you could also make this Job preemptible by your main workload so that it gets evicted right away (by adding a priorityClassName, like we did [here](/k8s/autopilot-capacity-reservation/)).
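Before the placeholder Jobs expire, it’s worth confirming that your workload really does have a Pod in every zone. Here’s a minimal sketch. It assumes the `pod: timeserver-pod` label from the Deployment at the top of this post, and `missing_zones` is a hypothetical helper that prints each expected zone with no Pods in it.

```shell
#!/bin/sh
# missing_zones (hypothetical helper): takes the expected zone names as
# arguments, reads observed zones (one per line) on stdin, and prints each
# expected zone that never appears. No output means every zone is covered.
missing_zones() {
  seen=$(sort -u)
  for zone in "$@"; do
    printf '%s\n' "$seen" | grep -qxF "$zone" || echo "$zone"
  done
}

# Example (needs cluster access; assumes the timeserver Deployment's label):
#   kubectl get pods -l pod=timeserver-pod \
#       -o custom-columns=NODE:.spec.nodeName --no-headers | sort -u |
#     while read -r node; do
#       kubectl get node "$node" -L topology.kubernetes.io/zone --no-headers |
#         awk '{print $NF}'
#     done | missing_zones us-west1-a us-west1-b us-west1-c
```

Pending Pods have no node yet, so the `kubectl get node` lookup fails for them and they don’t count toward any zone. That’s the behaviour you want for this check.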
If you’d prefer a more permanent setup rather than the 5h window, you can use a Deployment instead of a Job. This method ensures you always have nodes in all 3 zones, and it will cost you 750m CPU and 1536Mi of memory. Again, **find and replace us-west1 with your own region**.

```yaml {hl_lines=[16, 43, 70]}
apiVersion: apps/v1
kind: Deployment
metadata:
  name: placeholder-zone-a
spec:
  replicas: 1
  selector:
    matchLabels:
      pod: placeholder-zone-a-pod
  template:
    metadata:
      labels:
        pod: placeholder-zone-a-pod
    spec:
      nodeSelector:
        topology.kubernetes.io/zone: "us-west1-a"
      containers:
      - name: ubuntu-container
        image: ubuntu
        command: ["sleep"]
        args: ["infinity"]
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
            ephemeral-storage: 10Mi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: placeholder-zone-b
spec:
  replicas: 1
  selector:
    matchLabels:
      pod: placeholder-zone-b-pod
  template:
    metadata:
      labels:
        pod: placeholder-zone-b-pod
    spec:
      nodeSelector:
        topology.kubernetes.io/zone: "us-west1-b"
      containers:
      - name: ubuntu-container
        image: ubuntu
        command: ["sleep"]
        args: ["infinity"]
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
            ephemeral-storage: 10Mi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: placeholder-zone-c
spec:
  replicas: 1
  selector:
    matchLabels:
      pod: placeholder-zone-c-pod
  template:
    metadata:
      labels:
        pod: placeholder-zone-c-pod
    spec:
      nodeSelector:
        topology.kubernetes.io/zone: "us-west1-c"
      containers:
      - name: ubuntu-container
        image: ubuntu
        command: ["sleep"]
        args: ["infinity"]
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
            ephemeral-storage: 10Mi
```

[zonal-placeholder-deployment.yaml](https://github.com/WilliamDenniss/autopilot-examples/blob/master/all-zones/zonal-placeholder-deployment.yaml)