Pod affinity is a useful technique in Kubernetes for expressing a requirement that a pod, say one with the "reader" role, be co-located with another pod, say one with the "writer" role. You can express that requirement by adding something like the following to the reader pod:
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: pod
          operator: In
          values:
          - writer-pod
      topologyKey: "kubernetes.io/hostname"
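For context, here's a minimal sketch of a complete reader Pod with that rule in place (the Pod name, labels, and pause image are illustrative; it assumes the writer pod carries the label pod: writer-pod):

apiVersion: v1
kind: Pod
metadata:
  name: reader
  labels:
    pod: reader-pod
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: pod
            operator: In
            values:
            - writer-pod
        topologyKey: "kubernetes.io/hostname"
  containers:
  - name: reader-container
    image: "k8s.gcr.io/pause"  # placeholder workload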
The catch is that if there is no room on the writer's node, and nothing on it can be evicted, this reader pod may never schedule.
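If you hit this state, the reader pod will sit in Pending, and the scheduler's reasoning shows up in its events (the pod name below is a placeholder):

kubectl describe pod <reader-pod-name>

Look for FailedScheduling events that mention the unsatisfied affinity rule.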
In the future, serverless Kubernetes platforms like Autopilot may recognize this state and automatically move the writer pod to a larger node where the reader Pod's constraint can be satisfied, but alas that's not how it works today.
So, for this very specific case of wanting to strictly co-locate two pods, how can we do it? One solution is to use a DaemonSet for one of the pods, and a regular Deployment for the other. We'll need to layer on a few additional tricks to get this to work, though: we only want a single reader pod per node (which we can achieve with pod anti-affinity), and we may wish to run other workloads in the cluster (which we can keep apart with workload separation).
Putting it all together, we can set up our writer pod as a DaemonSet with workload separation:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: writer
spec:
  selector:
    matchLabels:
      ds: writer-pod
  template:
    metadata:
      labels:
        ds: writer-pod
    spec:
      tolerations:
      - key: group
        operator: Equal
        value: "read_writer_pair"
        effect: NoSchedule
      nodeSelector:
        group: "read_writer_pair"
      containers:
      - name: writer-container
        image: "k8s.gcr.io/pause"
        resources:
          requests:
            cpu: "1"
That's expected: the DaemonSet won't create any Pods until we add another workload to this workload separation group. Next, we'll create a Deployment that has an anti-affinity to itself (so only one replica is placed per node). The replica count of this Deployment determines the number of Pod-pairs we get.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: reader
spec:
  replicas: 3
  selector:
    matchLabels:
      pod: reader-pod
  template:
    metadata:
      labels:
        pod: reader-pod
    spec:
      tolerations:
      - key: group
        operator: Equal
        value: "read_writer_pair"
        effect: NoSchedule
      nodeSelector:
        group: "read_writer_pair"
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: pod
                operator: In
                values:
                - reader-pod
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: reader-container
        image: "k8s.gcr.io/pause"
        resources:
          requests:
            cpu: "1"
Since Autopilot takes the DaemonSet's size into account when creating nodes, this pattern should always result in a paired deployment. Simply set the replica count of the reader Deployment to the quantity of pairs you want. There's no additional cost or overhead to this solution on Autopilot either: workload separation has no fee, just a slightly higher minimum resource request of 500 mCPU per Pod.
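For example, to go from 3 pairs to 5 (an arbitrary count for illustration), scale the reader Deployment; Autopilot provisions a node for each new replica, and the DaemonSet drops a writer Pod onto it:

kubectl scale deployment reader --replicas=5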
Note that the creation order matters: it’s important that the DaemonSet Pod (writer) is created before the reader (it’s OK if they are created together in rapid succession, but not if the Deployment is created a full minute before the DaemonSet).
Let’s run the demo:
kubectl create -f https://raw.githubusercontent.com/WilliamDenniss/autopilot-examples/main/strict-pod-colocation/writer.yaml
kubectl create -f https://raw.githubusercontent.com/WilliamDenniss/autopilot-examples/main/strict-pod-colocation/reader.yaml
watch -d kubectl get pods -o wide
In about a minute, you should see 3 pairs of pods, each pair deployed on its own node.
$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
reader-5956f784f9-dfz8t 1/1 Running 0 3m20s 10.51.128.194 gk3-autopilot-cluster-3-nap-y725ql9p-986f4bbc-886z <none> <none>
reader-5956f784f9-gwqwj 1/1 Running 0 3m20s 10.51.128.131 gk3-autopilot-cluster-3-nap-y725ql9p-4cc171f7-9pr7 <none> <none>
reader-5956f784f9-tdgxv 1/1 Running 0 3m20s 10.51.129.5 gk3-autopilot-cluster-3-nap-y725ql9p-de6eb2d5-76q6 <none> <none>
writer-bf5f2 1/1 Running 0 86s 10.51.128.195 gk3-autopilot-cluster-3-nap-y725ql9p-986f4bbc-886z <none> <none>
writer-qzxpc 1/1 Running 0 81s 10.51.129.6 gk3-autopilot-cluster-3-nap-y725ql9p-de6eb2d5-76q6 <none> <none>
writer-svhmv 1/1 Running 0 83s 10.51.128.130 gk3-autopilot-cluster-3-nap-y725ql9p-4cc171f7-9pr7 <none> <none>
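When you're done, you can tear down the demo with the same manifests:

kubectl delete -f https://raw.githubusercontent.com/WilliamDenniss/autopilot-examples/main/strict-pod-colocation/reader.yaml
kubectl delete -f https://raw.githubusercontent.com/WilliamDenniss/autopilot-examples/main/strict-pod-colocation/writer.yaml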