GKE Autopilot: how to know if Pending pods will be scheduled


GKE Autopilot is pretty magical. You create a cluster just by picking a region and giving it a name, schedule your Kubernetes workloads, and the compute resources are provisioned for you automatically.

While Autopilot is provisioning those resources, your Pods will be in the Pending state. This is all well and good, except… there are other reasons your Pods can be Pending.

NAME                              READY   STATUS    RESTARTS   AGE
example-deploy-84576c8598-7jbqc   0/1     Pending   0          34s
example-deploy-84576c8598-7mbfh   0/1     Pending   0          33s
example-deploy-84576c8598-bhfqt   0/1     Pending   0          34s

Pods that are Pending. Will they be provisioned, or won’t they?

There are plenty of reasons a Pod can be Pending. Kubernetes is a declarative system, so it lets you declare a Pod even if that Pod can’t actually be scheduled yet. For example, if the Pod requires a PersistentVolumeClaim that doesn’t exist, it will sit in Pending until the PersistentVolumeClaim is created. So how can you tell the difference?

Fortunately, there is an event you can look for in Autopilot to tell the two cases apart: TriggeredScaleUp. Create your Pods, then stream the events for one of them:

kubectl get event -w --field-selector involvedObject.name=POD_NAME

If all goes to plan, the initial FailedScheduling event, which indicates there were not enough resources, is followed by TriggeredScaleUp, which indicates that Autopilot is creating resources for you. More FailedScheduling events may follow while the new nodes are still starting up (note the node.kubernetes.io/not-ready taint in the log below), but fear not: the resources are on their way. Finally, a Scheduled event is posted once the Pod has been assigned to a node with the resources it needs.

# kubectl get event -w --field-selector involvedObject.name=example-deploy-f4b9bc45b-m92cj
LAST SEEN   TYPE      REASON             OBJECT                               MESSAGE
4m9s        Warning   FailedScheduling   pod/example-deploy-f4b9bc45b-m92cj   0/2 nodes are available: 2 Insufficient cpu.
4m37s       Normal    TriggeredScaleUp   pod/example-deploy-f4b9bc45b-m92cj   pod triggered scale-up: [{https://content.googleapis.com/compute/v1/projects/gke-autopilot-test/zones/us-west1-b/instanceGroups/gk3-autopilot-2-nap-lgoujuzl-d153482f-grp 0->2 (max: 1000)} {https://content.googleapis.com/compute/v1/projects/gke-autopilot-test/zones/us-west1-a/instanceGroups/gk3-autopilot-2-nap-lgoujuzl-81610659-grp 0->2 (max: 1000)}]
3m58s       Warning   FailedScheduling   pod/example-deploy-f4b9bc45b-m92cj   0/4 nodes are available: 2 Insufficient cpu, 2 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
3m48s       Warning   FailedScheduling   pod/example-deploy-f4b9bc45b-m92cj   0/6 nodes are available: 2 Insufficient cpu, 4 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
3m37s       Normal    Scheduled          pod/example-deploy-f4b9bc45b-m92cj   Successfully assigned default/example-deploy-f4b9bc45b-m92cj to gk3-autopilot-2-nap-lgoujuzl-d153482f-hvp1
3m35s       Normal    Pulling            pod/example-deploy-f4b9bc45b-m92cj   Pulling image "ubuntu"
3m33s       Normal    Pulled             pod/example-deploy-f4b9bc45b-m92cj   Successfully pulled image "ubuntu"
3m33s       Normal    Created            pod/example-deploy-f4b9bc45b-m92cj   Created container ubuntu
3m33s       Normal    Started            pod/example-deploy-f4b9bc45b-m92cj   Started container ubuntu

Event log of a Pod that is successfully deployed

You can also watch all events in the namespace with kubectl get events -w, but the output can be a bit noisy. The other place to see a Pod’s events is kubectl describe pod POD_NAME, which lists them at the end of its output. It has no watch/streaming mode, though, so you won’t see new events without running it again.
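Field selectors only support exact matches, so involvedObject.name can follow just one Pod at a time. One way to tame the namespace-wide stream while still following every replica of a Deployment is to filter it by Pod name prefix. This is a minimal sketch, not an official tool: the function name filter_pod_events is hypothetical, and it assumes kubectl’s default event columns (LAST SEEN, TYPE, REASON, OBJECT, MESSAGE, with no NAMESPACE column, i.e. no --all-namespaces).

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep the header row plus any event row whose OBJECT
# column starts with "pod/<prefix>". Assumes kubectl's default event columns:
# LAST SEEN, TYPE, REASON, OBJECT, MESSAGE (no NAMESPACE column).
filter_pod_events() {
  local prefix="pod/$1"
  # Field 4 is OBJECT because LAST SEEN values (e.g. "4m9s") are one token.
  # fflush() pushes each matching line out immediately, so the filter also
  # works on a live `kubectl get events -w` stream.
  awk -v prefix="$prefix" 'NR == 1 || index($4, prefix) == 1 { print; fflush() }'
}
```

To follow all the replicas of the example Deployment at once:

kubectl get events -w | filter_pod_events example-deploy-

If lines still arrive late, your awk may be buffering its input from the pipe; mawk, for instance, has a -W interactive option for this.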

If you see FailedScheduling without a TriggeredScaleUp, the Pod may never be scheduled, because it is waiting on some other unmet condition unrelated to the cluster’s compute capacity. Such conditions are generally visible in the Pod’s event log as well. Here’s an example where the PersistentVolumeClaim the Pod needs doesn’t exist; instead of TriggeredScaleUp, a NotTriggerScaleUp event appears:

# kubectl get event -w --field-selector involvedObject.name=pvc-demo
LAST SEEN   TYPE      REASON              OBJECT         MESSAGE
0s          Warning   FailedScheduling    pod/pvc-demo   persistentvolumeclaim "example-pv-claim" not found
0s          Warning   FailedScheduling    pod/pvc-demo   persistentvolumeclaim "example-pv-claim" not found
0s          Normal    NotTriggerScaleUp   pod/pvc-demo   pod didn't trigger scale-up (it wouldn't fit if a new node is added): 4 running "VolumeBinding" filter plugin for pod "pvc-demo": error getting PVC "default/example-pv-claim": could not find v1.PersistentVolumeClaim "default/example-pv-claim"

Event log for a Pod that won’t be scheduled until the PVC is created
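The rule of thumb above can be folded into a small helper that reads a Pod’s event reasons and prints a rough verdict. This is a sketch under stated assumptions, not an Autopilot feature: the function name pending_verdict and its verdict labels are hypothetical, and it only checks which reasons have appeared (in a fixed priority order), not their order or timing, so treat its output as a hint rather than a guarantee.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read one event reason per line (for a single Pod) on
# stdin and print a rough verdict on whether the Pod will get scheduled.
# Reasons are checked in priority order; only their presence matters.
pending_verdict() {
  local reasons
  reasons=$(cat)
  if grep -qx 'Scheduled' <<<"$reasons"; then
    echo "scheduled"    # the Pod has been assigned to a node
  elif grep -qx 'TriggeredScaleUp' <<<"$reasons"; then
    echo "scaling-up"   # Autopilot is adding capacity; keep waiting
  elif grep -qx 'NotTriggerScaleUp' <<<"$reasons"; then
    echo "blocked"      # new nodes would not help; read the event messages
  elif grep -qx 'FailedScheduling' <<<"$reasons"; then
    echo "suspect"      # failed to schedule, no scale-up recorded so far
  else
    echo "no-events"
  fi
}
```

To feed it the reasons for one Pod, one per line:

kubectl get events --field-selector involvedObject.name=POD_NAME -o jsonpath='{range .items[*]}{.reason}{"\n"}{end}' | pending_verdict

For the two event logs above, this would print scheduled and blocked respectively.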