GKE Autopilot is pretty magical. You create a cluster just by picking a region and giving it a name, then schedule your Kubernetes workloads, and the compute resources are provisioned automatically.
While Kubernetes is provisioning resources, your Pods will be in the Pending
state. This is all well and good, except… there are other reasons that your Pods can be pending.
NAME                              READY   STATUS    RESTARTS   AGE
example-deploy-84576c8598-7jbqc   1/1     Pending   0          34s
example-deploy-84576c8598-7mbfh   1/1     Pending   0          33s
example-deploy-84576c8598-bhfqt   1/1     Pending   0          34s
Pods that are Pending. Will they be provisioned, or won’t they?
There are plenty of reasons a Pod can be in the Pending status. Kubernetes is a declarative system and allows you to declare a Pod even if it can’t actually be provisioned. One example is a Pod that requires a PersistentVolumeClaim when no such claim exists. The Pod will sit in Pending, waiting for the PersistentVolumeClaim to be created. So how can you tell the difference?
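To make the PersistentVolumeClaim case concrete, here is a minimal sketch of a Pod that references a claim which has not been created. The names pvc-demo and example-pv-claim are borrowed from the event log later in this article; the volume name, mount path, and sleep command are my own assumptions for illustration. The function only prints the manifest, so nothing is sent to a cluster until you pipe it somewhere.

```shell
# Print a Pod manifest that mounts a PersistentVolumeClaim that does not exist.
# The volume name "data", the mount path, and the sleep command are
# illustrative assumptions, not taken from the original example.
pvc_demo_manifest() {
  cat <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: pvc-demo
spec:
  containers:
  - name: ubuntu
    image: ubuntu
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: example-pv-claim
EOF
}
```

To try it, pipe the output to kubectl with `pvc_demo_manifest | kubectl apply -f -`. As long as no claim named example-pv-claim exists in the namespace, the Pod should be accepted and then stay Pending.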
Fortunately, there is an event you can look for to disambiguate in Autopilot: TriggeredScaleUp. Schedule your Pods, and stream the events for one of them:
kubectl get event -w --field-selector involvedObject.name=POD_NAME
If all goes to plan, after the initial FailedScheduling event, which indicates there were not enough resources, you will see TriggeredScaleUp, which indicates that Autopilot is creating resources for you. This may be followed by more FailedScheduling events, but fear not, as the resources are being provisioned. Finally, a Scheduled event will be posted when the Pod has the resources it needs.
$ kubectl get event -w --field-selector involvedObject.name=example-deploy-f4b9bc45b-m92cj
LAST SEEN   TYPE      REASON             OBJECT                               MESSAGE
4m9s        Warning   FailedScheduling   pod/example-deploy-f4b9bc45b-m92cj   0/2 nodes are available: 2 Insufficient cpu.
4m37s       Normal    TriggeredScaleUp   pod/example-deploy-f4b9bc45b-m92cj   pod triggered scale-up: [{https://content.googleapis.com/compute/v1/projects/gke-autopilot-test/zones/us-west1-b/instanceGroups/gk3-autopilot-2-nap-lgoujuzl-d153482f-grp 0->2 (max: 1000)} {https://content.googleapis.com/compute/v1/projects/gke-autopilot-test/zones/us-west1-a/instanceGroups/gk3-autopilot-2-nap-lgoujuzl-81610659-grp 0->2 (max: 1000)}]
3m58s       Warning   FailedScheduling   pod/example-deploy-f4b9bc45b-m92cj   0/4 nodes are available: 2 Insufficient cpu, 2 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
3m48s       Warning   FailedScheduling   pod/example-deploy-f4b9bc45b-m92cj   0/6 nodes are available: 2 Insufficient cpu, 4 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
3m37s       Normal    Scheduled          pod/example-deploy-f4b9bc45b-m92cj   Successfully assigned default/example-deploy-f4b9bc45b-m92cj to gk3-autopilot-2-nap-lgoujuzl-d153482f-hvp1
3m35s       Normal    Pulling            pod/example-deploy-f4b9bc45b-m92cj   Pulling image "ubuntu"
3m33s       Normal    Pulled             pod/example-deploy-f4b9bc45b-m92cj   Successfully pulled image "ubuntu"
3m33s       Normal    Created            pod/example-deploy-f4b9bc45b-m92cj   Created container ubuntu
3m33s       Normal    Started            pod/example-deploy-f4b9bc45b-m92cj   Started container ubuntu
Event log of a Pod that is successfully deployed
You can also view all events with kubectl get -w events, but it can be a bit noisy. The other place to view a Pod’s events is kubectl describe pod POD_NAME, although this doesn’t offer watch/streaming functionality, so you won’t see new events without running it again.
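If the stream is still too noisy, one option is to narrow the field selector further. The helper below is a small sketch that builds a selector string for a single Pod and, optionally, a single event reason. The function name pod_event_selector is hypothetical, and the sketch assumes your cluster accepts involvedObject.kind and reason as event field selectors alongside involvedObject.name.

```shell
# Build an event field selector for one Pod, optionally limited to one reason.
# Usage: pod_event_selector POD_NAME [REASON]
# The function name is hypothetical; it only prints a string for kubectl.
pod_event_selector() {
  pod=$1
  reason=${2:-}
  selector="involvedObject.kind=Pod,involvedObject.name=${pod}"
  if [ -n "$reason" ]; then
    selector="${selector},reason=${reason}"
  fi
  printf '%s\n' "$selector"
}
```

For example, `kubectl get event -w --field-selector "$(pod_event_selector POD_NAME TriggeredScaleUp)"` would stream only the scale-up events for that Pod, while omitting the second argument keeps every event for it.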
If you see FailedScheduling without a TriggeredScaleUp, the Pod may never be provisioned due to another unmet condition unrelated to the cluster’s compute capacity. Such conditions are generally visible in the Pod’s event log as well. Here’s an example of such a case, where a PersistentVolumeClaim cannot be found:
$ kubectl get event -w --field-selector involvedObject.name=pvc-demo
LAST SEEN   TYPE      REASON              OBJECT         MESSAGE
0s          Warning   FailedScheduling    pod/pvc-demo   persistentvolumeclaim "example-pv-claim" not found
0s          Warning   FailedScheduling    pod/pvc-demo   persistentvolumeclaim "example-pv-claim" not found
0s          Normal    NotTriggerScaleUp   pod/pvc-demo   pod didn't trigger scale-up (it wouldn't fit if a new node is added): 4 running "VolumeBinding" filter plugin for pod "pvc-demo": error getting PVC "default/example-pv-claim": could not find v1.PersistentVolumeClaim "default/example-pv-claim"
Event log for a Pod that won’t be scheduled until the PVC is created
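The two event logs above suggest a simple rule of thumb, which can be sketched as a shell filter. It reads the default kubectl get event output on stdin and prints a one-word verdict based on which reasons it has seen. The function name and the four labels are my own invention, and the sketch assumes REASON is the third whitespace-separated field, as it is in the listings in this article.

```shell
# Classify a Pending Pod from its event list, read on stdin.
# Assumes the default `kubectl get event` columns, where REASON is field 3.
# The verdict labels are hypothetical and chosen only for this sketch.
classify_pending() {
  awk '
    $3 == "Scheduled"         { scheduled = 1 }
    $3 == "TriggeredScaleUp"  { scale_up = 1 }
    $3 == "NotTriggerScaleUp" { no_scale_up = 1 }
    $3 == "FailedScheduling"  { failed = 1 }
    END {
      if (scheduled)        print "scheduled"    # the Pod has been assigned a node
      else if (scale_up)    print "scaling-up"   # Autopilot is adding capacity
      else if (no_scale_up) print "blocked"      # more nodes would not help
      else if (failed)      print "waiting"      # no autoscaler verdict yet
      else                  print "unknown"      # no scheduling events seen
    }
  '
}
```

Run it without the -w flag so the input ends, for example `kubectl get event --field-selector involvedObject.name=POD_NAME | classify_pending`. A result of blocked or a long-lived waiting is the cue to read the event messages for an unmet condition such as a missing PersistentVolumeClaim.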
This article is bonus material to supplement my book Kubernetes for Developers.