Upgrading a GKE cluster to use Autopilot Compute Class

Autopilot is now available in GKE clusters via Compute Class. See the official announcement, and documentation. Versions with this feature are now available for clusters in the Regular release channel. Here’s how to upgrade early and get these features today. Per the docs, the feature is in version 1.34.1-gke.1829001 or later. To upgrade we need… Continue reading Upgrading a GKE cluster to use Autopilot Compute Class

Stable Diffusion WebUI on GKE Autopilot

I recently set out to run Stable Diffusion on GKE in Autopilot mode, building a container from scratch using the AUTOMATIC1111‘s webui. This is likely not how you’d host a stable diffusion service for production (which would make for a good topic of another blog post), but it’s a fun way to try out the… Continue reading Stable Diffusion WebUI on GKE Autopilot

CUDA 12 on GKE Autopilot

Per the NVIDIA docs, CUDA 12 applications require driver 525.60.04+. This driver is available as part of GKE 1.28. To upgrade an existing cluster to the latest version of 1.28: This upgrades the control plane, and schedules the nodes to follow, which generally completes within a day or two (depending on how many nodes you… Continue reading CUDA 12 on GKE Autopilot

Finding the NVIDIA Driver Version on GKE

Update: this information is now available in the official docs. If you want to know what version of your GPU drivers are active on GKE, here’s a one-liner: What this command does is get all the logs of Pods with the label k8s-app=nvidia-gpu-device-plugin (there are several different DaemonSets that can install the drivers depending on… Continue reading Finding the NVIDIA Driver Version on GKE