DeepSeek’s R1 open model launch caused quite a stir as one of the first open reasoning models. Here’s how to run a demo of it locally on GKE! We can use an Nvidia L4 (or A100 40GB) to run the 8B Llama distilled model, or an A100 80GB to run the 14B and 32B Qwen… Continue reading Running DeepSeek open reasoning models on GKE
Tag: GKE
Provisioning spare capacity in GKE Autopilot with placeholder balloon pods
Autopilot is a new mode of operation for Google Kubernetes Engine (GKE) where compute capacity is dynamically provisioned based on your pods’ requirements. Among other innovations, it essentially functions as a fully automatic cluster autoscaler. Update: GKE now has an official guide for provisioning spare capacity. When you deploy a new pod in this environment,… Continue reading Provisioning spare capacity in GKE Autopilot with placeholder balloon pods