TensorFlow on GKE Autopilot with GPU acceleration

Last week, GKE announced GPU support for Autopilot. Here’s a fun way to try it out: a TensorFlow-enabled Jupyter Notebook with GPU-acceleration! We can even add state, so you can save your work between sessions. Autopilot makes all this really, really easy, as you can configure everything as a Kubernetes object. Setup First, create a… Continue reading TensorFlow on GKE Autopilot with GPU acceleration

Provisioning Capacity in GKE Autopilot

I previously documented how to add spare capacity to an Autopilot Kubernetes cluster, whereby you create a balloon Deployment to provision some scheduling headroom. This works to constantly give you a certain amount of headroom, so for example if you have a 2vCPU balloon Deployment, and use that capacity it will get rescheduled. A useful… Continue reading Provisioning Capacity in GKE Autopilot

High-Performance Compute on Autopilot

This week, Autopilot announced support for the Scale-Out Compute Class, for both x86 and Arm architectures. The point of this compute class is to give you cores for better single-threaded performance, and improved price/performance for “scale-out” workloads — basically for when you are saturating the CPU, and/or need faster single-threaded performance (e.g. remote compilation, etc).… Continue reading High-Performance Compute on Autopilot

Arm on Autopilot

Arm was made available in Preview on Google Cloud, and GKE Autopilot today! As this is an early stage Preview, there’s a few details to pay attention to if you want to try it out, like the version, regions and quota. I put together this quickstart for trying out Arm in Autopilot today. Arm nodes… Continue reading Arm on Autopilot

Minimizing Pod Disruption on Autopilot

There are 3 common reasons why a Pod may be terminated on Autopilot: node upgrades, a cluster scale-down, and a node repair. PDBs and graceful termination periods modify the disruption to pods when these events happen, and maintenance windows and exclusions control when upgrade events can occur. Upgrade gracefulTerminationPeriod: limited to one hourPDB: is respected… Continue reading Minimizing Pod Disruption on Autopilot

Building GKE Autopilot

Last month gave a presentation at KubeCon Europe in Valencia on “Building a Nodeless Kubernetes Platform”. In it, I shared the details about the creation of GKE Autopilot including some key decisions that we made, how the product was implemented, and why I believe that the design leads to an ideal fully managed platform. Autopilot… Continue reading Building GKE Autopilot

Preferring Spot in GKE Autopilot

Spot Pods are a great way to save money on Autopilot, currently 70% off the regular price. The catch is two-fold: Your workload can be disrupted There may not always be spot capacity available For workload disruption, this is simply a judgement call. You should only run workloads that can accept disruption (abrupt termination). If… Continue reading Preferring Spot in GKE Autopilot

Categorized as Autopilot

Separating Workloads in Autopilot

Autopilot while being operationally nodeless, still creates nodes for your workloads behind the scenes. Sometimes it may be desirable as an operator to separate your workloads so that certain workloads are scheduled together. One example I heard recently was a cluster that primarily processes large batch jobs. In addition to these spikey workloads that cause… Continue reading Separating Workloads in Autopilot

Categorized as Autopilot

Kubernetes Nodes and Autopilot

One of the key design decisions of GKE Autopilot is the fact that we kept the same semantic meaning of the Kubernetes node object. It’s “nodeless” in the sense that you don’t need to care about, or plan for nodes—they are provisioned and managed automatically based on your PodSpec. However, the node object still exists… Continue reading Kubernetes Nodes and Autopilot

Categorized as Autopilot

Creating an Autopilot cluster at a specific version

Sometimes you may wish to create or update a GKE Autopilot cluster with a specific version. For example, the big news this week is that mutating webhooks are supported in Autopilot (from version 1.21.3-gke.900). Rather than waiting for your desired version to be the default in your cluster’s release channel, you can update ahead of… Continue reading Creating an Autopilot cluster at a specific version

Categorized as Autopilot