Multi-Cluster Services on GKE

Connect internal services from multiple clusters together in one logical namespace. You can connect services running in Autopilot to services running in Standard and vice versa, share services between teams that run their own clusters, and back an internal service with replicas in multiple clusters for cross-regional availability. All of this comes with Multi-cluster Services (MCS) support in GKE.

For this demo, let’s create a service in a GKE Autopilot cluster, and access it from a GKE Standard cluster.

Follow the guide to:
a) Enable the API
b) Enable MCS on the fleet (gcloud container fleet multi-cluster-services enable --project $PROJECT_ID)
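
For step (a), enabling the required services from the command line looks something like this. The API list here follows the MCS setup guide; treat it as a starting point and adjust for your environment:

# Enable the APIs used by multi-cluster Services
gcloud services enable \
    multiclusterservicediscovery.googleapis.com \
    gkehub.googleapis.com \
    cloudresourcemanager.googleapis.com \
    trafficdirector.googleapis.com \
    dns.googleapis.com \
    --project $PROJECT_ID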

Create two clusters. In my test, I created an Autopilot cluster with the default options, and a Standard cluster with Workload Identity enabled (if you forget to enable Workload Identity, or are using an existing cluster without it, don’t worry: you can update the cluster via the UI).
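
If you prefer the CLI to the UI for cluster creation, the commands look roughly like this. The cluster names and location are the ones from my test, and the Workload Identity pool is assumed to take the standard $PROJECT_ID.svc.id.goog form:

# Autopilot cluster (Workload Identity is on by default)
gcloud container clusters create-auto ap-commtest2 \
    --region us-central1 --project $PROJECT_ID

# Standard cluster with Workload Identity enabled
gcloud container clusters create commtest \
    --region us-central1 \
    --workload-pool=$PROJECT_ID.svc.id.goog \
    --project $PROJECT_ID

# Or enable Workload Identity on an existing Standard cluster
gcloud container clusters update commtest \
    --region us-central1 \
    --workload-pool=$PROJECT_ID.svc.id.goog \
    --project $PROJECT_ID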

Next, add both clusters to your fleet, like so:

gcloud container fleet memberships register $CLUSTER_NAME \
   --gke-cluster $CLUSTER_LOCATION/$CLUSTER_NAME \
   --enable-workload-identity \
   --project $PROJECT_ID

Pay close attention to the gke-cluster parameter, which is constructed from the cluster’s location and name, for example us-central1/ap-commtest2.
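
With the values from my test filled in, registering the Autopilot cluster looks like this:

gcloud container fleet memberships register ap-commtest2 \
   --gke-cluster us-central1/ap-commtest2 \
   --enable-workload-identity \
   --project gke-autopilot-test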

Verify that things are set up correctly by viewing the membership list and checking the MCS feature status. In the output below, my clusters were named “commtest” and “ap-commtest2”.

$ gcloud container fleet memberships list
NAME: commtest
EXTERNAL_ID: c4badf16-ae2e-43b5-bd45-de3fec65fb8e
LOCATION: us-central1

NAME: ap-commtest2
EXTERNAL_ID: 2610d5b0-24f9-42ef-874e-9964a06b321e
LOCATION: us-central1

$ gcloud container fleet multi-cluster-services describe \
    --project gke-autopilot-test
createTime: '2023-05-04T18:04:19.645143949Z'
membershipStates:
  projects/213543088169/locations/us-central1/memberships/ap-commtest2:
    state:
      code: OK
      description: Firewall successfully updated
      updateTime: '2023-05-04T18:33:11.208425180Z'
  projects/213543088169/locations/us-central1/memberships/commtest:
    state:
      code: OK
      description: Firewall successfully updated
      updateTime: '2023-05-05T02:09:33.506091723Z'
name: projects/gke-autopilot-test/locations/global/features/multiclusterservicediscovery
resourceState:
  state: ACTIVE
spec: {}
updateTime: '2023-05-05T02:09:33.956153333Z'

Now, let’s take it for a spin. Create a deployment and an internal service on your first cluster. I’ll use this pair:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: timeserver
spec:
  replicas: 3
  selector:
    matchLabels:
      pod: timeserver-pod
  template:
    metadata:
      labels:
        pod: timeserver-pod
    spec:
      containers:
      - name: timeserver-container
        image: docker.io/wdenniss/timeserver:1

deploy.yaml

apiVersion: v1
kind: Service
metadata:
  name: timeserver
spec:
  selector:
    pod: timeserver-pod
  type: ClusterIP
  ports:
  - port: 80
    targetPort: 80
    protocol: TCP

internal-service.yaml

To expose this service to the whole fleet and make it accessible from the other cluster, we create a ServiceExport object (a custom resource provided by MCS).

kind: ServiceExport
apiVersion: net.gke.io/v1
metadata:
  namespace: default
  name: timeserver

export.yaml

Here’s a 3-liner to save you the trouble of copying the files:

kubectl create -f https://raw.githubusercontent.com/WilliamDenniss/autopilot-examples/main/multi-cluster-services/deploy.yaml
kubectl create -f https://raw.githubusercontent.com/WilliamDenniss/autopilot-examples/main/multi-cluster-services/internal-service.yaml
kubectl create -f https://raw.githubusercontent.com/WilliamDenniss/autopilot-examples/main/multi-cluster-services/export.yaml

Now, from the second cluster, authenticate kubectl against it and run kubectl get ServiceImport. Within about 5 minutes, you should see an entry for the service we exported from the first cluster. Once this object appears, the service is ready to use.
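
Switching kubectl to the second cluster is a one-liner; with my Standard cluster it looks like this (this assumes a regional cluster in us-central1; use --zone instead for a zonal cluster):

gcloud container clusters get-credentials commtest \
    --region us-central1 --project gke-autopilot-test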

$ kubectl get ServiceImport
NAME           TYPE           IP                AGE
timeserver     ClusterSetIP   ["10.64.1.165"]   6m

To test, let’s create an Ubuntu pod and call the service.

$ kubectl run -it ubuntu --image=ubuntu -- bash
root@ubuntu:/# apt-get update && apt-get install dnsutils curl -y
root@ubuntu:/# host timeserver.default.svc.clusterset.local
timeserver.default.svc.clusterset.local has address 10.64.1.165
root@ubuntu:/# curl http://timeserver.default.svc.clusterset.local
The time is 2:29 AM, UTC.
root@ubuntu:/# 

If we query the Services in this namespace, we can see the automatically created entry that provides this cluster IP.

$ kubectl get svc
NAME                 TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE
gke-mcs-vhul62o5mq   ClusterIP   10.64.1.165   <none>        80/TCP    37m
kubernetes           ClusterIP   10.64.0.1     <none>        443/TCP   10h

So that’s native multi-cluster networking on GKE. Discovering and referencing exported services from any cluster in the fleet feels just like using a service that’s local to the cluster.

Next thing to try: if you export the same service (i.e. the same namespace and service name) from multiple clusters, you get an internally load-balanced service that spreads traffic across your clusters.
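
A quick way to try that, assuming kubectl is now pointed at the second cluster, is to apply the same three manifests there; MCS will then spread traffic for timeserver.default.svc.clusterset.local across both clusters:

kubectl create -f https://raw.githubusercontent.com/WilliamDenniss/autopilot-examples/main/multi-cluster-services/deploy.yaml
kubectl create -f https://raw.githubusercontent.com/WilliamDenniss/autopilot-examples/main/multi-cluster-services/internal-service.yaml
kubectl create -f https://raw.githubusercontent.com/WilliamDenniss/autopilot-examples/main/multi-cluster-services/export.yaml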