From Laptop to Hybrid Cloud: Building a Modern and Frugal Kubernetes Network with Cilium…


Jeff Cheng

(Or: How I Learned to Stop Worrying and Love Network Overlays with a Poor Budget)


The Setup Story

Ever had that moment when your ambitious tech experiments start making your laptop question its life choices? Well, buckle up, because that’s exactly where this story begins. After our adventures in Blog 1 (where we cobbled together a basic Kubernetes cluster) and Blog 2 (where we made it actually useful with Cilium’s networking magic), I got a bit… let’s say “inspired.”

The plan was simple: I wanted to play with Cilium’s ClusterMesh using more single-node clusters. My laptop, however, had other ideas. At the mere suggestion of running additional VMs, it started making sounds that reminded me of a small aircraft preparing for takeoff. So, like any reasonable engineer with questionable judgment, I decided to solve this problem by making it more complicated — by building a hybrid cloud setup instead!

Is it practical? Roughly as sensible as building a laser cannon to pop balloons.

Is it fun? Absolutely, yes!

We’re going to create a “home-to-cloud” bridge using Wireguard, deploy K3s clusters, and connect them all with Cilium’s ClusterMesh. Think of it as building a bridge between your laptop and the cloud, except this bridge is made of encrypted packets and developer dreams (and maybe a few tears).

What We’re Building


A mini hybrid cloud with a Wireguard bridge

Our architecture — and please note, the word “overengineered” was practically invented for this (for my poor 2022 non-gaming Laptop) — features:

A battle-tested laptop (which thinks it only needs to handle MS Office) running:

  • Internet access (because we’re not cavemen)
  • Hyper-V v-switch (192.168.100.1) piping outbound traffic
  • Our survivor from Blog 1 & 2: Cluster 1 (vanilla K8s v1.31.2, single-node)
  • Cluster 2 VM (K3s, because variety is the spice of life, single-node)

An Azure-based cluster:

  • Cluster 3 VM (K3s living its best life in Azure, single-node)

Dispersed networks:

  • Three distinct network segments (because we like to keep things organized):
  • Laptop VNet: 192.168.100.0/24
  • Azure WireGuard subnet: 192.168.201.0/24
  • Azure K3s subnet: 192.168.200.0/24

The bridging solution:

  • A Wireguard VPN tunnel (because Azure’s VPN Gateway costs more than my coffee budget)
  • Client: Running on our laptop (192.168.100.50)
  • Server: Living in Azure with a public IP (192.168.201.4)

“Why this setup?” you might ask. Well, while Azure offers its own VPN Gateway service, it’s a bit like using a sledgehammer to crack a nut — powerful but costly (starting at >$100/month in the West US 2 region!).


Hmm… Basic seems ok


Wait! Where is my Basic??

For our homelab needs, Wireguard is the perfect alternative — it’s lightweight, blazingly fast, and won’t make your wallet cry. Plus, running Wireguard directly on our VMs gives us complete control over the networking stack.

The separate subnets in Azure (one for Wireguard, one for K3s) help keep things organized and secure (because even in a homelab, we pretend to follow best practices… ). As for having two clusters on the laptop… well, that’s just because we can! Sure, we could have put both K3s clusters in Azure, but then we wouldn’t get to experience the joy of watching our laptop’s fans spin up to drone-takeoff speeds.

Remember, the goal here isn’t just to build a clustermesh — it’s to build one that spans from our desk to the cloud, all while keeping our Azure bill lower than our coffee budget (2025 resolution: drink less coffee).

Prerequisites

(Or: What You Need Before This Gets Crazy)

Before we dive into this adventure, make sure you have:

🏗️ Infrastructure Requirements:

  • A working Kubernetes cluster from Blog 1 & 2 (or something similar that you cobbled together)
  • At least 8GB RAM for the main cluster (16GB if you want to sleep at night)
  • Cilium 1.16.3 or newer & Cilium CLI installed (we’ll use 1.16.5 because living on the edge is fun)
  • Argo CD and the Argo CD CLI installed (because manually applying YAML files is so 2023)

☁️ Cloud Requirements:

  • An Azure subscription (or similar cloud provider if you’re feeling rebellious)
  • Rights to create VNets, VMs, and other resources (I assume you do)

💻 VM Resources:

  • Enough RAM to make your laptop question its life choices
  • For the Wireguard VMs:
  • 2 vCPU and 4 GB RAM for the NVA (Network Virtual Appliance) on your laptop
  • 2 vCPU and 8 GB RAM for the NVA living happily in Azure (D2s v3… I was too lazy to pick anything else)
  • For K3s:
  • 2 vCPU and 8 GB RAM for the Cluster 2 VM on your laptop
  • 2 vCPU and 8 GB RAM for the Cluster 3 VM living happily in Azure (D2s v3… still too lazy to pick)

🧠 Knowledge Requirements:

  • Basic networking knowledge (or willingness to learn from mistakes)
  • A sense of adventure and a high tolerance for YAML
  • A Git repository for our Argo CD setup (because you might forget what you applied… we all do)
  • Patience (lots of it)

Pro Tip: If your laptop starts making helicopter noises, that’s normal (use some WD-40). If it starts hovering, you might want to scale down your clusters.

The Master Plan Outlines (With Links)

Or: How We’ll Navigate This Adventure

(To make navigation easier through this rather long adventure, I’ve added internal links throughout the blog. Click on any task outline to teleport directly there.)

Task 1: Prepare the Playgrounds

  • Plan IP segments for all networks
  • Set up Azure infrastructure (VNets, subnets, route tables)
  • Deploy Network Virtual Appliances (NVAs)

Task 2: Build the Bridge

  • Configure Wireguard on both NVAs
  • Set up routing between networks
  • Test connectivity
  • Troubleshoot when things go wrong (because they will)

Task 3: Deploy the K3s Army

  • Install K3s clusters with specific parameters
  • Manage multiple Kubernetes contexts
  • Share certificates and secrets securely

Task 4: Enter Cilium (Again!)

  • Deploy Cilium via Argo CD
  • Configure networking features (L2 announcements, IP pools)
  • Prepare ClusterMesh components

Task 5: Unite the Clusters

  • Configure cross-cluster authentication
  • Connect clusters with ClusterMesh
  • Validate mesh connectivity

Task 6: The Grand Finale

  • Break down Bookinfo across clusters
  • Deploy distributed services
  • Watch the magic in Hubble

⚠️ Warning: This setup is probably (actually, definitely) overkill for most use cases on a single poor laptop. But then again, if we only did what was necessary, we’d still be running everything on Windows 10 (sorry, some SQL servers still very much do!). Where’s the fun in that?

Alright! Grab your coffee, make sure your laptop’s cooling pad is ready, and let’s dive into Task 1!

Task 1: Prepare the Playgrounds

(Or: The Art of Assigning IP Blocks Without Summoning Network Demons)

Task 1: Prepare the Playgrounds ← We are here

Task 2: Build the Bridge

Task 3: Deploy the K3s Army

Task 4: Enter Cilium (Again!)

Task 5: Unite the Clusters

Task 6: The Grand Finale

Embarking on Task 1: Juggling on-prem servers, Azure resources, and CIDRs? We’ll set up subnets, route tables, and prepare for our multi-cluster adventure. Remember, messing up IP allocation is a common hiccup, but fixing it is a hassle. Let’s get it right the first time.

Understanding the Why

Before we dive into IP assignments (thrilling, I know), let’s understand why we’re being so particular about our network design:

  1. Non-Overlapping CIDRs: Because debugging IP collisions is worse than debugging your taxes.
  2. Minimal Internet Exposure: The fewer public endpoints you have, the fewer random bots try to break down your digital door.
  3. Future-Proofing for Cilium: Our dear friend ClusterMesh hates IP overlaps more than a cat hates water.

Overlapping IPs might cause cosmic confusion, which you’ll only discover after your pods vanish into the void.

The Grand IP Address Plan

Here’s our carefully crafted IP segment plan (feel free to adjust, or blindly follow if you trust my questionable decisions):


Forgot to add CIDRs for Pods
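Since the plan lives in an image (which, as the caption admits, forgot the Pod CIDRs), here is the same plan recapped from the values used throughout this post:

  • Laptop VNet 192.168.100.0/24: Hyper-V switch 192.168.100.1, WireGuard client 192.168.100.50, Cluster 1 node 192.168.100.100, Cluster 2 node 192.168.100.101
  • Azure Default subnet 192.168.200.0/24: Cluster 3 node 192.168.200.4
  • Azure Jumper subnet 192.168.201.0/24: WireGuard server 192.168.201.4 (plus a public IP)
  • Pod CIDRs: Cluster 1 10.244.0.0/16, Cluster 2 10.245.0.0/16, Cluster 3 10.246.0.0/16
  • LoadBalancer IP pools: Cluster 1 192.168.100.120–169, Cluster 2 192.168.100.170–219, Cluster 3 192.168.200.120–169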

Why so many subnets? Because giving each component its own address space keeps the puzzle pieces from colliding. No one’s elbowing each other for space, and if something misbehaves, you can isolate it faster.

Contributing Your Coffee Money to Azure

(Not really kidding about this part…)

Before we create our Azure resources, let’s understand what we’re building. For those new to Azure, think of Resource Groups as containers that hold all your cloud stuff together — VMs, storage, networks, and your hopes and dreams of staying within budget.


Pro Tip: Check out the official docs at https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/manage-resource-groups-portal if you want to know more about Resource Groups. Though honestly, for our homelab, just think of it as a folder where Azure puts your expensive toys.

  1. Create a Virtual Network resource with 192.168.200.0/23 (or a larger range if you like)
  2. Create two subnets: subnet Default with 192.168.200.0/24 and subnet Jumper with 192.168.201.0/24.
  3. Create one VM for Cluster 3 with the D2s v3 (2 vCPUs, 8 GiB memory) SKU in subnet Default without a Public IP, and then configure the NSG
  • The VM will get the IP 192.168.200.4 automatically assigned.
  • Navigate to the associated NSG resource
  • Add a rule that allows any outbound traffic from your Azure network to your laptop's network (192.168.100.0/23).
  • Add a rule that allows any inbound traffic from your laptop's network (192.168.100.0/24) to your Azure default subnet (192.168.200.0/24).


Network Security Group

4. Create one VM for Wireguard with the D2s v3 (2 vCPUs, 8 GiB memory) SKU in subnet Jumper with a Public IP, and then configure the NSG

  • You should choose SSH with public-key authentication instead of password authentication
  • The VM will get the IP 192.168.201.4 and a static Public IP automatically assigned.
  • Navigate to the associated NSG resource
  • Add a rule that allows any inbound TCP traffic on port 22 (for SSH) to your Jumper VM's Public IP
  • Add a rule that allows any inbound UDP traffic on port 51820 (WireGuard's default port) to your Jumper VM's Public IP


NSG for the Jumper

5. Create a Route Table resource that routes 192.168.100.0/23 to the WireGuard server and the Cluster 3 LoadBalancer IP pool range (192.168.200.120–192.168.200.169) to the Cluster 3 VM; then associate the route table with the subnets. It should look like the below:


Pro Tip 1: My IP Pool range for Cluster 3 resulted in 3 separate routes. Learn from my mistakes — plan your IP ranges better!

Pro Tip 2: The “Next hop type” should always be “Virtual appliance” if using “Next hop IP address.”

Pro Tip 3: One route table can serve both subnets (we’re being efficient… or lazy)

Pro Tip 4: If Azure’s portal asks, “Are you sure you want to create this resource?” the answer is always “Yes, but I’m not sure about my wallet.” Keep an eye on costs.
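If you'd rather script this than click through the portal, a rough Azure CLI sketch of the core network pieces might look like the below (names are made up; the NSG rules and the VMs themselves are left to the portal steps above):

# Resource group, VNet, and the two subnets
az group create -n homelab-rg -l westus2
az network vnet create -g homelab-rg -n homelab-vnet \
  --address-prefixes 192.168.200.0/23 \
  --subnet-name Default --subnet-prefixes 192.168.200.0/24
az network vnet subnet create -g homelab-rg --vnet-name homelab-vnet \
  -n Jumper --address-prefixes 192.168.201.0/24

# Route table: send the laptop-side networks to the WireGuard NVA
az network route-table create -g homelab-rg -n homelab-rt
az network route-table route create -g homelab-rg --route-table-name homelab-rt \
  -n to-laptop --address-prefix 192.168.100.0/23 \
  --next-hop-type VirtualAppliance --next-hop-ip-address 192.168.201.4
az network vnet subnet update -g homelab-rg --vnet-name homelab-vnet \
  -n Default --route-table homelab-rt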

Final Steps

Once you've got your Azure resources set up, your WireGuard server VM and Cluster 3 VM should be able to ping each other.

  1. Install Wireguard on both VMs:
# If you use Ubuntu (I do), run below:
sudo apt update && sudo apt install wireguard -y

2. Verify you can ping between the WireGuard VM and the Cluster 3 VM. Then, from your local environment, check that you can at least reach the WireGuard server's public IP (SSH is the easy test at this stage; the WireGuard handshake itself comes in Task 2). If any of these fail, time to rummage through route table entries or NSG rules and see if you missed a digit.

Additional Knowledge Drop

“Why not install WireGuard directly on your laptop?” Well, on my Windows 11 Pro laptop the Hyper-V virtual switch is set to the “internal” type, which prevents external traffic from entering the virtual network. And hacking the virtual switch settings is about as fun as explaining to your boss why you need three Kubernetes clusters for a “simple” demo.

Success Checklist✅

✅ Network Design Completed

✅ Azure Network Infrastructure Created

✅ Virtual Machines Deployed

✅ NSG Rules Configured

✅ Routing Setup Complete

✅ Basic Connectivity Verified

Next up in Task 2: We’ll configure WireGuard and hope our packets find their way home! 🚀

Task 2: Build the Bridge

(Or: Making Wireguard Our Virtual Networking Best Friend)

Task 1: Prepare the Playgrounds

Task 2: Build the Bridge ← We are here

Task 3: Deploy the K3s Army

Task 4: Enter Cilium (Again!)

Task 5: Unite the Clusters

Task 6: The Grand Finale

You've meticulously carved out Azure subnets and prepped local VMs, yet there's still a gaping void between them. The solution? WireGuard.

The Why Behind Wireguard

Before we start configuring, let’s understand what makes Wireguard special (besides being cheaper than Azure VPN Gateway). Wireguard is like the minimalist’s VPN — it does one thing and does it well. It uses state-of-the-art cryptography to create secure, peer-to-peer connections, wrapping your traffic in lightweight encrypted tunnels. (Hint: You might also have a Cilium cluster with WireGuard encryption enabled, connecting directly to an external WireGuard server/client. This topic is not explored here.)

Think of it as a very efficient postal service:

  • Uses public/private key pairs for authentication (like having a special mailbox key)
  • Minimal configuration needed (no PhD in networking required)
  • Only supports essential, secure algorithms (no legacy baggage)
  • Blazingly fast (because life’s too short for slow VPNs)

And the true reason we’re using it? Azure VPN Gateway starts at >$100/month. That’s a lot of coffee we could be buying instead! (☕ > 💰)

Setting Up Our Tunnel

Now that our VMs are ready (you did complete Task 1, right?), let’s create our secure tunnel.

Step 1: Generate Key Pairs

First, we need to generate private/public key pairs for both the Server and Client. Think of these as unique ID cards for our VMs.

# Run the below on both VMs to get their respective key pairs:
umask 077  # keep the private key readable only by its owner
wg genkey | tee privatekey | wg pubkey > publickey
cat ./publickey
cat ./privatekey

Pro Tip: You should store these keys securely (or just leave them there if you’re feeling rebellious… I won’t judge, this is a homelab after all).

Step 2: Configure the Wireguard Server (Azure VM)

# Create the configuration and write it to the WireGuard config file with sudo
sudo tee /etc/wireguard/wg0.conf > /dev/null << 'EOF'
[Interface]
Address = 192.168.201.4/32
PrivateKey = <this machine's Private Key>
ListenPort = 51820

# Enable packet forwarding and NAT for VPN traffic
PostUp = iptables -A FORWARD -i wg0 -j ACCEPT
PostUp = iptables -A FORWARD -o wg0 -j ACCEPT
PostUp = iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE

# Clean up rules when the interface goes down
PostDown = iptables -D FORWARD -i wg0 -j ACCEPT
PostDown = iptables -D FORWARD -o wg0 -j ACCEPT
PostDown = iptables -t nat -D POSTROUTING -o eth0 -j MASQUERADE

[Peer]
PublicKey = <remote machine's Public Key>
AllowedIPs = 192.168.100.0/24
EOF

# Set restrictive permissions on the config file since it contains private key
sudo chmod 600 /etc/wireguard/wg0.conf

# Enable and start the WireGuard interface
sudo systemctl enable wg-quick@wg0
sudo systemctl start wg-quick@wg0

# Verify interface status
sudo wg show

A few notes:

  • We’re specifying a /32 because this interface gets exactly one IP from the Azure subnet.
  • NAT masquerading ensures traffic from your client can exit the VM onto Azure’s network without losing its mind.
  • Don’t forget to open port 51820/UDP in your Network Security Group and in Azure’s firewall settings, or you’ll be trying to poke a brick wall.
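One assumption baked into those PostUp rules: the kernel has to be willing to forward packets between wg0 and eth0 at all, and on the Azure side the NVA's NIC also needs "IP forwarding" enabled in its settings, or the platform will drop traffic that isn't addressed to the VM itself. If your image doesn't already have kernel forwarding on, a minimal fix on both NVAs looks like this:

# Enable kernel IP forwarding (persists across reboots)
echo 'net.ipv4.ip_forward=1' | sudo tee /etc/sysctl.d/99-wireguard-forward.conf
sudo sysctl --system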

Step 3: Configure the Wireguard Client (Laptop VM)

Now pivot to your local VM, presumably at 192.168.100.50 (if you ever checked the table in Task 1!):

# Create the configuration and write it to the WireGuard config file with sudo
sudo tee /etc/wireguard/wg0.conf > /dev/null << 'EOF'
[Interface]
PrivateKey = <this machine's Private Key>
Address = 192.168.100.50/32

[Peer]
PublicKey = <remote machine's Public Key>
AllowedIPs = 192.168.200.0/23
Endpoint = <remote machine's Public IP>:51820
PersistentKeepalive = 25
EOF

# Set restrictive permissions on the config file since it contains private key
sudo chmod 600 /etc/wireguard/wg0.conf

# Enable and start the WireGuard interface
sudo systemctl enable wg-quick@wg0
sudo systemctl start wg-quick@wg0

# Verify interface status
sudo wg show

If all aligns, you should see a handshake in about two seconds, forging your ephemeral link between local and cloud. And you should see output like this:


Keep them securely

Testing the Connection

Time to see if our tunnel actually works! Try tracing the route from Cluster 3 VM to Cluster 1 VM:


See how they hop!

If you see the packets hopping through 192.168.201.4 → 192.168.100.50 → 192.168.100.100, congratulations! Your tunnel is working as designed. This is a major milestone — we’ve successfully created our hybrid cloud connectivity!
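If you prefer typing to squinting at screenshots, the same checks from the command line look roughly like this (IPs per our plan; install traceroute if your image lacks it):

# From the Cluster 3 VM in Azure: trace the path back to Cluster 1's node
traceroute 192.168.100.100

# From the WireGuard client VM on the laptop: confirm the tunnel is alive
sudo wg show wg0 latest-handshakes
ping -c 3 192.168.200.4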

The Final Touch: Route Configuration

But wait, we're not done yet! Even though we've built routes from 192.168.200.0/23 to 192.168.100.0/23 in Azure's Route Table, we still need to tell the laptop side how to reach the Azure subnets.

If you’re running Windows (like me), add these routes:

# One-time route (will be removed after reboot)
route add 192.168.200.0 mask 255.255.255.0 192.168.100.50

# Permanent route (persists after reboot)
route -p add 192.168.200.0 mask 255.255.255.0 192.168.100.50
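(If your hypervisor host happens to run Linux instead of Windows, the equivalent is a plain ip route entry.)

# Linux equivalent of the Windows route above
sudo ip route add 192.168.200.0/24 via 192.168.100.50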

Verify everything works with another traceroute:


Success Checklist✅:

  • ✅ Wireguard tunnel established
  • ✅ Packets flowing through our tunnel
  • ✅ Routes configured on both ends
  • ✅ All VMs can reach each other

Pro Tip: If something’s not working, double-check your keys, routes, and NSG rules. And remember, it’s not a real networking project until you’ve spent at least an hour debugging why packets aren’t flowing the way they should.

Congratulations, you just created a secure, cost-effective pipeline between your homelab and the Azure cloud. Now your clusters can chat as though they were in adjacent cubicles — minus the water cooler gossip, of course.

Next up in Task 3: Deploying our K3s army! Time to give these networks something to actually talk about. 🚀

Task 3: Deploy the K3s Army

(Or: How Two Clusters Became Our Mini-Army)

Task 1: Prepare the Playgrounds

Task 2: Build the Bridge

Task 3: Deploy the K3s Army ← We are here

Task 4: Enter Cilium (Again!)

Task 5: Unite the Clusters

Task 6: The Grand Finale

Why K3s? Because “traditional” Kubernetes would blow your laptop’s RAM into orbit, and we’re all about pushing boundaries without spontaneously combusting. Grab your coffee — we’re about to deploy an army (if two or three nodes even qualify as an “army,” but let’s roll with it).


(One quick caffeine fix later…)

Before we start deploying our K3s clusters (and yes, we’re still calling two clusters an army), let’s understand what makes a successful ClusterMesh deployment. According to Cilium’s documentation, there are some critical prerequisites:

  1. Uniform Datapath Mode
  • All clusters must speak the same Cilium dialect
  • Like making sure all your microservices use the same protocol

2. Non-overlapping Pod CIDRs

  • Each cluster needs its own special corner of the network universe
  • Think of it as giving each cluster its own playground

3. Full Node Connectivity

  • All nodes must be able to chat via their InternalIPs
  • Thanks to our Wireguard tunnel, we’ve got this covered!

Pod CIDR Planning

We’re setting up our clusters with these CIDR ranges:

  • Cluster 1 (our OG cluster): 10.244.0.0/16
  • Cluster 2 (K3s on laptop): 10.245.0.0/16
  • Cluster 3 (K3s in Azure): 10.246.0.0/16

Pro Tip: If your Cluster 1’s pods are getting IPs outside 10.244.0.0/16, you might need this quick fix:

# On Cluster 1, manually update the CiliumNode (cn) CRD; k8s-sn is the name of my node
kubectl patch ciliumnode k8s-sn --type='merge' -p '{"spec":{"ipam":{"podCIDRs":["10.244.0.0/24"]}}}'

# Restart Cilium as needed... seriously, you need to!!

kubectl -n kube-system rollout restart deployment/cilium-operator
kubectl -n kube-system rollout restart ds/cilium
kubectl -n kube-system rollout restart deployment/hubble-relay
kubectl -n kube-system rollout restart deployment/hubble-ui
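To confirm what a cluster actually handed out, you can read the same field back; run it against Cluster 1 now, and against the others once Cilium lands on them in Task 4:

# Print each node and its allocated Pod CIDRs
kubectl get ciliumnodes \
  -o jsonpath='{range .items[*]}{.metadata.name}{" -> "}{.spec.ipam.podCIDRs}{"\n"}{end}'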

Deploying Our K3s Clusters

Time to deploy our K3s clusters with some carefully chosen flags. Remember: proper installation flags now save hours of debugging later!

# ⚠️ IMPORTANT: Remove all comments and newlines/returns before running these commands!
# The final command should be one long line (flags only) for each cluster.

##########################################################################
# On Cluster 2 VM (on-prem 192.168.100.101):
##########################################################################
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC='
# Allow kubeconfig to be readable by non-root users
--write-kubeconfig-mode 644

# Disable flannel as we'll use Cilium for networking
--flannel-backend=none

# Disable default k3s components since we'll use Cilium/Argo
--disable=traefik # We'll use Cilium for ingress
--disable=servicelb # We'll use Cilium for load balancing
--disable-kube-proxy # Cilium will replace kube-proxy
--disable-network-policy # Cilium will handle network policies

# Define unique CIDR ranges to avoid conflicts
--cluster-cidr 10.245.0.0/16 # Pod network CIDR
--service-cidr=10.42.0.0/16 # Default Service network CIDR used by K3s

# Bind API server to the VM's IP
--bind-address 192.168.100.101

# Set default storage location
--default-local-storage-path /opt/k3-local-storage
' sh -

##########################################################################
# On Cluster 3 VM (Azure 192.168.200.4):
##########################################################################
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC='
# Same settings as Cluster 2, except:
--write-kubeconfig-mode 644
--flannel-backend=none
--disable=traefik
--disable=servicelb
--disable-kube-proxy
--disable-network-policy

# Different Pod CIDR to avoid overlap with Cluster 2
--cluster-cidr 10.246.0.0/16 # Note: 10.246 instead of 10.245
--service-cidr=10.42.0.0/16

# Bind to Azure VM's private IP
--bind-address 192.168.200.4

# Same storage path for consistency
--default-local-storage-path /opt/k3-local-storage
' sh -

# ⚠️ IMPORTANT: Remove all comments and newlines/returns before running these commands!
# The final command should be one long line (flags only) for each cluster.
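To spare you the find-and-replace, here is Cluster 2's command already collapsed onto one line (comments stripped); Cluster 3 is identical apart from its CIDR and bind address:

curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC='--write-kubeconfig-mode 644 --flannel-backend=none --disable=traefik --disable=servicelb --disable-kube-proxy --disable-network-policy --cluster-cidr 10.245.0.0/16 --service-cidr=10.42.0.0/16 --bind-address 192.168.100.101 --default-local-storage-path /opt/k3-local-storage' sh -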

Managing Our Multi-Cluster Configuration

Now to make all clusters play nice together. We need to set up proper kubeconfig contexts so Cluster 1 can orchestrate everything.

# ⚠️ IMPORTANT: These commands need to be run on different machines!

# On Cluster 2: Export kubeconfig
sudo cat /etc/rancher/k3s/k3s.yaml > ~/.kube/k3s-local.yaml

# On Cluster 3: Export kubeconfig
sudo cat /etc/rancher/k3s/k3s.yaml > ~/.kube/k3s-azure.yaml

# On Cluster 1: Gather the configs
scp <your_username>@192.168.100.101:~/.kube/k3s-local.yaml ~/.kube/
scp <your_username>@192.168.200.4:~/.kube/k3s-azure.yaml ~/.kube/

# Merge them into one magnificent config
# (Heads-up: both K3s kubeconfigs name their context/cluster/user "default";
#  rename those entries in each file first, or one will silently shadow the other.)
KUBECONFIG=~/.kube/config:~/.kube/k3s-local.yaml:~/.kube/k3s-azure.yaml kubectl config view --flatten > ~/.kube/merged-config
mv ~/.kube/merged-config ~/.kube/config

# Verify your contexts
kubectl config get-contexts

At this point, you should have three clusters: the original vanilla cluster (Cluster 1), plus these two K3s companions (Cluster 2 on your laptop, Cluster 3 on Azure). They stand ready to engage in a sophisticated Cilium-based networking scheme.
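A quick sanity check before moving on (every context should return exactly one node):

# List every context and the node(s) it can see
for ctx in $(kubectl config get-contexts -o name); do
  echo "== $ctx =="
  kubectl --context "$ctx" get nodes -o wide
done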

Success Checklist✅:

  • ✅ All clusters deployed with correct CIDRs
  • ✅ No default CNI or kube-proxy to interfere with Cilium
  • ✅ Kubeconfigs properly merged
  • ✅ Clusters ready for Cilium installation

Pro Tip: If something’s not working, remember that K3s is just Kubernetes on a diet. All the usual Kubernetes debugging techniques apply — just with fewer calories.

Next up in Task 4: Time to introduce Cilium to our newly created clusters! But first, maybe grab another coffee… you’re going to need it. ☕

Task 4: Enter Cilium (Again!)

(Or: How to Make Three Clusters Talk Without Starting a War)

Task 1: Prepare the Playgrounds

Task 2: Build the Bridge

Task 3: Deploy the K3s Army

Task 4: Enter Cilium (Again!) ← We are here

Task 5: Unite the Clusters

Task 6: The Grand Finale

Congratulations — you’ve wrangled WireGuard tunnels, subdivided networks into cozy subnets, and deployed enough K3s clusters to make your laptop shriek like a caffeinated banshee. Now it’s time to unify all this madness into coherency.

The Why Behind Our Approach

Before we dive into the 700+ parameters of Cilium’s Helm chart (yes, you read that right), let’s understand why we’re using GitOps here. It’s not just because we want to look like pros in 2024 — it’s about maintaining our sanity. When you’re juggling three clusters with complex networking configurations, you want:

  1. Version Control: So you can revert to a prior sanity level whenever you break something (inevitable).
  2. Declarative Deployments: Tweak a file in Git, let Argo do the heavy lifting.
  3. Audit Trail: Know exactly who changed what. “It was me, I broke it,” is no longer guesswork.

Pro Tip: But hey, if you prefer an old-fashioned helm install cilium . approach, no one’s stopping you. At least save your configurations somewhere. Future you will thank past you.

Step 1: Upgrading Cluster 1

Let’s start by making sure our main cluster is ready for the mesh. We’ll enable ClusterMesh capabilities and set up Hubble with TLS (because security matters, even in a homelab… maybe not! but this creates “cilium-ca” for us):

# ⚠️ IMPORTANT:
# 1. Remove all comments and newlines before running (should be one long line)
# 2. This applies to your main cluster (Cluster 1)
# Install/Upgrade Cilium with ClusterMesh and Load Balancing capabilities
helm upgrade --install cilium cilium/cilium \
--namespace kube-system \
--version 1.16.5 \

# Basic cluster settings
--set operator.replicas=1 \
--set cluster.name=<Your_Cluster1_name> \
--set cluster.id=1 \
--set kubeProxyReplacement=true \

# ClusterMesh configuration
--set clustermesh.useAPIServer=true \
--set clustermesh.apiserver.service.type=LoadBalancer \
--set clustermesh.enableEndpointSliceSynchronization=true \

# Networking configuration
--set k8sServiceHost="192.168.100.100" \
--set k8sServicePort=6443 \
--set l2announcements.enabled=true \
--set devices=eth+ \
--set externalIPs.enabled=true \
--set externalTrafficPolicy=Cluster \
--set internalTrafficPolicy=Cluster \

# IPAM configuration
--set ipam.mode=cluster-pool \
--set ipam.operator.clusterPoolIPv4PodCIDRList="10.244.0.0/16" \

# Ingress settings
--set ingressController.enabled=true \
--set ingressController.loadbalancerMode=dedicated \

# Hubble observability
--set hubble.relay.enabled=true \
--set hubble.ui.enabled=true \
--set hubble.tls.enabled=true

Step 2: Configure LoadBalancer IPs

Make sure our clusters can actually expose services since we’re bypassing Kube Proxy in favor of Cilium’s own load balancer:

kubectl apply -f - <<EOF
apiVersion: "cilium.io/v2alpha1"
kind: CiliumLoadBalancerIPPool
metadata:
  name: "general-pool"
spec:
  blocks:
    - start: "192.168.100.120"
      stop: "192.168.100.169"
---
apiVersion: "cilium.io/v2alpha1"
kind: CiliumL2AnnouncementPolicy
metadata:
  name: policy1
spec:
  interfaces:
    - "^eth[0-9]+"
  externalIPs: true
  loadBalancerIPs: true
EOF

Step 3: Certificate Management

Here's where it gets interesting. We enabled hubble.tls.enabled=true for multiple reasons. This creates a secret called cilium-ca that we'll use across all clusters.

# On Cluster 1, fetch the cert/key
kubectl get secret cilium-ca -n kube-system -o yaml > cilium-ca.yaml

# Make the cert/key persistent (because trust issues)
helm upgrade --install cilium cilium/cilium \
--reuse-values \
--namespace kube-system \
--version 1.16.5 \
--set tls.ca.cert="ca.crt value in the cilium-ca.yaml" \
--set tls.ca.key="ca.key value in the cilium-ca.yaml"

# Restart the components so the change takes effect
kubectl -n kube-system rollout restart deployment/cilium-operator
kubectl -n kube-system rollout restart ds/cilium
kubectl -n kube-system rollout restart deployment/hubble-relay
kubectl -n kube-system rollout restart deployment/hubble-ui

Pro Tip: If you skip this, you might find yourself screaming “Why can’t cluster2 talk to cluster1’s Hubble & Clustermesh API servers?” Because TLS doesn’t trust them — that’s why.
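If copy-pasting base64 blobs out of cilium-ca.yaml feels error-prone, you can also pull them straight from the secret. A sketch, assuming the chart wants the base64-encoded values exactly as they sit in the secret's data fields:

# Extract the shared CA from Cluster 1 and feed it back to Helm
CA_CRT=$(kubectl -n kube-system get secret cilium-ca -o jsonpath='{.data.ca\.crt}')
CA_KEY=$(kubectl -n kube-system get secret cilium-ca -o jsonpath='{.data.ca\.key}')

helm upgrade --install cilium cilium/cilium \
  --reuse-values \
  --namespace kube-system \
  --version 1.16.5 \
  --set tls.ca.cert="$CA_CRT" \
  --set tls.ca.key="$CA_KEY"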

Step 4: Rinse and Repeat for Cluster 2 and 3

Now comes the fun part — setting up Cilium on our K3s clusters with Argo CD. We’ll create separate value files for each cluster. You can do it manually via Helm, Cilium CLI, or set up Argo CD apps that point to your helm charts/values:

Cluster 2 Values:

########################################################
# Cilium values for Cluster 2 (k3s-local)
# https://raw.githubusercontent.com/jsccjj/blogs/refs/heads/main/Blog3/configs/cluster2/cilium-helm-values/cluster2-cilium-values.yaml
########################################################

operator:
  replicas: 1
cluster:
  name: k3s-local # Change it with the name you have given to your Cluster 2, matching the name in the cluster-info
  id: 2
kubeProxyReplacement: true
clustermesh:
  useAPIServer: true
  apiserver:
    service:
      type: LoadBalancer
k8sServiceHost: "192.168.100.101" # Change it with the IP of your Cluster 2 node
k8sServicePort: 6443
l2announcements:
  enabled: true
devices: eth+
externalIPs:
  enabled: true
externalTrafficPolicy: Cluster
internalTrafficPolicy: Cluster
ipam:
  mode: cluster-pool
  operator:
    clusterPoolIPv4PodCIDRList:
      - "10.245.0.0/16"
ingressController:
  enabled: true
  loadbalancerMode: dedicated
hubble:
  relay:
    enabled: true
  ui:
    enabled: true
  tls:
    enabled: true

If you use Helm on Cluster 1 to deploy:

# Deploy Cilium on Cluster 2 with the shared CA
helm upgrade --install cilium cilium/cilium \
--kube-context "<Cluster 2 context>" \
--namespace kube-system \
--version 1.16.5 \
--values "<the value yaml for Cluster2>.yaml" \
--set tls.ca.cert="ca.crt value in the cilium-ca.yaml" \
--set tls.ca.key="ca.key value in the cilium-ca.yaml"

# Make sure you have LoadBalancer IPs to use...
# Can also be saved as yaml in our repo
kubectl apply --context "<Cluster 2 context>" -f - <<EOF
apiVersion: "cilium.io/v2alpha1"
kind: CiliumLoadBalancerIPPool
metadata:
  name: "general-pool"
spec:
  blocks:
    - start: "192.168.100.170"
      stop: "192.168.100.219"
---
apiVersion: "cilium.io/v2alpha1"
kind: CiliumL2AnnouncementPolicy
metadata:
  name: policy1
spec:
  interfaces:
    - "^eth[0-9]+"
  externalIPs: true
  loadBalancerIPs: true
EOF

Cluster 3 Values:

########################################################
# Cilium values for Cluster 3 (k3s-azure)
# https://raw.githubusercontent.com/jsccjj/blogs/refs/heads/main/Blog3/configs/cluster3/cilium-helm-values/cluster3-cilium-values.yaml
########################################################

operator:
  replicas: 1
cluster:
  name: k3s-azure # Change it with the name you have given to your Cluster 3, matching the name in the cluster-info
  id: 3
kubeProxyReplacement: true
clustermesh:
  useAPIServer: true
  apiserver:
    service:
      type: LoadBalancer
k8sServiceHost: "192.168.200.4" # Change it with the IP of your Cluster 3 node
k8sServicePort: 6443
l2announcements:
  enabled: true
devices: eth+
externalIPs:
  enabled: true
externalTrafficPolicy: Cluster
internalTrafficPolicy: Cluster
ipam:
  mode: cluster-pool
  operator:
    clusterPoolIPv4PodCIDRList:
      - "10.246.0.0/16"
ingressController:
  enabled: true
  loadbalancerMode: dedicated
hubble:
  relay:
    enabled: true
  ui:
    enabled: true
  tls:
    enabled: true

If you use Helm on Cluster 1 to deploy:

# Deploy Cilium on Cluster 3 with the shared CA
helm upgrade --install cilium cilium/cilium \
--kube-context "<Cluster 3 context>" \
--namespace kube-system \
--version 1.16.5 \
--values "<the value yaml for Cluster3>.yaml" \
--set tls.ca.cert="ca.crt value in the cilium-ca.yaml" \
--set tls.ca.key="ca.key value in the cilium-ca.yaml"

# Make sure you have LoadBalancer IPs to use...
# Can also be saved as yaml in our repo
kubectl apply --context "<Cluster 3 context>" -f - <<EOF
apiVersion: "cilium.io/v2alpha1"
kind: CiliumLoadBalancerIPPool
metadata:
  name: "general-pool"
spec:
  blocks:
    - start: "192.168.200.120"
      stop: "192.168.200.169"
---
apiVersion: "cilium.io/v2alpha1"
kind: CiliumL2AnnouncementPolicy
metadata:
  name: policy1
spec:
  interfaces:
    - "^eth[0-9]+"
  externalIPs: true
  loadBalancerIPs: true
EOF

With everything ready, we also need to ensure that Cluster 1 has both the Cilium CLI and Argo CD CLI installed. They should already be in place, but I bet you didn’t check the prerequisites… lol:

Check here for Cilium CLI installation: https://docs.cilium.io/en/stable/gettingstarted/k8s-install-default/#install-the-cilium-cli

Check here for Argo CD CLI installation: https://kostis-argo-cd.readthedocs.io/en/refresh-docs/getting_started/install_cli/#install-on-linux

The GitOps Way

If you’re working on a simple homelab project like this one, you might not need Argo CD — it’s your choice! Feel free to use the YAML files with Helm and/or the Cilium CLI. Alternatively, save them to your Git repository for later use with Argo CD.

If you deployed with Helm, that’s great! We’ve set up all three clusters with the Cilium CNI and shared CA properly configured. Each cluster can assign IPs to pods and communicate with the others. We’re ready to build our ClusterMesh.

However, as professionals committed to best practices, we should use GitOps. Now that our configurations are ready, let’s set up Argo CD to manage everything.


With Argo CD installed and Cilium CLI ready (you did check the prerequisites, right? …right?), let’s set up our clusters:

# On Cluster 1
# After logging in with the Argo CD CLI, add your clusters
argocd cluster add k3s-sn-context # Change it to your Cluster 2 context
argocd cluster add k3s-az-context # Change it to your Cluster 3 context

# Verify they are added
argocd cluster list

# Add the Cilium Helm repository to Argo CD
argocd repo add https://helm.cilium.io

# Add your repo where you have the helm value yaml files
# and the L2 announcement & IP Pool yaml files
argocd repo add https://github.com/yourusername/yourrepo.git

Below is an example of deploying the Argo CD app Cilium on Cluster 3. You can repeat this process for Cluster 2 with modified values. Ensure you fetch the cert/key values from cilium-ca for use:

# Change the valueFiles entry below to your own value yaml file
# Repeat for Cluster 2, with values properly modified
# Write the manifest to a YAML file
cat <<EOF > cluster3-cilium.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cluster3-cilium
spec:
  destination:
    namespace: kube-system
    server: https://192.168.200.4:6443
  source:
    repoURL: https://helm.cilium.io
    targetRevision: 1.16.5
    chart: cilium
    helm:
      valueFiles:
        - https://raw.githubusercontent.com/jsccjj/blogs/refs/heads/main/Blog3/configs/cluster3/cilium-helm-values/cluster3-cilium-values.yaml
      parameters:
        - name: tls.ca.cert
          value: "ca.crt value in the cilium-ca.yaml"
        - name: tls.ca.key
          value: "ca.key value in the cilium-ca.yaml"
  project: default
  syncPolicy:
    automated: null
    syncOptions:
      - RespectIgnoreDifferences=true
      - ServerSideApply=true
  ignoreDifferences:
    - kind: Secret
      name: cilium-ca
      namespace: kube-system
      jsonPointers:
        - /data
    - kind: Service
      name: cilium-ingress
      namespace: kube-system
      jsonPointers:
        - /spec/clusterIP
        - /spec/clusterIPs
        - /spec/ports
        - /status
    - kind: ConfigMap
      name: hubble-ca-cert
      jsonPointers:
        - /data/ca.crt
    - kind: Secret
      name: hubble-relay-client-certs
      jsonPointers:
        - /data/ca.crt
        - /data/tls.crt
        - /data/tls.key
    - kind: Secret
      name: hubble-server-certs
      jsonPointers:
        - /data/ca.crt
        - /data/tls.crt
        - /data/tls.key
EOF

# Deploy the application using the YAML file
argocd app create --file cluster3-cilium.yaml

# The L2 announcement and IP pool YAML files, used in previous sections,
# are stored in my Git repo. Try to use yours.
argocd app create cluster3-l2ips \
--repo https://github.com/jsccjj/blogs.git \
--path Blog3/configs/cluster3/l2ippool \
--dest-server https://192.168.200.4:6443

# Sync the apps
argocd app sync cluster3-cilium cluster3-l2ips

Pro Tips and Gotchas 🎯

  1. Certificate Management
  • Keep CA certificates secure
  • Remember to propagate cert updates
  • When in doubt, check TLS config first

2. Argo CD Configuration

  • Use RespectIgnoreDifferences=true for dynamic fields
  • Set sync policy to manual for better control
  • Use ServerSideApply=true for large manifests

3. Resource Management

  • Monitor LoadBalancer IP pools
  • Restart Cilium pods after major changes
  • Keep configurations in version control

Success Checklist✅:

  • ✅ Cluster 1 upgraded with ClusterMesh capabilities
  • ✅ All Hubble components running with TLS enabled
  • ✅ LoadBalancer IP pools configured for all clusters
  • ✅ Cilium CA certificates generated and secured
  • ✅ Helm values prepared for all clusters
  • ✅ Argo CD connected to all clusters
  • ✅ Cilium deployed on Clusters 2 and 3

Next up in Task 5: Time to connect these clusters and watch the magic (or chaos) happen! 🎭

Task 5: Unite the Clusters

(Or: The Art of Making Three Clusters into One Happy Family)

Task 1: Prepare the Playgrounds

Task 2: Build the Bridge

Task 3: Deploy the K3s Army

Task 4: Enter Cilium (Again!)

Task 5: Unite the Clusters ← We are here

Task 6: The Grand Finale

If you used Helm or the Cilium CLI for deployment, enable and connect the ClusterMesh by following the official instructions: Setting up Cluster Mesh — Cilium 1.16.5 documentation

The Mission Briefing 🎯

This is the big moment. We’ve got three clusters sprinkled across laptop hypervisors and cloud subnets, each decked out with Cilium. But before we start connecting things wildly (tempting as it may be), let’s understand what we’re trying to achieve.

For a successful ClusterMesh, we need four key elements:

  • ✅ Clusters can reach each other (thank you, Tasks 1 & 2!)
  • ✅ Clustermesh-apiserver deployments are running (courtesy of Task 4)
  • ✅ Apiserver instances can reach each other (also done!)
  • ⚠️ Apiserver instances can authenticate each other (this is our current challenge)

The Certificate Dance 🔐

Remember those Hubble and ClusterMesh API server configurations from Task 4? They’re about to become very important. Each cluster has five critical secrets that Cilium generates:

  • clustermesh-apiserver-server-cert
  • clustermesh-apiserver-admin-cert
  • clustermesh-apiserver-remote-cert
  • clustermesh-apiserver-local-cert
  • clustermesh-apiserver-client-cert (spoiler: this one’s special!)

Check the screenshot below:


But wait! Where’s clustermesh-apiserver-client-cert? Plot twist: it’s not generated by default. Here’s a little secret (pun intended) — it appears when you enable external workloads:


We now understand that this is for extending Kubernetes networking capabilities to external resources: Setting up Support for External Workloads (beta) — Cilium 1.16.5 documentation. Think of it as the VIP pass that lets clusters and external workloads talk to each other.

Understanding the Secrets 🗝️

My 30 minutes (actually 65) of research led to this potentially misleading (oversimplified, but who’s counting?) table:


While “clustermesh-apiserver-remote-cert” seems attractive (and I did get it to work), I want to ensure all clusters have their own “clustermesh-apiserver-client-cert” secret for us to use.

Making the Connection 🌐

Now that we understand our certificates, let’s update our clusters.

Step 1: Enable External Workloads

# On Cluster 1
# Update Cluster 1
helm upgrade --install cilium cilium/cilium \
--reuse-values \
--namespace kube-system \
--version 1.16.5 \
--set externalWorkloads.enabled=true

# Update Cluster 2 & 3
# Method A - update the value yaml files in the git repo with this value
# Method B - use the Argo CD CLI to additionally set values
argocd app set cluster2-cilium --helm-set externalWorkloads.enabled=true
argocd app set cluster3-cilium --helm-set externalWorkloads.enabled=true

Step 2: Collect Certificates and IPs

# Get certificates from each cluster
kubectl get secret clustermesh-apiserver-client-cert -n kube-system -o yaml > cluster1-remote.yaml
kubectl --context "<cluster 2>" get secret clustermesh-apiserver-client-cert -n kube-system -o yaml > cluster2-remote.yaml
kubectl --context "<cluster 3>" get secret clustermesh-apiserver-client-cert -n kube-system -o yaml > cluster3-remote.yaml

# Get LoadBalancer IPs
kubectl get svc clustermesh-apiserver -n kube-system -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
# Repeat for other clusters
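The same jsonpath trick from Task 4 works here; the tls.crt / tls.key fields are already base64-encoded in the secret, which (as far as I can tell) is the form the Helm parameters in Step 3 expect. Cluster 2 shown, repeat per cluster:

# Base64-encoded client cert and key for Cluster 2
kubectl --context "<cluster 2>" -n kube-system get secret clustermesh-apiserver-client-cert \
  -o jsonpath='{.data.tls\.crt}'; echo
kubectl --context "<cluster 2>" -n kube-system get secret clustermesh-apiserver-client-cert \
  -o jsonpath='{.data.tls\.key}'; echo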

Step 3: Configure the Mesh

Starting with Cluster 1, to connect with Cluster 2 and 3:

# Be aware that if you have multiple clustermesh.config.clusters entries, you must present them all
# Be aware that the clustermesh apiserver is already enabled earlier
# One command to rule them all
helm upgrade --install cilium cilium/cilium \
--reuse-values \
--namespace kube-system \
--version 1.16.5 \
--set clustermesh.config.enabled=true \
--set clustermesh.config.clusters[0].name="cluster 2 name" \
--set clustermesh.config.clusters[0].port=2379 \
--set clustermesh.config.clusters[0].ips[0]="cluster 2 clustermesh-apiserver VIP" \
--set clustermesh.config.clusters[0].tls.cert="cluster 2 tls.crt value" \
--set clustermesh.config.clusters[0].tls.key="cluster 2 tls.key value" \
--set clustermesh.config.clusters[1].name="cluster 3 name" \
--set clustermesh.config.clusters[1].port=2379 \
--set clustermesh.config.clusters[1].ips[0]="cluster 3 clustermesh-apiserver VIP" \
--set clustermesh.config.clusters[1].tls.cert="cluster 3 tls.crt value" \
--set clustermesh.config.clusters[1].tls.key="cluster 3 tls.key value"

Step 4: Also Connect Clusters 2 and 3

Why connect Clusters 2 and 3 directly? Because symmetry is beautiful! While not strictly necessary, it gives us a fully connected mesh.

# Be aware that the clustermesh apiserver is already enabled earlier

# On Cluster 1
# For Cluster 2
argocd app set cluster2-cilium \
--helm-set clustermesh.config.enabled='true' \
--helm-set clustermesh.enableEndpointSliceSynchronization='true' \
--helm-set clustermesh.config.clusters[0].name="cluster 1 name" \
--helm-set clustermesh.config.clusters[0].port=2379 \
--helm-set clustermesh.config.clusters[0].ips[0]="cluster 1 clustermesh-apiserver VIP" \
--helm-set clustermesh.config.clusters[0].tls.cert="cluster 1 tls.crt value" \
--helm-set clustermesh.config.clusters[0].tls.key="cluster 1 tls.key value" \
--helm-set clustermesh.config.clusters[1].name="cluster 3 name" \
--helm-set clustermesh.config.clusters[1].port=2379 \
--helm-set clustermesh.config.clusters[1].ips[0]="cluster 3 clustermesh-apiserver VIP" \
--helm-set clustermesh.config.clusters[1].tls.cert="cluster 3 tls.crt value" \
--helm-set clustermesh.config.clusters[1].tls.key="cluster 3 tls.key value"

# For Cluster 3
argocd app set cluster3-cilium \
--helm-set clustermesh.config.enabled='true' \
--helm-set clustermesh.enableEndpointSliceSynchronization='true' \
--helm-set clustermesh.config.clusters[0].name="cluster 1 name" \
--helm-set clustermesh.config.clusters[0].port=2379 \
--helm-set clustermesh.config.clusters[0].ips[0]="cluster 1 clustermesh-apiserver VIP" \
--helm-set clustermesh.config.clusters[0].tls.cert="cluster 1 tls.crt value" \
--helm-set clustermesh.config.clusters[0].tls.key="cluster 1 tls.key value" \
--helm-set clustermesh.config.clusters[1].name="cluster 2 name" \
--helm-set clustermesh.config.clusters[1].port=2379 \
--helm-set clustermesh.config.clusters[1].ips[0]="cluster 2 clustermesh-apiserver VIP" \
--helm-set clustermesh.config.clusters[1].tls.cert="cluster 2 tls.crt value" \
--helm-set clustermesh.config.clusters[1].tls.key="cluster 2 tls.key value"

The Moment of Truth 🎭

After about a minute or two (or several coffee sips), verify your mesh:


Cluster 1 → 2 & 3


Cluster 2 → 3 & 1


Cluster 3 → 1 & 2
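For the CLI-inclined, the checks behind those screenshots can be reproduced with the Cilium CLI (contexts are placeholders):

# Wait for the mesh to report ready from each cluster's point of view
cilium clustermesh status --context "<your cluster 1>" --wait
cilium clustermesh status --context "<your cluster 2>" --wait
cilium clustermesh status --context "<your cluster 3>" --wait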

After configuration, you might be tempted to celebrate… but wait! Since we’re using Helm and Argo CD, we need one final step: restart the Cilium components to ensure they pick up the new mesh configuration:

# Restart the components so the changes take effect; run on all clusters
kubectl -n kube-system rollout restart deployment/cilium-operator
kubectl -n kube-system rollout restart ds/cilium
kubectl -n kube-system rollout restart deployment/hubble-relay
kubectl -n kube-system rollout restart deployment/hubble-ui

Now, you can test connectivity within the mesh using the official Cilium tool:

# Use --force-deploy to clean out existing "cilium-test-1" configs
cilium connectivity test --context "<your cluster 1>" --multi-cluster "<your cluster 2>" --force-deploy
cilium connectivity test --context "<your cluster 2>" --multi-cluster "<your cluster 3>" --force-deploy
cilium connectivity test --context "<your cluster 3>" --multi-cluster "<your cluster 1>" --force-deploy

Don’t panic if you see some test failures — they might be related to your network settings and external workload configurations. What matters is that you should see the clustermesh-apiserver pods happily gossiping in Hubble UI:


Gossiping via port 2379

Pro Tips and Gotchas: 🎯

  1. 🔄 Always restart Cilium components after major changes
  2. 🔍 Watch Hubble UI for the clustermesh-apiserver gossip
  3. 🎮 Test connectivity between clusters before proceeding

Success Checklist ✅

  • ✅ External workloads enabled on all clusters
  • ✅ Certificates collected and distributed
  • ✅ ClusterMesh configuration applied
  • ✅ Cilium components restarted
  • ✅ Clusters showing as “ready” in mesh status
  • ✅ Hubble showing cross-cluster traffic

Next up in Task 6: Let’s make Bookinfo fully distributed! But first, you might want to check if your laptop fan is still functional… (my 2022 laptop that struggles with CSGO is still hanging in there) 🌪️

Task 6: The Grand Finale

(Or: How We Made Bookinfo Interesting Again!)

Task 1: Prepare the Playgrounds

Task 2: Build the Bridge

Task 3: Deploy the K3s Army

Task 4: Enter Cilium (Again!)

Task 5: Unite the Clusters

Task 6: The Grand Finale ← We are here

Finally! With our ClusterMesh up and running (at least cilium clustermesh status says so), let’s put it through its paces. And what better way than with a distributed application that’ll make our mesh earn its keep?

Hold on! You might already be wondering: what use cases does this actually serve? Answer: https://cilium.io/blog/2019/03/12/clustermesh/ (I know it’s a 5-year-old blog post, but it’s still relevant!). The combination of these use cases fits most private/hybrid/multi-cloud architectures (which smell like our future careers supporting AI gangsters…).

Cilium’s Multi-Cluster Service Discovery: The Foundation

Before we dive into breaking down our application, let’s understand how Cilium handles multi-cluster services. The magic happens through three key annotations, each serving a specific purpose (https://docs.cilium.io/en/stable/network/clustermesh/services/):


The key takeaway? Add service.cilium.io/global: “true” to your services, and Cilium will handle the cross-cluster load balancing magic. Just remember: each cluster needs a matching service definition, even if it’s not running any local pods. (And no, unfortunately, there isn’t a 1–800-CILIUM hotline for when things go wrong! 😉)
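As a sketch (not the exact manifest from the repo linked below), a global reviews Service might look like this; apply the same definition in every cluster, pods or not, using Bookinfo's usual port 9080:

kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: reviews
  annotations:
    service.cilium.io/global: "true"
spec:
  selector:
    app: reviews
  ports:
    - name: http
      port: 9080
EOF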

Breaking Down Bookinfo 📚

Remember when we used the OpenTelemetry demo in our previous blog? Well, trying to split that across three clusters turned out to be about as fun as debugging a production issue at 3 AM. So, we’re returning to our old friend, the Istio Bookinfo application. (Sorry for calling you boring earlier, Bookinfo — you’re about to become a distributed superstar!)

Now, let's look at the architecture below:


https://istio.io/latest/docs/examples/bookinfo/

Examining the structure of the Bookinfo application, we can naturally divide it into three parts for our demonstration. Here’s how we’ll approach this breakdown:


Then, our distributed Bookinfo will look like this:


As you can see, each cluster interacts with the others, truly exemplifying the interconnected mesh network we’ve been building! To put this design into action, we need to carefully arrange the components within the original Bookinfo YAML file, as shown below:

And, for the grand reveal… No, I won't paste all 300+ lines here (but here is the link to the YAML: https://github.com/jsccjj/blogs/blob/main/Blog3/configs/modified-bookinfo-all-in-one/bookinfo-clustermesh.yaml) — subjecting you to that would be cruel. Instead, slice and distribute the relevant parts to their respective clusters.

If you’re uncertain, think “Where does productpage belong?” (Cluster 1), “Where do the reviews run?” (Cluster 2), and “Where do ratings and details live?” (Cluster 3). Then annotate the services for global visibility, cross your fingers, and apply the manifests.
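The actual apply step is then just a matter of pointing each slice at the right context (the file names below are hypothetical; slice the linked YAML however you like):

# Hypothetical file names: one slice per cluster
kubectl --context "<cluster 1>" apply -f bookinfo-productpage.yaml
kubectl --context "<cluster 2>" apply -f bookinfo-reviews.yaml
kubectl --context "<cluster 3>" apply -f bookinfo-details-ratings.yaml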

And Then… It Just Works

After deployment, access the Product Page service from Cluster 1. If everything’s working correctly, you’ll see:

  • Product information (from Cluster 3)
  • Reviews (from Cluster 2)
  • Ratings (from Cluster 3, via Cluster 2)


Observability: The Proof is in the Pudding

Open up Hubble UI and you’ll see the cross-cluster traffic flowing:

  • Requests from Product Page to Reviews crossing clusters
  • Reviews service reaching out to Ratings
  • Details service responding to Product Page requests


Pro Tip: Keep watching Hubble UI — it’s your best friend for understanding cross-cluster traffic patterns!

Success Checklist ✅

Before we declare victory, let’s check our achievements:

  • ✅ Services properly annotated for global access
  • ✅ Endpoints visible across clusters
  • ✅ Traffic flowing through expected paths
  • ✅ Hubble showing cross-cluster connections
  • ✅ All Bookinfo features functional
  • ✅ Service discovery working as expected
  • ✅ Your laptop fan still spinning (if not it means Azure is ka-ching!)

Troubleshooting Tips

If things aren’t working as expected:

  1. Check service annotations — missing global: “true” is a common gotcha
  2. Verify ClusterMesh status on all clusters
  3. Look for connection issues in Hubble UI
  4. Ensure service names and ports match across clusters

Remember: In distributed systems, the problem is usually either networking, DNS, or… both.
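For gotcha #1 specifically, a quick loop confirms the annotation really is present in every cluster (reviews shown; contexts are placeholders):

# Should print "true" for each context; blank output means a missing annotation
for ctx in "<cluster 1>" "<cluster 2>" "<cluster 3>"; do
  printf '%s: ' "$ctx"
  kubectl --context "$ctx" get svc reviews \
    -o jsonpath="{.metadata.annotations['service\.cilium\.io/global']}"
  echo
done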

Next up: Time for the conclusion, where we’ll reflect on this beautiful mess we’ve created!

Conclusion

(Or: What We Learned from This Totally Reasonable Adventure)

Behold the hybrid marvel we’ve created: a WireGuard-wielding, laptop-straining, multi-cluster labyrinth that seamlessly connects our dining-room desk with the cloud’s lofty domains.

What We Actually Built 🏗️

After this journey through eBPF, GitOps, WireGuard tunnels, and far too many YAML files, we’ve created something equal parts impressive and alarming (depending on who you ask):

A Hybrid Cloud Mesh

  • Three Kubernetes clusters acting as one
  • A secure WireGuard bridge between home and cloud
  • Cross-cluster service discovery and load balancing
  • All while sparing your wallet from Azure’s pricier VPN offerings

A Production-ish Setup (Not Really Yet)

  • GitOps-driven deployments via Argo CD
  • Proper certificates (no shady TLS nightmares)
  • Observability with Hubble for cross-cluster fireworks
  • Actually useful service segmentation (imagine that!)

Our Sanity Check List

  • Cost-effective (WireGuard over Azure VPN Gateway)
  • Secure cluster-to-cluster chatter
  • Observable traffic flows
  • Manageable configurations
  • Reasonable laptop resource usage (but hey, we do this for fun!)

Key Takeaways 🎓

About Cilium ClusterMesh

  • Incredibly powerful but demands thorough planning
  • Pod CIDR assignments are crucial (you don’t want them overlapping)
  • Certificates can make or break your setup
  • Multi-cluster service discovery requires more brainpower than you’d think

About GitOps

  • Configuration tracking is the ultimate safety net
  • Deployments become reproducible and reversible (you’ll thank yourself later)
  • Even mistakes are versioned (so you can pinpoint exactly who typed what and when)
  • A quick Git diff often solves more questions than a thousand Slack threads

About Hybrid Setups

  • WireGuard remains a champion for budget-friendly home labs
  • K3s is a lightweight gem that spares your CPU
  • Modern tooling can tame even the most sprawling cluster mesh
  • Azure doesn’t have to eat your wallet, so long as you keep an eye on resources

What’s Next? 🚀

Yes, this setup is overkill for most people, but it showcases some exciting possibilities:

  • Multi-region deployments
  • Cloud bursting
  • Disaster recovery
  • Edge computing
  • Or just impressing your colleagues with your unstoppable tinkering

Remember: just because you can run three Kubernetes clusters on your laptop doesn’t mean you should — but it’s undeniably cool that you did!

Acknowledgments

  • My laptop’s fan for not collapsing under pressure
  • The coffee machine for fueling this entire expedition
  • Cilium’s team for their eBPF wizardry
  • K3s for staying featherlight on resources
  • Azure for not draining my bank account (yet)

And if anyone asks why you did this, just grin and say it was all “for research purposes.” It sounds far more official than “because I felt like it.”

Next time: maybe we’ll tackle hybrid WASM (even more friendly for my laptop) on Kubernetes — or maybe it’s time to take a vacation first. 🏖️