I recently came across a LinkedIn post talking about the above concepts and how trivial they are to set up. The goal: use Cilium's BGP capabilities to either expose a service or export the pod CIDR and advertise its range to a peer. We are all on different chapters of our life's book, so I wanted to explain the setup a little more in order to possibly help someone out there add a feather to their hat!
Why would you want to expose a pod network directly using BGP? The concepts are relatively simple. ToR, or top of rack, describes a data center layout where a rack holds multiple servers that all connect to a switch at the top of the rack, which in turn uplinks to an aggregation layer. In this scenario we have no load balancers in between, as Kubernetes is keen to use, nor do we need to expose node ports. Just straight connections via advertised routes directly to the applications. Why set this up at home? It's likely that any services you run are one-offs serving things like Plex, Pi-hole, etc. This makes it incredibly easy to connect to the applications directly.
The Setup
In my setup I will be using FRR on a UDM-SE and a Raspberry Pi running K3s and Cilium. Feel free to use a standalone *nix box to set up FRR, but know that you will also need to add some static routes to it. On my UDM, the FRR package was already installed! Under the hood, Ubiquiti uses it for its Magic VPN feature. I don't use that feature, so it was straightforward to enable the systemd service with my own custom configuration, shown below. For more details, Chris's blog here can show you everything you need to do.
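On a stock Debian-style FRR install, enabling BGP and the service looks roughly like the sketch below. Treat it as an outline: file locations and the daemons toggle can differ on UniFi OS, which is exactly what Chris's blog walks through.
# Enable the bgpd daemon (stock FRR keeps this toggle in /etc/frr/daemons)
sudo sed -i 's/^bgpd=no/bgpd=yes/' /etc/frr/daemons
# Drop the configuration shown below into /etc/frr/frr.conf, then start FRR
sudo systemctl enable --now frr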
hostname UDM-SE
frr defaults datacenter
log stdout
service integrated-vtysh-config
!
!
router bgp 65001
 bgp router-id 192.168.120.254
 ! V4 is a peer group; the address-family policy below applies to its members
 neighbor V4 peer-group
 ! 192.168.120.11 is the Raspberry Pi single-node K3s cluster
 neighbor 192.168.120.11 remote-as 65000
 neighbor 192.168.120.11 peer-group V4
 !
 address-family ipv4 unicast
  redistribute connected
  redistribute kernel
  neighbor 192.168.120.11 default-originate
  neighbor V4 soft-reconfiguration inbound
  neighbor V4 route-map ALLOW-ALL in
  neighbor V4 route-map ALLOW-ALL out
 exit-address-family
!
route-map ALLOW-ALL permit 10
!
line vty
!
The above is my configuration for FRR. Straightforward, aside from the comments noting where the Raspberry Pi single-node K3s cluster lives.
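After saving the configuration, restart FRR and confirm that bgpd accepted the neighbor. Both commands are standard FRR tooling, though the service name may vary on the UDM:
sudo systemctl restart frr
# The neighbor should be listed, initially in Connect or Active state
vtysh -c 'show bgp summary'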
Now it's on to Cilium and K3s. Note that you will need to disable flannel, servicelb, and network-policy. You can do this with a fresh install, by setting environment variables, or by editing the systemd service; sketches of the first and last options follow. If you are running this on an existing installation, you will likely also need to remove the flannel VXLAN interface (see the cleanup sketch after the service snippet). Run ip link show to verify its presence if you encounter a crash loop with Cilium.
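For a fresh install, the same flags can be passed straight to the K3s installer. A minimal sketch using the standard get.k3s.io script:
# Install K3s with flannel, its network policy controller, and servicelb disabled
curl -sfL https://get.k3s.io | sh -s - server \
  --flannel-backend=none \
  --disable-network-policy \
  --disable=servicelb
If you would rather edit an existing installation, the relevant part of the k3s systemd unit looks like this: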
ExecStart=/usr/local/bin/k3s \
server \
'--flannel-backend=none' \
'--disable-network-policy' \
'--disable=servicelb' \
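If Cilium crash loops on an existing cluster, the usual culprit is the VXLAN interface flannel leaves behind, typically named flannel.1:
# Verify whether the leftover interface exists
ip link show flannel.1
# If it does, remove it and let the Cilium pods restart
sudo ip link delete flannel.1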
Installing Cilium with the binary requires a single flag for this use case: cilium install --set bgpControlPlane.enabled=true
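Before creating the peering policy, it's worth letting the CLI confirm that the agent and operator are healthy:
# Blocks until Cilium reports ready
cilium status --wait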
After a successful installation, it's time to create the CiliumBGPPeeringPolicy. Below is my example with notes.
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPPeeringPolicy
metadata:
  name: custom-policy
spec:
  virtualRouters:
  - exportPodCIDR: true # allows the pod CIDR to be advertised
    localASN: 65000
    neighbors:
    - connectRetryTimeSeconds: 120
      eBGPMultihopTTL: 1
      holdTimeSeconds: 90
      keepAliveTimeSeconds: 30
      peerASN: 65001 # FRR ASN
      peerAddress: 192.168.120.1/32 # FRR address
      peerPort: 179
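Save the manifest and apply it. The file name here is just an example:
kubectl apply -f cilium-bgp-policy.yaml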
Validation
On to validation. From the FRR side I run vtysh -c 'show ip bgp' and receive:
   Network          Next Hop          Metric  LocPrf  Weight  Path
*> 10.0.0.0/24      192.168.120.11
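To take it a step further, confirm the route actually landed in the UDM's kernel routing table:
# The pod CIDR should appear as a route via the Raspberry Pi
ip route | grep 10.0.0.0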
From the machine where my Cilium binary is installed, with access to my K3s cluster, I run cilium bgp peers and receive:
Node   Local AS   Peer AS   Peer Address    Session State   Uptime     Family         Received   Advertised
pi     65000      65001     192.168.120.1   established     12h49m3s   ipv4/unicast   7          1
                                                                       ipv6/unicast   0          0
From here, if you had flannel installed previously, you will likely need to restart your pods so they pick up addresses from the new Cilium CNI range. Then run a curl, or whatever you see fit, and verify it's working!
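If nothing is listening yet, a throwaway pod makes a quick target. Everything here is illustrative: the hello name, the http-echo image, and port 8080 are example choices, not part of my setup.
# Restart existing workloads so they pick up addresses from the Cilium pod CIDR
kubectl rollout restart deployment -n default
# Hypothetical test pod that answers with "Hello World!" on port 8080
kubectl run hello --image=hashicorp/http-echo --port=8080 -- -listen=:8080 -text='Hello World!'
kubectl get pod hello -o wide   # note the pod IP from the advertised range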
$ curl http://10.0.0.83:8080
Hello World!