OVN-Kubernetes Multiple External Gateway local setup
Steps to set up a local environment with the OVN-Kubernetes Multiple External Gateway capability (also known as Intelligent CNI 2.0, or iCNI 2.0) and an FRRouting (FRR) pod acting as an external gateway.
Kind setup
First, let's create a local kind cluster with OVN-Kubernetes as the CNI:
$ git clone https://github.com/ovn-org/ovn-kubernetes.git
$ cd ovn-kubernetes/contrib
$ ./kind.sh --disable-snat-multiple-gws --multi-network-enable
Let’s take a look at the options:
disable-snat-multiple-gws
: Disables SNAT for multiple external gateways
multi-network-enable
: Installs Multus CNI on the cluster
After a few minutes, we will have a three-node cluster ready for use:
$ export KUBECONFIG=$HOME/ovn.conf
$ kubectl get node
NAME STATUS ROLES AGE VERSION
ovn-control-plane Ready control-plane 4h2m v1.24.0
ovn-worker Ready <none> 4h1m v1.24.0
ovn-worker2 Ready <none> 4h1m v1.24.0
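Since the cluster was created with --multi-network-enable, Multus should already be running. As a quick sanity check (a sketch; the exact namespace and daemonset name can vary between ovn-kubernetes versions), look for its daemonset:
$ kubectl get ds -A | grep -i multus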
Let’s install some additional CNI network plugins needed for the test (macvlan, static, and tuning):
$ git clone https://github.com/containernetworking/plugins.git
$ cd plugins
$ ./build_linux.sh
$ cd bin
$ for i in $(docker ps -aq); do for j in macvlan static tuning; do docker cp $j $i:/opt/cni/bin/; done; done
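As a quick sanity check, we can list the binaries on each node to confirm the copy worked. This sketch assumes all running containers are kind nodes, as in the loop above:
$ for i in $(docker ps -q); do docker exec $i ls /opt/cni/bin/ | grep -E 'macvlan|static|tuning'; done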
Alternatively, if you don’t want to compile the CNI plugins manually, I submitted a PR that adds an option to the kind.sh script to deploy additional CNI plugins (e.g. macvlan, ipvlan):
$ wget https://raw.githubusercontent.com/ovn-org/ovn-kubernetes/0123ad42d371223dc434b6af06a9ea4fd8336cda/contrib/kind.sh
$ ./kind.sh --install-cni-plugins --disable-snat-multiple-gws --multi-network-enable
Let’s take a look at the new option introduced by the PR:
install-cni-plugins
: Installs additional CNI network plugins
Resource creation
Let’s create the namespaces:
$ cat <<EOF | kubectl apply -f -
---
apiVersion: v1
kind: Namespace
metadata:
name: frr
spec: {}
---
apiVersion: v1
kind: Namespace
metadata:
name: bar
spec: {}
EOF
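Quick check that both namespaces were created (STATUS should be Active):
$ kubectl get ns frr bar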
Let’s create the network attachment definitions:
$ cat <<EOF | kubectl apply -f -
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
name: internal-net
namespace: frr
spec:
config: |-
{
"cniVersion": "0.3.1",
"name": "internal-net",
"plugins": [
{
"type": "macvlan",
"master": "breth0",
"mode": "bridge",
"ipam": {
"type": "static"
}
},
{
"capabilities": {
"mac": true,
"ips": true
},
"type": "tuning"
}
]
}
EOF
Check for correct creation:
$ kubectl get net-attach-def -n frr
NAME AGE
internal-net 2m
Create a dummy pod in the served namespace (bar) on the second worker:
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: dummy
namespace: bar
spec:
containers:
- name: dummy
image: centos
command:
- sleep
- infinity
nodeSelector:
kubernetes.io/hostname: ovn-worker2
EOF
Let’s wait for the pod:
$ kubectl get po -n bar
NAME READY STATUS RESTARTS AGE
dummy 1/1 Running 0 2m
Let’s grab two important pieces of information for the FRR configuration: the IP of the ovn-worker2 node where the dummy pod resides (for the BFD peer), and the static route entries for the node subnets:
$ kubectl get node -o wide | grep ovn-worker2
ovn-worker2 Ready <none> 11m v1.24.0 172.18.0.4 <none> Ubuntu 21.10 6.0.7-301.fc37.x86_64 containerd://1.6.4
$ kubectl get nodes -o jsonpath='{range .items[*].metadata.annotations}{.k8s\.ovn\.org\/node\-subnets}{.k8s\.ovn\.org\/node\-primary\-ifaddr}{"\n"}{end}' | awk -F'["/]' '{print "ip route " $4"/"$5 " " $9}'
ip route 10.244.0.0/24 172.18.0.3
ip route 10.244.2.0/24 172.18.0.2
ip route 10.244.1.0/24 172.18.0.4
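If the combined jsonpath/awk one-liner above looks opaque, the same information can also be read per node straight from the k8s.ovn.org annotations; for example, for ovn-worker2 (a sketch using the same jsonpath escaping as above):
$ kubectl get node ovn-worker2 -o jsonpath='{.metadata.annotations.k8s\.ovn\.org\/node\-subnets}{"\n"}{.metadata.annotations.k8s\.ovn\.org\/node\-primary\-ifaddr}{"\n"}'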
Let’s create the FRR configuration:
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
name: frr-configs
namespace: frr
data:
daemons: |
bgpd=yes
ospfd=no
ospf6d=no
ripd=no
ripngd=no
isisd=no
pimd=no
ldpd=no
nhrpd=no
eigrpd=no
babeld=no
sharpd=no
pbrd=no
bfdd=yes
fabricd=no
vrrpd=no
vtysh_enable=yes
zebra_options=" -A 127.0.0.1 -s 90000000"
bgpd_options=" -A 127.0.0.1"
ospfd_options=" -A 127.0.0.1"
ospf6d_options=" -A ::1"
ripd_options=" -A 127.0.0.1"
ripngd_options=" -A ::1"
isisd_options=" -A 127.0.0.1"
pimd_options=" -A 127.0.0.1"
ldpd_options=" -A 127.0.0.1"
nhrpd_options=" -A 127.0.0.1"
eigrpd_options=" -A 127.0.0.1"
babeld_options=" -A 127.0.0.1"
sharpd_options=" -A 127.0.0.1"
pbrd_options=" -A 127.0.0.1"
staticd_options="-A 127.0.0.1"
bfdd_options=" -A 127.0.0.1"
fabricd_options="-A 127.0.0.1"
vrrpd_options=" -A 127.0.0.1"
vtysh.conf: |
service integrated-vtysh-config
frr.conf: |
hostname vrouter
service integrated-vtysh-config
password frr
enable password frr
!
debug bfd peer
debug bfd zebra
debug bfd network
!
bfd
peer 172.18.0.4
no shutdown
!
!
! subnets for each node
ip route 10.244.0.0/24 172.18.0.3
ip route 10.244.2.0/24 172.18.0.2
ip route 10.244.1.0/24 172.18.0.4
!
log file /tmp/frr.log debugging
EOF
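To make sure the route entries we computed earlier made it into the configuration, the rendered frr.conf can be read back from the ConfigMap:
$ kubectl get cm frr-configs -n frr -o jsonpath='{.data.frr\.conf}'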
Finally let’s create the FRR pod:
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: ext-gw
namespace: frr
annotations:
k8s.v1.cni.cncf.io/networks: '[
{
"name": "internal-net",
"ips": [ "172.18.0.10/16" ]
}
]'
k8s.v1.cni.cncf.io/network-status: |-
[{
"name": "frr/internal-net",
"ips": [
"172.18.0.10"
],
"dns": {}
}]
k8s.ovn.org/routing-namespaces: "bar"
k8s.ovn.org/bfd-enabled: ""
k8s.ovn.org/routing-network: "frr/internal-net"
spec:
containers:
- name: frr
image: quay.io/wcaban/frr
command: ["/bin/sh","-c"]
args: ["/usr/libexec/frr/frrinit.sh start && tail -f /tmp/frr.log "]
ports:
- name: bfd
containerPort: 3784
protocol: UDP
- name: bgp
containerPort: 179
protocol: TCP
- name: rip
containerPort: 520
protocol: UDP
- name: ripng
containerPort: 521
protocol: UDP
- name: stats
containerPort: 9000
protocol: TCP
securityContext:
privileged: true
volumeMounts:
- name: config-volume
mountPath: /etc/frr
volumes:
- name: config-volume
configMap:
name: frr-configs
nodeSelector:
kubernetes.io/hostname: ovn-worker
EOF
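The three k8s.ovn.org annotations are what turn this pod into an external gateway for the served namespace: routing-namespaces selects the namespaces whose egress traffic is steered through the pod (bar), routing-network selects which of the pod's networks provides the next-hop address (the macvlan internal-net, hence the static IP 172.18.0.10), and bfd-enabled asks OVN-Kubernetes to monitor that next hop with BFD. They can be inspected on the running pod, for example:
$ kubectl get pod ext-gw -n frr -o jsonpath='{.metadata.annotations.k8s\.ovn\.org\/routing\-namespaces}{"\n"}'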
Let’s wait for the pod:
$ kubectl get po -n frr
NAME READY STATUS RESTARTS AGE
ext-gw 1/1 Running 0 2m
Environment check
The pod should have two interfaces, properly configured routes, connectivity to ovn-worker2, and an established BFD session with that node:
$ kubectl exec -n frr -it ext-gw -- sh
sh-5.1# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0@if95: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP group default
link/ether 0a:58:0a:f4:02:11 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.244.2.17/24 brd 10.244.2.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::858:aff:fef4:211/64 scope link
valid_lft forever preferred_lft forever
3: net1@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 5a:1d:47:09:44:8e brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 172.18.0.10/16 brd 172.18.255.255 scope global net1
valid_lft forever preferred_lft forever
inet6 fe80::581d:47ff:fe09:448e/64 scope link
valid_lft forever preferred_lft forever
sh-5.1# ip r
default via 10.244.2.1 dev eth0
10.244.0.0/24 nhid 15 via 172.18.0.3 dev net1 proto 196 metric 20
10.244.1.0/24 nhid 16 via 172.18.0.4 dev net1 proto 196 metric 20
10.244.2.0/24 dev eth0 proto kernel scope link src 10.244.2.6
172.18.0.0/16 dev net1 proto kernel scope link src 172.18.0.10
sh-5.1# ping -c1 172.18.0.4
PING 172.18.0.4 (172.18.0.4) 56(84) bytes of data.
64 bytes from 172.18.0.4: icmp_seq=1 ttl=64 time=0.276 ms
--- 172.18.0.4 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.276/0.276/0.276/0.000 ms
sh-5.1# vtysh
Hello, this is FRRouting (version 8.0).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
ext-gw# show bfd peers brief
Session count: 1
SessionId LocalAddress PeerAddress Status
========= ============ =========== ======
1866236061 172.18.0.10 172.18.0.4 up
Let’s check FRR logs:
$ kubectl logs ext-gw -n frr
Started watchfrr
2023/02/23 10:34:31 ZEBRA: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
2023/02/23 10:34:31 BGP: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
2023/02/23 10:34:31 STATIC: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
2023/02/23 10:34:31 BFD: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
2023/02/23 10:34:33 BFD: [J1C6V-VMRW5] state-change: [mhop:no peer:172.18.0.4 local:0.0.0.0 vrf:default] init -> up
Let’s check everything was properly created from the OVN perspective:
$ POD=$(kubectl get pod -n ovn-kubernetes -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' | grep ovnkube-db-) ; kubectl exec -ti $POD -n ovn-kubernetes -c nb-ovsdb -- bash
[root@ovn-control-plane ~]# ovn-nbctl list bfd
_uuid : 6a5f6a73-df1f-4114-b36a-745cf3e9123b
detect_mult : []
dst_ip : "172.18.0.10"
external_ids : {}
logical_port : exgw-rtoe-GR_ovn-worker2
min_rx : []
min_tx : []
options : {}
status : up
[root@ovn-control-plane ~]# ovn-nbctl lr-route-list GR_ovn-worker2
IPv4 Routes
Route Table <main>:
10.244.1.3 172.18.0.10 src-ip exgw-rtoe-GR_ovn-worker2 ecmp-symmetric-reply bfd
169.254.169.0/29 169.254.169.4 dst-ip rtoe-GR_ovn-worker2
10.244.0.0/16 100.64.0.1 dst-ip
0.0.0.0/0 172.18.0.1 dst-ip rtoe-GR_ovn-worker2
[root@ovn-control-plane ~]# ovn-nbctl lr-route-list GR_ovn-worker
IPv4 Routes
Route Table <main>:
169.254.169.0/29 169.254.169.4 dst-ip rtoe-GR_ovn-worker
10.244.0.0/16 100.64.0.1 dst-ip
0.0.0.0/0 172.18.0.1 dst-ip rtoe-GR_ovn-worker
Let’s add a loopback address to the ext-gw pod to test the source routing entry:
$ kubectl exec -n frr ext-gw -- ip a a 192.168.1.10/32 dev lo
Validate that the dummy pod can reach the ext-gw pod’s loopback address:
$ kubectl exec -n bar dummy -- ping -c 1 192.168.1.10
PING 192.168.1.10 (192.168.1.10) 56(84) bytes of data.
64 bytes from 192.168.1.10: icmp_seq=1 ttl=62 time=6.96 ms
--- 192.168.1.10 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 6.959/6.959/6.959/0.000 ms
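Optionally, we can confirm the traffic really enters the gateway through its macvlan interface by sniffing on net1 while repeating the ping from the dummy pod (this assumes tcpdump is available in the FRR image; stop the capture with Ctrl-C):
$ kubectl exec -n frr -it ext-gw -- tcpdump -ni net1 icmp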
Let’s create a “normal” pod in the default namespace (outside the namespaces selected by the routing-namespaces annotation):
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: normal
spec:
containers:
- name: normal
image: centos
command:
- sleep
- infinity
nodeSelector:
kubernetes.io/hostname: ovn-worker2
EOF
It should not be able to reach the ext-gw pod’s loopback address:
$ kubectl exec normal -- ping -c 1 192.168.1.10
PING 192.168.1.10 (192.168.1.10) 56(84) bytes of data.
--- 192.168.1.10 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
command terminated with exit code 1
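This matches what we saw in the OVN northbound database: the ECMP route via 172.18.0.10 is a src-ip route installed only for the pod IP of the served namespace (10.244.1.3, the dummy pod), so pods outside the annotated namespace keep using the default next hop. Reusing the $POD variable from the earlier check, this can be confirmed with:
$ kubectl exec $POD -n ovn-kubernetes -c nb-ovsdb -- ovn-nbctl lr-route-list GR_ovn-worker2 | grep src-ip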
Pod readiness probes
Now let’s convert this into pod readiness probes, both for the working and non-working cases. Let’s create another pair of dummy and “normal” pods.
Let’s start with the second dummy pod:
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: dummy2
namespace: bar
spec:
containers:
- name: dummy2
image: centos
command:
- sleep
- infinity
securityContext:
privileged: true
readinessProbe:
exec:
command:
- ping
- -c1
- 192.168.1.10
initialDelaySeconds: 5
periodSeconds: 5
nodeSelector:
kubernetes.io/hostname: ovn-worker2
EOF
$ kubectl get po -n bar
NAME READY STATUS RESTARTS AGE
dummy 1/1 Running 0 115m
dummy2 1/1 Running 0 5m12s
$ kubectl describe po dummy2 -n bar | grep Ready
Ready: True
Ready True
ContainersReady True
Finally, the second “normal” pod:
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: normal2
spec:
containers:
- name: normal2
image: centos
command:
- sleep
- infinity
readinessProbe:
exec:
command:
- ping
- -c1
- 192.168.1.10
nodeSelector:
kubernetes.io/hostname: ovn-worker2
EOF
$ kubectl get po
NAME READY STATUS RESTARTS AGE
normal 1/1 Running 0 95m
normal2 0/1 Running 0 5m48s
$ kubectl describe po normal2 | grep Ready
Ready: False
Ready False
ContainersReady False
$ kubectl describe po normal2
...
Warning Unhealthy 8s (x4 over 15s) kubelet Readiness probe failed: command "ping -c1 192.168.1.10" timed out
Let’s also add a pod readiness probe to the FRR pod to check that the BFD session is established:
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: ext-gw2
namespace: frr
annotations:
k8s.v1.cni.cncf.io/networks: '[
{
"name": "internal-net",
"ips": [ "172.18.0.11/16" ]
}
]'
k8s.v1.cni.cncf.io/network-status: |-
[{
"name": "frr/internal-net",
"ips": [
"172.18.0.11"
],
"dns": {}
}]
k8s.ovn.org/routing-namespaces: "bar"
k8s.ovn.org/bfd-enabled: ""
k8s.ovn.org/routing-network: "frr/internal-net"
spec:
containers:
- name: frr
image: quay.io/wcaban/frr
command: ["/bin/sh","-c"]
args: ["/usr/libexec/frr/frrinit.sh start && tail -f /tmp/frr.log "]
ports:
- name: bfd
containerPort: 3784
protocol: UDP
- name: bgp
containerPort: 179
protocol: TCP
- name: rip
containerPort: 520
protocol: UDP
- name: ripng
containerPort: 521
protocol: UDP
- name: stats
containerPort: 9000
protocol: TCP
securityContext:
privileged: true
volumeMounts:
- name: config-volume
mountPath: /etc/frr
readinessProbe:
exec:
command:
- sh
- -c
- >-
vtysh -c 'show bfd peers brief' |
grep up
initialDelaySeconds: 5
periodSeconds: 5
volumes:
- name: config-volume
configMap:
name: frr-configs
nodeSelector:
kubernetes.io/hostname: ovn-worker
EOF
$ kubectl get po -n frr
NAME READY STATUS RESTARTS AGE
ext-gw 1/1 Running 0 46m
ext-gw2 1/1 Running 0 5m10s
$ kubectl describe po ext-gw2 -n frr | grep Ready
Ready: True
Ready True
ContainersReady True
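The same check the readiness probe runs can also be executed by hand against either gateway pod:
$ kubectl exec -n frr ext-gw2 -- vtysh -c 'show bfd peers brief'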