
Steps to set up a local environment with the OVN-Kubernetes Multiple External Gateway capability (also known as Intelligent CNI 2.0, or iCNI 2.0) and an FRRouting (FRR) pod acting as the external gateway.

Kind setup

First, let’s create a local kind cluster with OVN-Kubernetes as the CNI:

$ git clone https://github.com/ovn-org/ovn-kubernetes.git
$ cd ovn-kubernetes/contrib
$ ./kind.sh --disable-snat-multiple-gws --multi-network-enable

Let’s take a look at the options:

  • disable-snat-multiple-gws: Disables SNAT for egress traffic from pods served by multiple external gateways, so that traffic leaves with the pod IP and replies can return through the gateway
  • multi-network-enable: Installs Multus-CNI on the cluster, enabling secondary networks via NetworkAttachmentDefinitions (we will verify the installation below once the cluster is up)

After a few minutes, we will have a three-node cluster ready for use:

$ export KUBECONFIG=$HOME/ovn.conf
$ kubectl get node
NAME                STATUS   ROLES           AGE    VERSION
ovn-control-plane   Ready    control-plane   4h2m   v1.24.0
ovn-worker          Ready    <none>          4h1m   v1.24.0
ovn-worker2         Ready    <none>          4h1m   v1.24.0
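
Since we enabled multi-network-enable, Multus should now be running as a daemonset; the check below assumes the default Multus-CNI daemonset installation in kube-system:

$ kubectl get ds -n kube-system | grep -i multus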

Let’s install some additional CNI network plugins needed for the test (e.g., macvlan):

$ git clone https://github.com/containernetworking/plugins.git
$ cd plugins
$ ./build_linux.sh
$ cd bin
$ for i in $(docker ps -aq); do for j in macvlan static tuning; do docker cp $j $i:/opt/cni/bin/; done; done
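
As a quick sanity check, we can list the binaries on one of the nodes (kind node names double as container names, e.g. ovn-worker):

$ docker exec ovn-worker ls /opt/cni/bin | grep -E 'macvlan|static|tuning'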

Alternatively, if you don’t want to compile the CNI plugins manually, I submitted a PR that adds an option to the kind.sh script to deploy additional CNI plugins (e.g., macvlan, ipvlan):

$ wget https://raw.githubusercontent.com/ovn-org/ovn-kubernetes/0123ad42d371223dc434b6af06a9ea4fd8336cda/contrib/kind.sh
$ ./kind.sh --install-cni-plugins --disable-snat-multiple-gws --multi-network-enable

Let’s take a look at the new option introduced by the PR:

  • install-cni-plugins: Installs additional CNI network plugins

Resource creation

Let’s create the namespaces:

$ cat <<EOF | kubectl apply -f -
---
apiVersion: v1
kind: Namespace
metadata:
  name: frr
spec: {}
---
apiVersion: v1
kind: Namespace
metadata:
  name: bar
spec: {}
EOF
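
A quick check that both namespaces exist:

$ kubectl get ns frr bar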

Let’s create the network attachment definition; the macvlan master breth0 below is the bridge interface OVN-Kubernetes creates on each kind node for the external network:

$ cat <<EOF | kubectl apply -f -
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: internal-net
  namespace: frr
spec:
  config: |-
    {
      "cniVersion": "0.3.1",
      "name": "internal-net",
      "plugins": [
        {
          "type": "macvlan",
          "master": "breth0",
          "mode": "bridge",
          "ipam": {
            "type": "static"
          }
        },
        {
          "capabilities": {
            "mac": true,
            "ips": true
          },
          "type": "tuning"
        }
      ]
    }
EOF

Check for correct creation:

$ kubectl get net-attach-def -n frr
NAME           AGE
internal-net   2m

Create a dummy pod in the served namespace (bar) on the second worker:

$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: dummy
  namespace: bar
spec:
  containers:
  - name: dummy
    image: centos
    command:
      - sleep
      - infinity
  nodeSelector:
    kubernetes.io/hostname: ovn-worker2
EOF

Let’s wait for the pod:

$ kubectl get po -n bar
NAME     READY   STATUS    RESTARTS   AGE
dummy    1/1     Running   0          2m
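
It is also worth noting the dummy pod’s IP address (10.244.1.3 in this environment); it will show up later as the src-ip match of the ECMP route OVN programs on the node’s gateway router:

$ kubectl get po -n bar -o wide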

Let’s grab two important pieces of information for the FRR configuration: the IP of the ovn-worker2 node where the dummy pod resides (used as the BFD peer) and the static route entries for the node subnets:

$ kubectl get node -o wide | grep ovn-worker2
ovn-worker2         Ready    <none>          11m   v1.24.0   172.18.0.4    <none>        Ubuntu 21.10   6.0.7-301.fc37.x86_64   containerd://1.6.4

$ kubectl get nodes -o jsonpath='{range .items[*].metadata.annotations}{.k8s\.ovn\.org\/node\-subnets}{.k8s\.ovn\.org\/node\-primary\-ifaddr}{"\n"}{end}' | awk -F'["/]' '{print "ip route " $4"/"$5 " " $9}'
ip route 10.244.0.0/24 172.18.0.3
ip route 10.244.2.0/24 172.18.0.2
ip route 10.244.1.0/24 172.18.0.4

Let’s create the FRR configuration, embedding the BFD peer and the static routes we just generated:

$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: frr-configs
  namespace: frr
data:
  daemons: |
    bgpd=yes
    ospfd=no
    ospf6d=no
    ripd=no
    ripngd=no
    isisd=no
    pimd=no
    ldpd=no
    nhrpd=no
    eigrpd=no
    babeld=no
    sharpd=no
    pbrd=no
    bfdd=yes
    fabricd=no
    vrrpd=no
    vtysh_enable=yes
    zebra_options="  -A 127.0.0.1 -s 90000000"
    bgpd_options="   -A 127.0.0.1"
    ospfd_options="  -A 127.0.0.1"
    ospf6d_options=" -A ::1"
    ripd_options="   -A 127.0.0.1"
    ripngd_options=" -A ::1"
    isisd_options="  -A 127.0.0.1"
    pimd_options="   -A 127.0.0.1"
    ldpd_options="   -A 127.0.0.1"
    nhrpd_options="  -A 127.0.0.1"
    eigrpd_options=" -A 127.0.0.1"
    babeld_options=" -A 127.0.0.1"
    sharpd_options=" -A 127.0.0.1"
    pbrd_options="   -A 127.0.0.1"
    staticd_options="-A 127.0.0.1"
    bfdd_options="   -A 127.0.0.1"
    fabricd_options="-A 127.0.0.1"
    vrrpd_options="  -A 127.0.0.1"
  vtysh.conf: |
    service integrated-vtysh-config
  frr.conf: |
    hostname vrouter
    service integrated-vtysh-config
    password frr
    enable password frr
    !
    debug bfd peer
    debug bfd zebra
    debug bfd network
    !
    bfd
     peer 172.18.0.4
       no shutdown
     !
    !
    ! subnets for each node
    ip route 10.244.0.0/24 172.18.0.3
    ip route 10.244.2.0/24 172.18.0.2
    ip route 10.244.1.0/24 172.18.0.4
    !
    log file /tmp/frr.log debugging
EOF
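
Before mounting it into the pod, we can double-check the rendered frr.conf (the jsonpath escaping follows the same convention used for the node annotations above):

$ kubectl get cm frr-configs -n frr -o jsonpath='{.data.frr\.conf}'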

Finally, let’s create the FRR pod. The k8s.ovn.org annotations are what activate the feature: routing-namespaces lists the namespaces whose pod traffic is routed through this gateway, routing-network selects the secondary network whose IP is used as the next hop, and bfd-enabled asks OVN to monitor the gateway over BFD:

$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: ext-gw
  namespace: frr
  annotations:
    k8s.v1.cni.cncf.io/networks: '[
        {
          "name": "internal-net",
          "ips": [ "172.18.0.10/16" ]
        }
      ]'
    k8s.v1.cni.cncf.io/network-status: |-
      [{
          "name": "frr/internal-net",
          "ips": [
              "172.18.0.10"
          ],
          "dns": {}
      }]
    k8s.ovn.org/routing-namespaces: "bar"
    k8s.ovn.org/bfd-enabled: ""
    k8s.ovn.org/routing-network: "frr/internal-net"
spec:
  containers:
  - name: frr
    image: quay.io/wcaban/frr
    command: ["/bin/sh","-c"]
    args: ["/usr/libexec/frr/frrinit.sh start && tail -f /tmp/frr.log "]
    ports:
    - name: bfd
      containerPort: 3784
      protocol: UDP
    - name: bgp
      containerPort: 179
      protocol: TCP
    - name: rip
      containerPort: 520
      protocol: UDP
    - name: ripng
      containerPort: 521
      protocol: UDP
    - name: stats
      containerPort: 9000
      protocol: TCP
    securityContext:
      privileged: true
    volumeMounts:
    - name: config-volume
      mountPath: /etc/frr
  volumes:
    - name: config-volume
      configMap:
        name: frr-configs
  nodeSelector:
    kubernetes.io/hostname: ovn-worker
EOF

Let’s wait for the pod:

$ kubectl get po -n frr
NAME     READY   STATUS    RESTARTS   AGE
ext-gw   1/1     Running   0          2m

Environment check

The pod should have two interfaces and properly configured routes (the proto 196 entries below are the static routes installed by FRR’s staticd), be able to reach ovn-worker2, and have a BFD session established with it:

$ kubectl exec -n frr -it ext-gw -- sh
sh-5.1# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0@if95: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP group default
    link/ether 0a:58:0a:f4:02:11 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.244.2.17/24 brd 10.244.2.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::858:aff:fef4:211/64 scope link
       valid_lft forever preferred_lft forever
3: net1@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 5a:1d:47:09:44:8e brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.18.0.10/16 brd 172.18.255.255 scope global net1
       valid_lft forever preferred_lft forever
    inet6 fe80::581d:47ff:fe09:448e/64 scope link
       valid_lft forever preferred_lft forever

sh-5.1# ip r
default via 10.244.2.1 dev eth0
10.244.0.0/24 nhid 15 via 172.18.0.3 dev net1 proto 196 metric 20
10.244.1.0/24 nhid 16 via 172.18.0.4 dev net1 proto 196 metric 20
10.244.2.0/24 dev eth0 proto kernel scope link src 10.244.2.6
172.18.0.0/16 dev net1 proto kernel scope link src 172.18.0.10

sh-5.1# ping -c1 172.18.0.4
PING 172.18.0.4 (172.18.0.4) 56(84) bytes of data.
64 bytes from 172.18.0.4: icmp_seq=1 ttl=64 time=0.276 ms
--- 172.18.0.4 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.276/0.276/0.276/0.000 ms

sh-5.1# vtysh
Hello, this is FRRouting (version 8.0).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
ext-gw# show bfd peers brief
Session count: 1
SessionId  LocalAddress                             PeerAddress                             Status
=========  ============                             ===========                             ======
1866236061 172.18.0.10                              172.18.0.4                              up
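
For more detail on the session (negotiated timers, diagnostic codes), vtysh can also display the individual peer:

ext-gw# show bfd peer 172.18.0.4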

Let’s check FRR logs:

$ kubectl logs ext-gw -n frr
Started watchfrr
2023/02/23 10:34:31 ZEBRA: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
2023/02/23 10:34:31 BGP: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
2023/02/23 10:34:31 STATIC: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
2023/02/23 10:34:31 BFD: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
2023/02/23 10:34:33 BFD: [J1C6V-VMRW5] state-change: [mhop:no peer:172.18.0.4 local:0.0.0.0 vrf:default] init -> up

Let’s check that everything was properly created from the OVN perspective: there should be a BFD entry pointing at the gateway pod, plus an ECMP src-ip route for the dummy pod on GR_ovn-worker2, while GR_ovn-worker (which hosts no pods in the served namespace) gets no such route:

$ POD=$(kubectl get pod -n ovn-kubernetes -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' | grep ovnkube-db-) ; kubectl exec -ti $POD -n ovn-kubernetes -c nb-ovsdb -- bash

[root@ovn-control-plane ~]# ovn-nbctl list bfd
_uuid               : 6a5f6a73-df1f-4114-b36a-745cf3e9123b
detect_mult         : []
dst_ip              : "172.18.0.10"
external_ids        : {}
logical_port        : exgw-rtoe-GR_ovn-worker2
min_rx              : []
min_tx              : []
options             : {}
status              : up

[root@ovn-control-plane ~]# ovn-nbctl lr-route-list GR_ovn-worker2
IPv4 Routes
Route Table <main>:
               10.244.1.3               172.18.0.10 src-ip exgw-rtoe-GR_ovn-worker2 ecmp-symmetric-reply bfd
         169.254.169.0/29             169.254.169.4 dst-ip rtoe-GR_ovn-worker2
            10.244.0.0/16                100.64.0.1 dst-ip
                0.0.0.0/0                172.18.0.1 dst-ip rtoe-GR_ovn-worker2

[root@ovn-control-plane ~]# ovn-nbctl lr-route-list GR_ovn-worker
IPv4 Routes
Route Table <main>:
         169.254.169.0/29             169.254.169.4 dst-ip rtoe-GR_ovn-worker
            10.244.0.0/16                100.64.0.1 dst-ip
                0.0.0.0/0                172.18.0.1 dst-ip rtoe-GR_ovn-worker

Let’s add a loopback address to the ext-gw pod to test the source routing entry:

$ kubectl exec -n frr ext-gw -- ip a a 192.168.1.10/32 dev lo
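
The address should now show up on the pod’s loopback interface:

$ kubectl exec -n frr ext-gw -- ip a show dev lo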

Validate that the dummy pod can reach the ext-gw container’s loopback address:

$ kubectl exec -n bar dummy -- ping -c 1 192.168.1.10
PING 192.168.1.10 (192.168.1.10) 56(84) bytes of data.
64 bytes from 192.168.1.10: icmp_seq=1 ttl=62 time=6.96 ms

--- 192.168.1.10 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 6.959/6.959/6.959/0.000 ms

Let’s create a “normal” pod in the default namespace, i.e. outside the namespaces listed in the routing-namespaces annotation:

$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: normal
spec:
  containers:
  - name: normal
    image: centos
    command:
      - sleep
      - infinity
  nodeSelector:
    kubernetes.io/hostname: ovn-worker2
EOF

It should not be able to reach the ext-gw container’s loopback address, since its namespace is not served by the external gateway:

$ kubectl exec normal -- ping -c 1 192.168.1.10
PING 192.168.1.10 (192.168.1.10) 56(84) bytes of data.

--- 192.168.1.10 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

command terminated with exit code 1

Pod readiness probes

Now let’s turn these checks into pod readiness probes, covering both the working and the non-working case. Let’s create another pair of dummy and “normal” pods.

Let’s start with the second dummy pod:

$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: dummy2
  namespace: bar
spec:
  containers:
  - name: dummy2
    image: centos
    command:
      - sleep
      - infinity
    securityContext:
      privileged: true
    readinessProbe:
      exec:
        command:
          - ping
          - -c1
          - 192.168.1.10
      initialDelaySeconds: 5
      periodSeconds: 5
  nodeSelector:
    kubernetes.io/hostname: ovn-worker2
EOF

$ kubectl get po -n bar
NAME     READY   STATUS    RESTARTS   AGE
dummy    1/1     Running   0          115m
dummy2   1/1     Running   0          5m12s

$ kubectl describe po dummy2 -n bar | grep Ready
    Ready:          True
  Ready             True
  ContainersReady   True
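
To see the probe flip, we can temporarily remove the loopback address from the gateway, watch dummy2 turn NotReady, and then restore it:

$ kubectl exec -n frr ext-gw -- ip a d 192.168.1.10/32 dev lo
$ kubectl get po -n bar -w
$ kubectl exec -n frr ext-gw -- ip a a 192.168.1.10/32 dev lo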

Finally, the second “normal” pod:

$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: normal2
spec:
  containers:
  - name: normal2
    image: centos
    command:
      - sleep
      - infinity
    readinessProbe:
      exec:
        command:
        - ping
        - -c1
        - 192.168.1.10
  nodeSelector:
    kubernetes.io/hostname: ovn-worker2
EOF
$ kubectl get po
NAME      READY   STATUS    RESTARTS   AGE
normal    1/1     Running   0          95m
normal2   0/1     Running   0          5m48s

$ kubectl describe po normal2 | grep Ready
    Ready:          False
  Ready             False
  ContainersReady   False

$ kubectl describe po normal2
...
  Warning  Unhealthy       8s (x4 over 15s)  kubelet            Readiness probe failed: command "ping -c1 192.168.1.10" timed out

Let’s also add a pod readiness probe to the FRR pod to check for BFD session establishment:

$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: ext-gw2
  namespace: frr
  annotations:
    k8s.v1.cni.cncf.io/networks: '[
        {
          "name": "internal-net",
          "ips": [ "172.18.0.11/16" ]
        }
      ]'
    k8s.v1.cni.cncf.io/network-status: |-
      [{
          "name": "frr/internal-net",
          "ips": [
              "172.18.0.11"
          ],
          "dns": {}
      }]
    k8s.ovn.org/routing-namespaces: "bar"
    k8s.ovn.org/bfd-enabled: ""
    k8s.ovn.org/routing-network: "frr/internal-net"
spec:
  containers:
  - name: frr
    image: quay.io/wcaban/frr
    command: ["/bin/sh","-c"]
    args: ["/usr/libexec/frr/frrinit.sh start && tail -f /tmp/frr.log "]
    ports:
    - name: bfd
      containerPort: 3784
      protocol: UDP
    - name: bgp
      containerPort: 179
      protocol: TCP
    - name: rip
      containerPort: 520
      protocol: UDP
    - name: ripng
      containerPort: 521
      protocol: UDP
    - name: stats
      containerPort: 9000
      protocol: TCP
    securityContext:
      privileged: true
    volumeMounts:
    - name: config-volume
      mountPath: /etc/frr
    readinessProbe:
      exec:
        command:
          - sh
          - -c
          - >-
             vtysh -c 'show bfd peers brief' |
             grep up
      initialDelaySeconds: 5
      periodSeconds: 5
  volumes:
    - name: config-volume
      configMap:
        name: frr-configs
  nodeSelector:
    kubernetes.io/hostname: ovn-worker
EOF
$ kubectl get po -n frr
NAME      READY   STATUS    RESTARTS   AGE
ext-gw    1/1     Running   0          46m
ext-gw2   1/1     Running   0          5m10s

$ kubectl describe po ext-gw2 -n frr | grep Ready
    Ready:          True
  Ready             True
  ContainersReady   True
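
As with dummy2, we can exercise the failure path: administratively shutting down the BFD peer from inside ext-gw2 (a sketch; vtysh accepts chained -c commands) should take the session down and drive the pod to NotReady, and a matching "no shutdown" brings it back:

$ kubectl exec -n frr ext-gw2 -- vtysh -c 'conf t' -c 'bfd' -c 'peer 172.18.0.4' -c 'shutdown'
$ kubectl get po -n frr -w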
