Kubernetes control plane scale testing with Kubemark


This post continues Michael McCune’s (@elmiko) notes in Setting Up a Development Environment for the Cluster API Kubemark Provider, Automating My Hollow Kubernetes Test Rig, and DevConf.cz 2022: Testing at Scale with Cluster API and Kubemark (demo).

Kubemark is a performance testing tool that lets users run experiments on simulated clusters by creating “hollow” Kubernetes nodes. These nodes do not actually run containers or attach storage, but they behave as if they do, reporting status through the API server (and therefore updating etcd) with all the trimmings. At the same time, hollow nodes are extremely lightweight (<30 MiB).

The primary use case for Kubemark is scalability testing, since simulated clusters can be much bigger than real ones. The objective is to expose problems in the master components (API server, controller manager, or scheduler) that appear only on bigger clusters (e.g. small memory leaks).

Architecture

At a very high level, a Kubemark cluster consists of two parts: real master components and a set of “hollow” nodes. The prefix hollow denotes an implementation/instantiation of a component with all the moving parts mocked out. The best example is HollowKubelet, which pretends to be an ordinary Kubelet but does not start anything or mount any volumes; it just claims that it does.

Currently, master components run on one or more dedicated machines, and HollowNodes run on an external management Kubernetes cluster. This design has the advantage of completely isolating master resources from everything else.

Integration with Cluster API

Kubernetes Cluster API (CAPI) is a project focused on providing declarative APIs and tooling to simplify provisioning, upgrading, and operating multiple Kubernetes clusters. It uses Kubernetes-style APIs and patterns to automate cluster lifecycle management for platform operators. The supporting infrastructure (virtual machines, networks, load balancers, and VPCs), as well as the Kubernetes cluster configuration, is defined in the same way application developers define how their workloads are deployed and managed. This enables consistent and repeatable cluster deployments across a wide variety of infrastructure environments.

The Cluster API community has developed a Cluster API Kubemark Provider, which allows users to deploy fake, Kubemark-backed machines to their clusters. This is useful in a variety of scenarios, such as load testing and simulation testing.

Hands-on

On the host (we will be using a fresh Ubuntu 22.04 virtual machine running Docker), we will use kind (Kubernetes in Docker, which runs the necessary Kubernetes pieces inside containers) to create the CAPI management cluster. Next, we will use Cluster API, initialized with the clusterctl tool, to create a second kind cluster for the Kubemark control plane: this is the cluster under test, also referred to as the workload cluster. Finally, we want to add nodes to the cluster under test. Kubemark requires these hollow nodes to run as pods on a cluster that can join the control plane, so the Cluster API Kubemark provider creates them as pods within the CAPI management cluster, and they join the cluster under test as nodes.

For the demo we will use an Ubuntu 22.04 virtual machine with 4 vCPUs, 4 GiB of memory and a 100 GiB disk.

Environment setup

I will be using Lima (Linux virtual machines) to create and manage the VM:

$ limactl start --name=ubuntu22.04 template://ubuntu-lts
$ limactl shell ubuntu22.04

We will use the cluster-api-kubemark-ansible playbooks to automate the deployment of:

  • Golang
  • Build tools
  • Docker
  • Docker local registry
  • Kind
  • Kubectl
  • Kustomize
  • Kubebuilder
  • Cluster API
  • Cluster API Kubemark provider

Prepare the host to run ansible:

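  • Install Ansible (the full ansible package, not ansible-core), then create an SSH key and authorize it for the local user so the playbooks can connect to the host over SSH: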
     $ sudo apt install ansible
     $ ssh-keygen
     $ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    
  • Clone and prepare the playbooks:
     $ git clone https://github.com/elmiko/cluster-api-kubemark-ansible.git
     $ cd cluster-api-kubemark-ansible
     $ echo -e "[defaults]\nallow_world_readable_tmpfiles=true" > ansible.cfg
    
  • Update inventory/hosts if you need to change addresses and/or users, then run the first playbook:
     $ ansible-playbook -i inventory/hosts setup_devel_environment.yaml 
    

    Once it finishes, you will be able to log in to the host as the devel user listed in the hosts file, with all the development tools ready to use.

  • Run the second playbook to build the clusterctl binary and all the controller images, and to push the images to the local registry:
     $ ansible-playbook -i inventory/hosts build_clusterctl_and_images.yaml
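Before creating any clusters, it can help to confirm that the tools installed by the playbooks are actually on your PATH. The snippet below is a minimal sketch: missing_tools is a hypothetical helper (not part of either repo), and the tool list is an assumption based on the components listed above (clusterctl in particular may live outside PATH, depending on where the playbook builds it).

```shell
#!/usr/bin/env bash
# missing_tools NAME...
# Prints each NAME that cannot be found on PATH and returns 1 if any is missing.
missing_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      missing=1
    fi
  done
  return "$missing"
}

# Assumed tool list; adjust it to match your environment.
if ! missing_tools go docker kind kubectl kustomize kubebuilder clusterctl; then
  echo "some tools are missing; re-run the playbooks or fix PATH" >&2
fi
```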
    

Creating the cluster

We will use the scripts in the capi-hacks repo to aid with the Kubemark deployment.

Make sure the local Docker registry was created in the previous steps; if not, start it with the 00-start-localregistry.sh script:

$ docker ps
CONTAINER ID   IMAGE                  COMMAND                  CREATED      STATUS          PORTS                                  NAMES
7064a4208e15   registry:2             "/entrypoint.sh /etc…"   4 days ago   Up 46 minutes   127.0.0.1:5000->5000/tcp               kind-registry
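If you would rather script this check than eyeball the docker ps output, a small helper can look for the registry container by name. This is a sketch under the assumption that the container keeps the name kind-registry shown above; registry_running is a hypothetical name.

```shell
#!/usr/bin/env bash
# Succeeds only if a running container is named exactly "kind-registry".
# grep -x requires a whole-line match, so e.g. "kind-registry-old" is ignored.
registry_running() {
  docker ps --format '{{.Names}}' | grep -qx 'kind-registry'
}

if ! registry_running; then
  echo "kind-registry is not running; start it with 00-start-localregistry.sh" >&2
fi
```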

Clone the capi-hacks repo:

$ git clone https://github.com/elmiko/capi-hacks.git
$ cd capi-hacks

Create the CAPI management cluster. This cluster will host the CAPI components and Kubemark hollow nodes:

$ ./01-start-mgmt-cluster.sh
$ kind get clusters
mgmt

Wait for the node to become ready and configure the management cluster to use the local registry:

$ kubectl get node
NAME                 STATUS   ROLES                  AGE   VERSION
mgmt-control-plane   Ready    control-plane,master   44s   v1.23.6

$ ./02-apply-localregistryhosting-configmap.sh
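Several steps in this walkthrough boil down to “wait until X is ready”. For conditions that Kubernetes exposes directly, kubectl wait already does this (for example kubectl wait --for=condition=Ready node --all --timeout=120s). For everything else, a generic retry loop is handy. The helper below is a hypothetical sketch, not something shipped by capi-hacks.

```shell
#!/usr/bin/env bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or has failed ATTEMPTS times,
# sleeping DELAY_SECONDS between failed attempts.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example, retry 30 10 kubectl get node --kubeconfig=kubeconfig.kubemark-workload keeps polling (up to 30 times, 10 seconds apart) until the workload cluster’s API server answers.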

Deploy the Cluster API (capi) and Cluster API Kubemark Provider (capk) components and wait for their pods to become ready:

$ ./03-clusterctl-init.sh

$ kubectl get deploy -A | grep cap
capd-system                         capd-controller-manager                         1/1     1            1           10m
capi-kubeadm-bootstrap-system       capi-kubeadm-bootstrap-controller-manager       1/1     1            1           11m
capi-kubeadm-control-plane-system   capi-kubeadm-control-plane-controller-manager   1/1     1            1           11m
capi-system                         capi-controller-manager                         1/1     1            1           11m
capk-system                         capk-controller-manager                         1/1     1            1           10m
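The READY column is what to watch here: each controller should report all of its replicas ready (1/1). A hedged sketch of an automated check follows; it parses the same table requested with --no-headers, and unready_deployments is a hypothetical helper. Alternatively, kubectl wait --for=condition=Available deployment --all -n capk-system waits on a single namespace directly.

```shell
#!/usr/bin/env bash
# Reads `kubectl get deploy -A --no-headers` output on stdin
# (columns: NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE) and prints
# namespace/name for every Deployment whose READY value is not n/n.
# Returns 1 if at least one Deployment is not fully ready.
unready_deployments() {
  awk '{
    split($3, ready, "/")
    if (ready[1] != ready[2]) { print $1 "/" $2; bad = 1 }
  } END { exit (bad ? 1 : 0) }'
}
```

Usage:
$ kubectl get deploy -A --no-headers | grep cap | unready_deployments && echo "all CAPI controllers are ready"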

Create a new kind (Docker provider) cluster to host the control plane under test:

$ kubectl apply -f kubemark/kubemark-workload-control-plane.yaml

Wait for the machine to transition from the Provisioning to the Running phase:

$ kubectl get machine
NAME                                    CLUSTER             NODENAME                                PROVIDERID                                         PHASE     AGE     VERSION
kubemark-workload-control-plane-lvkcv   kubemark-workload   kubemark-workload-control-plane-lvkcv   docker:////kubemark-workload-control-plane-lvkcv   Running   3m31s   v1.23.6

$ kubectl get clusters
NAME                PHASE         AGE     VERSION
kubemark-workload   Provisioned   4m4s   

$ kind get clusters
kubemark-workload
mgmt
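Instead of re-running kubectl get machine by hand, the phase of every Machine can be read with a JSONPath query and checked in a loop. Cluster API reports each Machine’s lifecycle stage in its status.phase field; the helper name all_machines_running is hypothetical and this is only a sketch.

```shell
#!/usr/bin/env bash
# Reads whitespace-separated Machine phases on stdin, e.g. from
#   kubectl get machine -o jsonpath='{.items[*].status.phase}'
# Succeeds only if at least one phase was read and every phase is "Running".
all_machines_running() {
  awk '{ for (i = 1; i <= NF; i++) { total++; if ($i != "Running") bad++ } }
       END { if (total > 0 && bad == 0) exit 0; exit 1 }'
}
```

Usage:
$ until kubectl get machine -o jsonpath='{.items[*].status.phase}' | all_machines_running; do sleep 10; done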

Let’s take a look at the new kubemark-workload kind cluster that will host the control plane under test. As you can see, the node is in the NotReady state (because no CNI is deployed yet) and the CNI-dependent pods are Pending:

$ ./get-kubeconfig.sh kubemark-workload

$ kubectl get node --kubeconfig=kubeconfig.kubemark-workload
NAME                                    STATUS     ROLES                  AGE   VERSION
kubemark-workload-control-plane-lvkcv   NotReady   control-plane,master   46m   v1.23.6

$ kubectl get po -A --kubeconfig=kubeconfig.kubemark-workload
NAMESPACE     NAME                                                            READY   STATUS    RESTARTS   AGE
kube-system   coredns-79dc848587-8qbgk                                        0/1     Pending   0          6m31s
kube-system   coredns-79dc848587-n9428                                        0/1     Pending   0          6m31s
kube-system   etcd-kubemark-workload-control-plane-lvkcv                      1/1     Running   0          6m39s
kube-system   kube-apiserver-kubemark-workload-control-plane-lvkcv            1/1     Running   0          6m39s
kube-system   kube-controller-manager-kubemark-workload-control-plane-lvkcv   1/1     Running   0          6m39s
kube-system   kube-proxy-skgc9                                                1/1     Running   0          6m31s
kube-system   kube-scheduler-kubemark-workload-control-plane-lvkcv            1/1     Running   0          6m39s

Let’s deploy OVN-Kubernetes on the cluster. OVN-Kubernetes is a CNI for Kubernetes based on the Open Virtual Network (OVN) project (more information on how to deploy OVN-K on a preexisting kind cluster can be found in this past blog post):

$ ./deploy-cni-ovn.sh $(pwd)/kubeconfig.kubemark-workload kubemark-workload

Check that the node has transitioned to the Ready state, the CNI-dependent pods are Running, and the OVN pods are present:

$ kubectl get node --kubeconfig=kubeconfig.kubemark-workload
NAME                                    STATUS   ROLES                  AGE   VERSION
kubemark-workload-control-plane-lvkcv   Ready    control-plane,master   78m   v1.23.6

$ kubectl get po -A --kubeconfig=kubeconfig.kubemark-workload
NAMESPACE        NAME                                                            READY   STATUS    RESTARTS   AGE
default          test2                                                           1/1     Running   0          3m4s
kube-system      coredns-79dc848587-8qbgk                                        1/1     Running   0          78m
kube-system      coredns-79dc848587-n9428                                        1/1     Running   0          78m
kube-system      etcd-kubemark-workload-control-plane-lvkcv                      1/1     Running   0          78m
kube-system      kube-apiserver-kubemark-workload-control-plane-lvkcv            1/1     Running   0          78m
kube-system      kube-controller-manager-kubemark-workload-control-plane-lvkcv   1/1     Running   0          78m
kube-system      kube-proxy-skgc9                                                1/1     Running   0          78m
kube-system      kube-scheduler-kubemark-workload-control-plane-lvkcv            1/1     Running   0          78m
ovn-kubernetes   ovnkube-db-7d8fdc7dfb-2pf8m                                     2/2     Running   0          6m42s
ovn-kubernetes   ovnkube-master-6dbd568bb5-89s7c                                 2/2     Running   0          6m41s
ovn-kubernetes   ovnkube-node-7s7r5                                              3/3     Running   0          6m33s
ovn-kubernetes   ovs-node-gnpv9                                                  1/1     Running   0          6m41s

At this point we are ready to deploy Kubemark hollow nodes in the management cluster. This step creates 4 Kubemark hollow nodes:

$ kubectl apply -f kubemark/kubemark-workload-md0.yaml

Let’s check things from the management cluster perspective first:

$ kubectl get machine
NAME                                     CLUSTER             NODENAME                                PROVIDERID                                         PHASE     AGE   VERSION
kubemark-workload-control-plane-lvkcv    kubemark-workload   kubemark-workload-control-plane-lvkcv   docker:////kubemark-workload-control-plane-lvkcv   Running   84m   v1.23.6
kubemark-workload-md-0-764cb59d5-8c62j   kubemark-workload   kubemark-workload-md-0-v7592            kubemark://kubemark-workload-md-0-v7592            Running   57s   v1.23.6
kubemark-workload-md-0-764cb59d5-bb2p4   kubemark-workload   kubemark-workload-md-0-4955k            kubemark://kubemark-workload-md-0-4955k            Running   57s   v1.23.6
kubemark-workload-md-0-764cb59d5-hwlh7   kubemark-workload   kubemark-workload-md-0-m82cf            kubemark://kubemark-workload-md-0-m82cf            Running   57s   v1.23.6
kubemark-workload-md-0-764cb59d5-jrmgt   kubemark-workload   kubemark-workload-md-0-82m9j            kubemark://kubemark-workload-md-0-82m9j            Running   57s   v1.23.6

$ kubectl get po
NAME                           READY   STATUS    RESTARTS   AGE
kubemark-workload-md-0-4955k   1/1     Running   0          90s
kubemark-workload-md-0-82m9j   1/1     Running   0          90s
kubemark-workload-md-0-m82cf   1/1     Running   0          90s
kubemark-workload-md-0-v7592   1/1     Running   0          90s

Finally, let’s check things from the cluster under test perspective:

$ kubectl get node --kubeconfig=kubeconfig.kubemark-workload
NAME                                    STATUS   ROLES                  AGE     VERSION
kubemark-workload-control-plane-lvkcv   Ready    control-plane,master   84m     v1.23.6
kubemark-workload-md-0-4955k            Ready    <none>                 2m11s   v1.23.6
kubemark-workload-md-0-82m9j            Ready    <none>                 2m6s    v1.23.6
kubemark-workload-md-0-m82cf            Ready    <none>                 2m10s   v1.23.6
kubemark-workload-md-0-v7592            Ready    <none>                 2m9s    v1.23.6
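Every hollow-node pod in the management cluster should show up as a node in the cluster under test. With 4 nodes this is easy to eyeball; with dozens, a set difference is less error-prone. A minimal sketch, assuming each list has been written to a file with one name per line (missing_nodes and the file names are hypothetical):

```shell
#!/usr/bin/env bash
# missing_nodes PODS_FILE NODES_FILE
# Prints every name in PODS_FILE that does not appear in NODES_FILE.
# -F: fixed strings, -x: whole-line match, -v: keep non-matching lines.
# grep exits 1 when nothing is printed; that is treated as success here,
# while a real grep error (exit 2) is still reported as a failure.
missing_nodes() {
  grep -vxF -f "$2" "$1" || [ "$?" -eq 1 ]
}
```

Usage:
$ kubectl get po -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' > hollow-pods.txt
$ kubectl get node -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' --kubeconfig=kubeconfig.kubemark-workload > workload-nodes.txt
$ missing_nodes hollow-pods.txt workload-nodes.txt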

Creating resources on the workload cluster

Let’s create a simple pod and service:

$ kubectl run test --image nginx --kubeconfig=kubeconfig.kubemark-workload
pod/test created

$ kubectl get po -o wide --kubeconfig=kubeconfig.kubemark-workload
NAME   READY   STATUS    RESTARTS   AGE    IP                NODE                           NOMINATED NODE   READINESS GATES
test   1/1     Running   0          100s   192.168.192.168   kubemark-workload-md-0-m82cf   <none>           <none>

$ kubectl expose po/test --port 5000 --kubeconfig=kubeconfig.kubemark-workload
service/test exposed

$ kubectl get service --kubeconfig=kubeconfig.kubemark-workload
NAME         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
kubernetes   ClusterIP   192.168.122.1    <none>        443/TCP    87m
test         ClusterIP   192.168.122.93   <none>        5000/TCP   7s

Let’s check the OVN databases:

$ POD=$(kubectl get pod -n ovn-kubernetes -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' --kubeconfig=kubeconfig.kubemark-workload | grep ovnkube-db-)
$ kubectl exec -ti $POD -n ovn-kubernetes -c nb-ovsdb --kubeconfig=kubeconfig.kubemark-workload -- bash

[root@kubemark-workload-control-plane-lvkcv ~]# ovn-nbctl ls-list 
712ca431-ff74-4aef-af8d-00acee6e40dd (ext_kubemark-workload-control-plane-lvkcv)
95755675-c275-4d04-bd35-713ba7597c0c (join)
d7264e2c-4e4e-44fe-9eae-5b99facca098 (kubemark-workload-control-plane-lvkcv)
ee3c0a20-7df2-421b-8e9e-b676080d6976 (kubemark-workload-md-0-4955k)
b4f230a6-9151-44cd-8fa9-4f489799274e (kubemark-workload-md-0-82m9j)
a27e961a-6aaf-4e33-999c-9e7fd73611fa (kubemark-workload-md-0-m82cf)
48b096d8-42c5-4d18-b226-924ec60af0c5 (kubemark-workload-md-0-v7592)

[root@kubemark-workload-control-plane-lvkcv ~]# ovn-nbctl lb-list
UUID                                    LB                  PROTO      VIP                    IPs
8ffbeb8b-c2ba-4549-9a5b-5ac9577c4271    Service_default/    tcp        192.168.122.1:443      172.18.0.5:6443
e4b5bceb-3b51-48e9-be67-7b45fb966caf    Service_default/    tcp        192.168.122.93:5000    192.168.192.168:5000
654c5590-a2b7-4a6e-bf04-d8c1c78b0267    Service_default/    tcp        192.168.122.1:443      169.254.169.2:6443
ca23b927-4b87-4fdd-b16c-f8c3d824e6e6    Service_kube-sys    tcp        192.168.122.10:53      10.244.0.3:53,10.244.0.4:53
                                                            tcp        192.168.122.10:9153    10.244.0.3:9153,10.244.0.4:9153
699e0b39-1be8-4db7-953f-dbc836d42faf    Service_kube-sys    udp        192.168.122.10:53      10.244.0.3:53,10.244.0.4:53

[root@kubemark-workload-control-plane-lvkcv ~]# ovn-sbctl list port_binding default_test
_uuid               : 26050c0d-0e5d-4496-b0ee-0b3df1bb40c9
additional_chassis  : []
additional_encap    : []
chassis             : []
datapath            : 1ac0b646-9d4d-432e-9e59-64db6520973f
encap               : []
external_ids        : {namespace=default, pod="true"}
gateway_chassis     : []
ha_chassis_group    : []
logical_port        : default_test
mac                 : ["0a:58:0a:f4:02:03 10.244.2.3"]
mirror_rules        : []
nat_addresses       : []
options             : {iface-id-ver="b505da18-8294-41ac-a25e-ffeeb5d3b7fb", requested-chassis=kubemark-workload-md-0-m82cf}
parent_port         : []
port_security       : ["0a:58:0a:f4:02:03 10.244.2.3"]
requested_additional_chassis: []
requested_chassis   : []
tag                 : []
tunnel_key          : 2
type                : ""
up                  : false
virtual_parent      : []
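Two details in this output are worth noting: the test service’s load balancer backend (192.168.192.168:5000) is the pod IP reported by kubectl, not the 10.244.2.3 address in OVN’s port binding, and the port binding reports up: false, as you might expect for a hollow node that never actually starts the container. To pull the backends for a single VIP out of lb-list without reading the whole table, a small awk filter is enough. This is a sketch that assumes the column layout shown above (backends in the last column, VIP in the one before it, including the continuation rows of multi-VIP load balancers); lb_backends is a hypothetical helper.

```shell
#!/usr/bin/env bash
# lb_backends VIP
# Reads `ovn-nbctl lb-list` output on stdin and prints the backend list for
# every row (including continuation rows) whose VIP column equals VIP.
lb_backends() {
  awk -v vip="$1" 'NF >= 3 && $(NF - 1) == vip { print $NF }'
}
```

From the host, after leaving the container shell and with POD set as above:
$ kubectl exec $POD -n ovn-kubernetes -c nb-ovsdb --kubeconfig=kubeconfig.kubemark-workload -- ovn-nbctl lb-list | lb_backends 192.168.122.93:5000
For the cluster shown here, this should print 192.168.192.168:5000.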

Scaling the cluster

Let’s check how much the Kubemark hollow nodes consume (under 30 MiB of memory each, compared with the 650 MiB of a regular ovnkube worker):

$ kubectl top pod
NAME                           CPU(cores)   MEMORY(bytes)   
kubemark-workload-md-0-4955k   38m          28Mi            
kubemark-workload-md-0-82m9j   36m          28Mi            
kubemark-workload-md-0-m82cf   45m          29Mi            
kubemark-workload-md-0-v7592   41m          28Mi
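kubectl top depends on the Metrics API (typically provided by metrics-server) being available in the management cluster. Once you scale to dozens of hollow nodes, an average is easier to track than individual rows. The awk sketch below assumes memory is reported in Mi, as in the output above; avg_memory_mib is a hypothetical name.

```shell
#!/usr/bin/env bash
# Reads `kubectl top pod` output on stdin and prints the mean of the MEMORY
# column for rows reported in Mi, with two decimals. The header row is
# skipped naturally because "MEMORY(bytes)" does not end in "Mi".
avg_memory_mib() {
  awk '$3 ~ /Mi$/ { sub(/Mi$/, "", $3); sum += $3; n++ }
       END { if (n > 0) printf "%.2f\n", sum / n }'
}
```

For the four hollow nodes above, kubectl top pod | avg_memory_mib prints 28.25.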

In our 4 GiB VM, about 1 GiB of memory is still available:

$ free -h
               total        used        free      shared  buff/cache   available
Mem:           3.8Gi       2.5Gi       170Mi        25Mi       1.2Gi       1.0Gi

Let’s scale up to a total of 30 Kubemark hollow nodes:

$ kubectl patch --type merge MachineDeployment kubemark-workload-md-0 -p '{"spec":{"replicas":30}}'

$ kubectl get machine | grep kubemark-workload-md-0 | grep Running | wc -l
30

$ kubectl get po | grep kubemark-workload | grep Running | wc -l
30

$ free -h
               total        used        free      shared  buff/cache   available
Mem:           3.8Gi       3.2Gi       112Mi        28Mi       548Mi       347Mi
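A rough back-of-the-envelope check using the two free -h snapshots: available memory dropped from about 1.0 GiB (taken here as 1024 MiB, since free -h rounds) to 347 MiB while 26 hollow nodes were added, which is roughly 26 MiB per node and in line with kubectl top. At that rate the remaining 347 MiB would fit at most about a dozen more hollow nodes, and probably fewer, since other components on the VM (including the control plane under test) also use more memory as the cluster grows. The arithmetic as a small shell sketch (the function names are hypothetical):

```shell
#!/usr/bin/env bash
# Integer MiB arithmetic for capacity planning on the test VM.

# per_node_mib AVAIL_BEFORE AVAIL_AFTER NODES_ADDED
per_node_mib() {
  echo $(( ($1 - $2) / $3 ))
}

# headroom_nodes AVAIL_MIB PER_NODE_MIB: upper bound on extra hollow nodes
headroom_nodes() {
  echo $(( $1 / $2 ))
}

# Current "available" column of free, in MiB (7th field of the Mem: row).
available_mib() {
  free -m | awk '/^Mem:/ { print $7 }'
}

per_node_mib 1024 347 26   # prints 26 (677 / 26, rounded down)
headroom_nodes 347 28      # prints 12 (347 / 28, rounded down)
```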

Stressing the cluster

Let’s use kube-burner to stress our workload cluster. Kube-burner is a tool aimed at stressing Kubernetes clusters by creating and deleting objects declared in jobs.

Let’s install kube-burner:

$ wget https://github.com/cloud-bulldozer/kube-burner/releases/download/v1.2/kube-burner-1.2-Linux-x86_64.tar.gz

$ tar -xzf kube-burner-1.2-Linux-x86_64.tar.gz

$ sudo install -o root -g root -m 0755 kube-burner /usr/local/bin/kube-burner

$ kube-burner version
Version: 1.2
Git Commit: 563bc92b9262582391e5dffb8941b914ca19d2d3
Build Date: 2023-01-13T10:18:17Z
Go Version: go1.19.4
OS/Arch: linux amd64

Let’s take a look at the configuration file kubeburner/cfg.yaml:

---
global:
  writeToFile: false
  indexerConfig:
    enabled: false

jobs:
  - name: kubelet-density
    preLoadImages: false
    jobIterations: 100
    qps: 20
    burst: 20
    namespacedIterations: false
    namespace: kubelet-density
    waitWhenFinished: true
    podWait: false
    objects:
      - objectTemplate: pod.yaml
        replicas: 1
        inputVars:
          containerImage: gcr.io/google_containers/pause-amd64:3.0
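The job points at an object template, pod.yaml, which is not shown here. As a hypothetical example of what such a template could contain: kube-burner renders object templates as Go templates, exposing built-in variables such as .JobName, .Iteration and .Replica alongside the job’s inputVars (here .containerImage). Treat the variable names as assumptions and check them against the kube-burner documentation for your version. The snippet only writes the file if it does not already exist, so it will not overwrite the one in your checkout.

```shell
#!/usr/bin/env bash
# Writes a hypothetical kube-burner object template next to cfg.yaml,
# unless one already exists. The quoted 'EOF' disables shell expansion
# inside the heredoc, so the template is written verbatim.
mkdir -p kubeburner
if [ ! -e kubeburner/pod.yaml ]; then
  cat > kubeburner/pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: {{.JobName}}-{{.Iteration}}-{{.Replica}}
  labels:
    app: {{.JobName}}
spec:
  containers:
    - name: pause
      image: {{.containerImage}}
EOF
fi
```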

Let’s run the job, which creates 100 pods on the workload cluster:

$ KUBECONFIG=kubeconfig.kubemark-workload kube-burner init -c kubeburner/cfg.yaml
INFO[2023-01-17 15:21:25] 🔥 Starting kube-burner (1.2@563bc92b9262582391e5dffb8941b914ca19d2d3) with UUID def1da7b-a5db-4c05-bb17-167d889ef33b 
INFO[2023-01-17 15:21:25] 📈 Creating measurement factory               
INFO[2023-01-17 15:21:25] Job kubelet-density: 100 iterations with 1 Pod replicas 
INFO[2023-01-17 15:21:25] QPS: 20                                      
INFO[2023-01-17 15:21:25] Burst: 20                                    
INFO[2023-01-17 15:21:25] Triggering job: kubelet-density              
INFO[2023-01-17 15:21:26] Running job kubelet-density                  
INFO[2023-01-17 15:21:32] Waiting up to 3h0m0s for actions to be completed 
INFO[2023-01-17 15:21:51] Actions in namespace kubelet-density completed 
INFO[2023-01-17 15:21:51] Finished the create job in 23s               
INFO[2023-01-17 15:21:51] Verifying created objects                    
INFO[2023-01-17 15:21:52] pods found: 100 Expected: 100                  
INFO[2023-01-17 15:21:52] Job kubelet-density took 26.88 seconds       
INFO[2023-01-17 15:21:52] Finished execution with UUID: def1da7b-a5db-4c05-bb17-167d889ef33b 
INFO[2023-01-17 15:21:52] 👋 Exiting kube-burner def1da7b-a5db-4c05-bb17-167d889ef33b

$ kubectl get po -n kubelet-density | grep Running | wc -l
100
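Grepping the table works, but a field selector lets the API server do the filtering, and --no-headers keeps the line count exact. Below is a small hypothetical wrapper (running_pods is not part of kube-burner or capi-hacks), followed by cleanup: deleting the kubelet-density namespace removes the pods the job created.

```shell
#!/usr/bin/env bash
# running_pods NAMESPACE
# Counts pods in NAMESPACE whose status.phase is Running, filtering
# server-side. Uses whatever KUBECONFIG is set in the environment.
running_pods() {
  kubectl get pods -n "$1" --field-selector=status.phase=Running --no-headers \
    | wc -l | tr -d ' '
}
```

Usage:
$ KUBECONFIG=kubeconfig.kubemark-workload running_pods kubelet-density
$ kubectl delete namespace kubelet-density --kubeconfig=kubeconfig.kubemark-workload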
