This repository implements an approach to evaluating projects that integrate Kubernetes and Slurm. It is part of my master’s thesis at the Georg August University of Göttingen, which investigates approaches to running Kubernetes workloads in a Slurm cluster. The following projects are the subject of our evaluation:
- IBM/Bridge-Operator
- CARV-ICS-FORTH/HPK
- soerenmetje/KSI
- Slurm (without any Kubernetes integration; serves as a reference point / baseline)

We are aware of further projects such as Sylabs/WLM-Operator, SchedMD/slurm-k8s-bridge, and kalenpeterson/kube-slurm. However, these projects are either deprecated and did not pass our minimal functional test, or they aim at a different goal. Therefore, they are not included.
In our evaluation, we used the following benchmark tools to measure each metric:
Metric | Benchmark | Version |
---|---|---|
CPU performance | Sysbench CPU | 1.0.20 |
Memory throughput | Stream | 5.10 |
Storage throughput | fio (rnd / seq) | 3.35 |
Network throughput | Iperf3 | 3.9 |
Network latency | Netperf | 2.7.1 |
Workload startup time | Our own approach | not versioned |
Unfortunately, the tool Nuttcp seems to have no package for CentOS Stream 9. Therefore, instead of compiling Nuttcp ourselves, we simply use Iperf3, which provides the same functionality.

This repository contains:
- Shell scripts in `src/benchmark/` to:
  - perform benchmarks on each project
  - write benchmark results into `.csv` files
- Python scripts in `src/plot/` to:
  - read the result files
  - create plots
- Jupyter notebook analysis.ipynb to:
  - read the result files
  - print details such as mean, std, and difference to Slurm
- CSV result files in `data/`
- Log files in `logs/`
- Plot images in `plots/`
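The statistics mentioned for the notebook (mean, std, and difference to Slurm) can be sketched as follows. This is only an illustration and not the notebook's actual code: the function `summarize`, the input layout (one list of measurements per project), and the sample values are assumptions made up for this example.

```python
from statistics import mean, stdev


def summarize(results, baseline="slurm"):
    """Return mean, sample std, and difference to the baseline mean in percent.

    `results` maps a project name to a list of measurements (hypothetical layout).
    """
    base_mean = mean(results[baseline])
    summary = {}
    for project, values in results.items():
        project_mean = mean(values)
        summary[project] = {
            "mean": project_mean,
            # The sample standard deviation needs at least two values.
            "std": stdev(values) if len(values) > 1 else 0.0,
            "diff_to_baseline_pct": (project_mean - base_mean) / base_mean * 100.0,
        }
    return summary


# Placeholder numbers for illustration only, not measured results.
example = {"slurm": [100.0, 102.0, 98.0], "ksi": [95.0, 97.0, 93.0]}
for name, stats in summarize(example).items():
    print(name, stats)
```

A relative difference is used here so that metrics with different units can be compared against the Slurm baseline in the same way.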
To perform the evaluation, the following prerequisites must be met:
- A Slurm cluster is up and running.
- The local machine (e.g. a laptop) runs a Linux distribution. We tested this setup using Ubuntu 22.04.
- The local machine can log in to the Slurm master node using SSH and the SSH key `.ssh/id_rsa`.
- The local machine has `bash`, `ssh`, and `python3` installed, as well as the Python packages defined in requirements.txt.
- All prerequisites of all projects (KSI, HPK, Bridge-Operator) are met.
- For benchmarks on Slurm and Bridge-Operator, the benchmark tools have to be installed on the cluster nodes. KSI and HPK use container images and therefore do not rely on installed software.
- For benchmarks on Bridge-Operator, an additional machine that runs a Kubernetes cluster is required. The Bridge-Operator has to be up and running in that cluster. We describe the setup details in Setup-Bridge-Operator.md.
- For benchmarks on HPK, the HPK components have to be started and configured manually as described in Setup-HPK.md.
- Running the fio, iPerf3, and Netperf benchmarks also requires manual steps, as described below.
The fio disk benchmarks depend heavily on the available RAM. If more RAM is available than the file size used in the benchmark, Linux usually caches these files. As a result, the benchmark measures higher throughput than the storage device can actually deliver. For reference: typical SATA 3 SSDs supply 480 MB/s sequential read throughput.

One solution is to use direct I/O by adding the parameter `--direct=1` to fio.

Another solution is to limit the available RAM during the benchmark with the tool mem-eater. Essentially, this tool allocates RAM until only a desired amount of RAM is left, which limits Linux's ability to cache the files during the benchmark. We provide the source code for mem-eater in src/benchmark/common/mem-eater.c. Start mem-eater manually before running the fio disk benchmarks. As a rule of thumb, choose a total benchmark file size that is at least 2 times the RAM left available, e.g. 8 GiB files for 4 GiB RAM.

    # Compile
    gcc -o mem-eater mem-eater.c
    # Run: ./mem-eater <desiredRamInMiB>
    ./mem-eater 4096
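The rule of thumb for the file size can be written down as a small helper. This is a minimal sketch under the assumption that the value passed to mem-eater is the RAM left available in MiB. The function names are hypothetical and not part of this repository.

```python
def min_fio_total_size_mib(ram_left_mib, factor=2):
    """Smallest total fio file size (MiB) that satisfies the rule of thumb."""
    if ram_left_mib <= 0 or factor <= 0:
        raise ValueError("ram_left_mib and factor must be positive")
    return factor * ram_left_mib


def page_cache_may_distort(total_size_mib, ram_left_mib, factor=2):
    """True if the chosen file size is too small relative to the remaining RAM."""
    return total_size_mib < min_fio_total_size_mib(ram_left_mib, factor)


# With 4096 MiB left by mem-eater, at least 8192 MiB of files are needed.
print(min_fio_total_size_mib(4096))
print(page_cache_may_distort(total_size_mib=4096, ram_left_mib=4096))
```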
The tools iPerf3 and Netperf operate in a client-server model. Therefore, this setup requires that the server component is started manually on a second node in the Slurm cluster.

For iPerf3, the server can be started with the following command:

    iperf3 -s -p 5003

For the Netperf server, you can run:

    netserver -D -p 16604

`-D` prevents netserver from daemonizing and `-p` sets the port.
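Before starting a network benchmark, it can help to check that the manually started servers accept TCP connections. The following is a minimal sketch using only the Python standard library. The helper `server_reachable` and the host name `node2` are assumptions for illustration, while the ports are the ones used in the commands above.

```python
import socket


def server_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


# Example usage (hypothetical host name of the second Slurm node):
#   server_reachable("node2", 5003)    # iPerf3 server
#   server_reachable("node2", 16604)   # Netperf server (netserver)
```

This only confirms that something is listening on the port. It does not verify that the listener is actually iPerf3 or netserver.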
This repository contains the script main.sh. It is designed to be executed locally, e.g. on a laptop. It
- connects to the Slurm cluster (and the Kubernetes cluster if needed)
- runs benchmarks
- copies the result file (`.csv`) as well as the log files from the cluster back to the local machine

The following command is an example of evaluating the `ksi` project using the `stream-memory` benchmark:

    # /bin/bash src/benchmark/main.sh <project> <benchmark>
    /bin/bash src/benchmark/main.sh ksi stream-memory

After execution, the result file can be found in `data/` and the log files in `logs/`.
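To run several benchmarks for one project in a row, the call to main.sh can be wrapped in a small script. The sketch below is not part of this repository, and the helper names are hypothetical. It assumes that it is started from the repository root and, by default, only prints the commands instead of executing them.

```python
import subprocess

MAIN_SH = "src/benchmark/main.sh"  # path as used in the example command


def main_sh_command(project, benchmark):
    """Build the argument list for one main.sh invocation."""
    return ["/bin/bash", MAIN_SH, project, benchmark]


def run_benchmarks(project, benchmarks, dry_run=True):
    """Run (or, with dry_run=True, only print) main.sh for each benchmark."""
    commands = [main_sh_command(project, b) for b in benchmarks]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the loop at the first failing benchmark.
            subprocess.run(cmd, check=True)
    return commands


run_benchmarks("ksi", ["sysbench-cpu", "stream-memory"])
```

Passing `dry_run=False` executes the commands one after another. Benchmarks that need manual preparation on the cluster still require those steps beforehand.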
The available parameters, i.e. the project and the benchmark, can be determined from the directory and file names.

The directory names in `src/benchmark/` (apart from the shared `common` directory) are the available projects:
- `ksi`
- `hpk`
- `bridge-operator`
- `slurm`

The available benchmarks can be determined from the file names `workload-*.sh` inside the project directories:
- `sysbench-cpu`
- `stream-memory`
- `fio-diskrnd`
- `fio-diskseq`
- `netperf-latency-tcp`
- `iperf3-bandwidth`
- `startup-time`

The benchmarks `fio-diskrnd`, `fio-diskseq`, `netperf-latency-tcp`, and `iperf3-bandwidth` require manual actions on the Slurm cluster before they are executed. This is covered in the prerequisites sections.
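The naming convention described above can also be evaluated programmatically. The following sketch lists projects and benchmarks from the directory layout. The function names are hypothetical, and skipping the `common` directory is an assumption based on the helper files that live there.

```python
from pathlib import Path


def list_projects(benchmark_root, skip=("common",)):
    """Project names are the directory names below src/benchmark/."""
    root = Path(benchmark_root)
    return sorted(p.name for p in root.iterdir() if p.is_dir() and p.name not in skip)


def list_benchmarks(project_dir):
    """Benchmark names are derived from the workload-<benchmark>.sh file names."""
    prefix, suffix = "workload-", ".sh"
    return sorted(
        f.name[len(prefix):-len(suffix)]
        for f in Path(project_dir).glob("workload-*.sh")
    )


# Example usage from the repository root:
#   list_projects("src/benchmark")
#   list_benchmarks("src/benchmark/ksi")
```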
- For testing, we disabled write caching as described here: https://stackoverflow.com/questions/20215516/disabling-disk-cache-in-linux/20215603#20215603
- Nevertheless, Linux seems to heavily utilize file caching on read operations. To the best of our knowledge, this cannot be disabled. A workaround is to read or write more data than there is memory available.
- To benchmark the project bridge-operator, a Kubernetes cluster is needed. Theoretically, a Kind cluster is sufficient. We used a single-node Kubernetes cluster deployed in a cloud VM. To obtain accurate results in the startup-time benchmark, the time on the Slurm node and on the VM has to be correct.

To add a new benchmark, perform the following actions. Replace `<benchmark-name>` with the actual name.
- Add a Bash script file to each project directory in `src/benchmark`. These files run the benchmark. Use the file name `workload-<benchmark-name>.sh`.
- Extend the functions `initResultFile` and `parseLogFile` in the Bash script src/benchmark/common/parse.sh to add parsing functionality.
- Add a Python script file named `plot-<benchmark-name>.py` to the directory `src/plot`.
- Test the process: `/bin/bash src/benchmark/main.sh slurm <benchmark-name>`.
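The file-related steps for a new benchmark can be checked with a short script before running the test command. This is a sketch and not part of this repository: the function `missing_benchmark_files` is hypothetical, and it only checks that the expected files exist, not that parse.sh was extended.

```python
from pathlib import Path


def missing_benchmark_files(repo_root, benchmark, projects):
    """Return the expected files for a new benchmark that do not exist yet."""
    root = Path(repo_root)
    expected = [
        root / "src" / "benchmark" / project / f"workload-{benchmark}.sh"
        for project in projects
    ]
    expected.append(root / "src" / "plot" / f"plot-{benchmark}.py")
    return [path for path in expected if not path.is_file()]


# Example usage from the repository root (the benchmark name is a placeholder):
#   missing_benchmark_files(".", "my-benchmark", ["ksi", "hpk", "bridge-operator", "slurm"])
```

An empty list means that every project has a workload script and that a plot script exists for the benchmark.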
To add a new project to the evaluation, perform the following actions. Replace `<project-name>` with the actual project name.
- Add a new directory named `<project-name>` to the directory `src/benchmark`.
- Add Bash script files for all benchmarks to this directory. Use the file names `workload-<benchmark-name>.sh`. For parsing, the benchmark result is expected to be printed to stdout, as done in the existing workload Bash script files.
- Extend the Bash script src/benchmark/main.sh by adding a new case for the project in the if-elif-else construct marked by `# Start benchmarking`.
- Extend all Python script files in the directory `src/plot` to add `<project-name>` to the list of `project_dirs`.
- Extend the Python script src/plot/common.py by adding a human-readable project name to the dict `_mapNames`.
- Test the process: `/bin/bash src/benchmark/main.sh <project-name> stream-memory`.
In the current state, we completed the following benchmarks on each project:

| Benchmark | KSI | HPK | Bridge-Operator | Slurm |
|---|---|---|---|---|
| Sysbench CPU | ✅ | ✅ | ✅ | ✅ |
| Stream Memory | ✅ | ✅ | ✅ | ✅ |
| Fio Disk seq | ✅ | ✅ | ✅ | ✅ |
| Fio Disk rnd | ✅ | ✅ | ✅ | ✅ |
| | 💀 time-based => cannot read / write desired file size | 💀 time-based => cannot read / write desired file size | | |
| | 💀 time-based => cannot read / write desired file size | 💀 time-based => cannot read / write desired file size | | |
| | 💀 bug: no seq read available | | | |
| iPerf3 Network Throughput | ✅ | ✅ | ✅ | ✅ |
| Netperf Network Latency (TCP) | ✅ | ✅ | ✅ | ✅ |
| Workload start up time | ✅ | ✅ | ✅ | ✅ |

✅ = successfully completed, 💀 = error occurred / completion not possible