JawandS / Task-Time-Research

Analyzing CPU consumption by using context switch traces with eBPF/bpftrace

ebpf tracing bpftrace machine-learning

Analyzing CPU Consumption

TensorFlow Analysis using bpftrace

Execution

Execute run.sh for data collection on model.py:
bpftrace -e 'tracepoint:sched:sched_switch { printf("%s %lu %d %lu\n", comm, pid, cpu, nsecs); }'

Using bpftrace to trace context switches
model.py is a TensorFlow deep learning job that automatically kills tracing
The timeline is analyzed with processing.py
fib.py is a fibonacci job (does not kill tracing)
Pass run number to run.sh for
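The bpftrace command above prints one line per context switch in the order comm, pid, cpu, nsecs. A minimal sketch of reading such a line back into a record is shown below. The names SwitchEvent and parse_line are hypothetical and are not taken from processing.py. Splitting from the right is a choice made here so that a command name containing spaces stays intact.

```python
from typing import NamedTuple, Optional


class SwitchEvent(NamedTuple):
    """One line of tracer output (hypothetical record type for illustration)."""
    comm: str
    pid: int
    cpu: int
    nsecs: int


def parse_line(line: str) -> Optional[SwitchEvent]:
    """Parse a 'comm pid cpu nsecs' line; return None if the line does not match."""
    # Split from the right: the last three fields are numeric, and whatever
    # remains on the left is the command name, which may contain spaces.
    parts = line.rstrip("\n").rsplit(" ", 3)
    if len(parts) != 4:
        return None
    comm, pid, cpu, nsecs = parts
    try:
        return SwitchEvent(comm, int(pid), int(cpu), int(nsecs))
    except ValueError:
        # Non-numeric fields, e.g. a banner line printed by the tracer.
        return None
```

Lines that are not events, such as the banner bpftrace prints when it attaches, come back as None and can be filtered out.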

Index of Logs

Logs

1: ~4 second run with ML
2: 1 second run
3: 2 second run
4: 1 second run after restart
5: 1 second run
6: ~4.5 second run with ML
7: ~6 second run with ML after restart (didn’t kill tracing for ~1.5 seconds)
8: ~4.5 second run with ML
9: ~6.5 second run with multiple ML
10: ~6.5 second run with multiple ML
11: ~2 second run with fib.py
12: ~1.5 second run with fib.py
13: Running 7 fib jobs with manual kill
14: Running 8 fib jobs with manual kill

Automatically killing the tracing at the end of the ML job

15-17: standard ML runs (30 epochs)
18: 20 CPU server, standard ML run
19: 20 CPU server, 21 fib jobs - interesting results
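One way to stop tracing as soon as a job ends is a small wrapper that starts the tracer, runs the job, and then interrupts the tracer. The sketch below is an assumption about how this could be done, not the mechanism used by model.py or run.sh. The function name trace_while_running and its parameters are hypothetical, and it assumes a Linux host where SIGINT stops the tracer.

```python
import signal
import subprocess


def trace_while_running(tracer_cmd, job_cmd, out_path, grace_s=5.0):
    """Run job_cmd while tracer_cmd writes to out_path, then stop the tracer.

    Returns (job return code, tracer return code).
    """
    with open(out_path, "w") as out:
        tracer = subprocess.Popen(tracer_cmd, stdout=out)
        try:
            job_rc = subprocess.run(job_cmd).returncode
        finally:
            # SIGINT is the same signal Ctrl-C sends to a foreground tracer.
            tracer.send_signal(signal.SIGINT)
            try:
                tracer.wait(timeout=grace_s)
            except subprocess.TimeoutExpired:
                # Fall back to a hard kill if the tracer does not exit in time.
                tracer.kill()
                tracer.wait()
    return job_rc, tracer.returncode
```

With this shape, tracer_cmd would be the bpftrace invocation as a list of arguments and job_cmd the Python job to measure.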

Notes for project version in Archive

General notes:

The first 2k elements of the raw timeline are saved
Processes like kworker/# are combined to kworker
CPU usage doesn't include
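The first two notes above can be sketched as follows. The helper names are hypothetical, "2k" is assumed to mean 2000 entries, and only the kworker prefix is merged by default because that is the only example the notes give.

```python
from itertools import islice

RAW_TIMELINE_LIMIT = 2000  # assumption: "2k" means 2000 entries


def normalize_comm(comm, prefixes=("kworker",)):
    """Collapse names such as 'kworker/3:1H' to 'kworker'; leave others unchanged."""
    base, sep, _ = comm.partition("/")
    if sep and base in prefixes:
        return base
    return comm


def head_of_timeline(events, limit=RAW_TIMELINE_LIMIT):
    """Keep only the first `limit` events of the raw timeline."""
    return list(islice(events, limit))
```

Passing a longer prefixes tuple would merge other per-CPU kernel threads in the same way.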

Visuals/Data Generated:

Breakdown (chart of the time spent during the job)
CPU_usage (time spent on each CPU)
Task_counts (unique PID per task)
Task_times (time spent on each task, combines all PID)
TT_no_idle (time spent on tasks, excluding idle)
Text file (name is the average lifespan for python processes)
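A rough sketch of how Task_times, Task_counts, and TT_no_idle could be derived from a context switch timeline is given below. It is not the logic in data_process.py. It rests on two assumptions: that each event names the task being switched out, so the time since the previous switch on the same CPU belongs to that task, and that idle time is the task with PID 0. The function name summarize is hypothetical.

```python
from collections import defaultdict


def summarize(events):
    """Aggregate (comm, pid, cpu, nsecs) tuples that are sorted by nsecs."""
    last_switch = {}                # cpu -> timestamp of the previous switch there
    task_times = defaultdict(int)   # comm -> nanoseconds, all PIDs combined
    task_pids = defaultdict(set)    # comm -> set of PIDs seen under that name
    idle_names = set()              # comms observed with PID 0 (assumed idle)

    for comm, pid, cpu, nsecs in events:
        task_pids[comm].add(pid)
        if pid == 0:
            idle_names.add(comm)
        if cpu in last_switch:
            # Assumption: the task in this event ran since the last switch on this CPU.
            task_times[comm] += nsecs - last_switch[cpu]
        last_switch[cpu] = nsecs

    return {
        "task_times": dict(task_times),
        "task_counts": {c: len(p) for c, p in task_pids.items()},
        "tt_no_idle": {c: t for c, t in task_times.items() if c not in idle_names},
    }
```

The first event seen on each CPU contributes no time because there is no earlier switch to measure from.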

File Descriptions:

model.py (sequential neural network with timestamps)
context_switch_timeline.py (tracer that generates a timeline of context switches)
data_process.py (processes the timeline from the tracer to get information)
visualizer.py (generates graphs and copies data to Run_Data)
Run_Data/analyzer.py (performs further analysis on collected data)
Archives (contains old files and data)
Data (contains data necessary during the run)
Run_Data (stores visuals and data)

Server:

Ubuntu 20.04 on https://www.cloudlab.us/

Setup notes:

Package names vary based on Ubuntu version
BCC or eBPF may require kernel flags to be changed
The linux-headers version can be found with uname -r
Need your username and access token to clone the repo
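As a small illustration of the linux-headers note, the running kernel release can also be read from Python and turned into the usual Ubuntu package name. The helper name headers_package is hypothetical, and the linux-headers-<release> naming is assumed to apply to the Ubuntu version in use.

```python
import platform


def headers_package(release=None):
    """Return the linux-headers package name for a kernel release string."""
    # platform.release() reports the same string as `uname -r` on Linux.
    release = release or platform.release()
    return f"linux-headers-{release}"
```

Calling headers_package() with no argument uses the kernel of the machine it runs on.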

References:


MIT License
