JawandS / Task-Time-Research

Analyzing CPU consumption by using context switch traces with eBPF/bpftrace

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Analyzing CPU Consumption

TensorFlow Analysis using bpftrace

Execution

Execute run.sh for data collection on mdoel.py:
bpftrace -e 'tracepoint:sched:sched_switch { printf("%s %lu %d %lu\n", comm, pid, cpu, nsecs); }'

  • Using bpftrace to trace context switches
  • model.py is a TensorFlow deep learning job that automatically kills tracing
  • The timeline is analyzed with processing.py
  • fib.py is a fibonacci job (does not kill tracing)
  • Pass run number to run.sh for

Index of Logs

Logs

  • 1: ~4 second run with ML
  • 2: 1 second run
  • 3: 2 second run
  • 4: 1 second run after restart
  • 5: 1 second run
  • 6: ~4.5 second run with ML
  • 7: ~6 second with ML after restart (didn’t kill tracing for ~1.5 seconds)
  • 8: ~4.5 second run with ML
  • 9: ~6.5 second run with multiple ML
  • 10: ~6.5 second run with multiple ML
  • 11: ~2 second run with fib.py
  • 12: ~1.5 second run with fib.py
  • 13: Running 7 fib jobs with manual kill
  • 14: Running 8 fib jobs with manual kill

Automatically killing the tracing at the end of the ML job

  • 15-17: standard ML runs (30 epochs)
  • 18: 20 CPU server, standard ML run
  • 19: 20 CPU server, 21 fib jobs - interesting results

Notes for project version in Archive

General notes:

  • The first 2k elements of the raw timeline are saved
  • Processes like kworker/# are combined to kworker
  • CPU usage doesn't include

Visuals/Data Generated:

  • Breakdown (chart of the time spent during the job)
  • CPU_usage (time spent on each CPU)
  • Task_counts (unique PID per task)
  • Task_times (time spent on each task, combines all PID)
  • TT_no_idle (time spent on tasks, excluding idle
  • Text file (name is the average lifespan for python processes)

Files Descriptions:

  • model.py (sequential neural network with timestamps)
  • context_switch_timeline.py (tracer that generates a timeline of context switches)
  • data_process.py (processes the timeline from the tracer to get information)
  • visualizer.py (generates graphs and copies data to Run_Data)
  • Run_Data/analyzer.py (performs further analysis on collected data)
  • Archives (contains old files and data)
  • Data (contains data necessary during the run)
  • Run_Data (stores visuals and data)

Server:

Setup notes:

  • Package names vary based on Ubuntu version
  • BCC or eBPF may require kernel flags to be changed
  • The version for linux-headers can be found with (uname -r)
  • Need your username and access token to clone the repo

References:

About

Analyzing CPU consumption by using context switch traces with eBPF/bpftrace

License:MIT License


Languages

Language:Python 92.4%Language:C 2.3%Language:Go 1.5%Language:Shell 1.5%Language:Dockerfile 0.9%Language:Makefile 0.7%Language:Java 0.7%