Push publicly testing and benchmarking methodology for acceleration kernels
vmayoral opened this issue · comments
Task list:
- Describe the benchmarking architecture in a formal manner https://github.com/ros-infrastructure/rep/pull/324/files#diff-f230b6aa06d86bf594d8e431300e453ad7343e8f4b1932252b6d36c62a8b5e0aR203-R267
- Create tracepoint examples for hardware acceleration as a fork of tracetools (fork might get integrated into tracetools in the future if appropriate) https://github.com/ros-acceleration/tracetools_acceleration
- Facilitate testing environments that allow benchmarking accelerators, with special focus on power consumption and time.
- Instrument some of the examples at acceleration_examples
- Contributed and disclosed adaptive_component
- Document it step-by-step in a comprehensive manner (pushed to future documentation tickets)
- Provide guidelines and recommendations to further instrument computations engaging with FPGAs and GPUs (stretch goal, might be pushed to future tickets)
Results from instrumenting acceleration_examples for benchmarking purposes:
Some results while analyzing traces of acceleration_examples:
ros2 launch doublevadd_publisher vadd_analyse.launch.py
[INFO] [launch]: All log files can be found below /home/xilinx/.ros/log/2021-10-25-16-41-58-551903-xilinx-276809
[INFO] [launch]: Default logging verbosity is set to INFO
vadd iteration: 0 → 451.126394 ms
vadd iteration: 1 → 451.058473 ms
vadd iteration: 2 → 451.093805 ms
vadd iteration: 3 → 450.908674 ms
vadd iteration: 4 → 451.054086 ms
vadd iteration: 5 → 451.061446 ms
vadd iteration: 6 → 451.002277 ms
vadd iteration: 7 → 450.974767 ms
ros2 launch faster_doublevadd_publisher vadd_analyse.launch.py
[INFO] [launch]: All log files can be found below /home/xilinx/.ros/log/2021-10-25-16-42-18-406076-xilinx-276811
[INFO] [launch]: Default logging verbosity is set to INFO
vadd iteration: 0 → 88.182015 ms
vadd iteration: 1 → 88.063425 ms
vadd iteration: 2 → 88.067774 ms
vadd iteration: 3 → 88.091864 ms
vadd iteration: 4 → 88.052144 ms
vadd iteration: 5 → 88.015094 ms
vadd iteration: 6 → 88.073275 ms
vadd iteration: 7 → 88.081755 ms
...
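The two runs above can be compared directly by averaging the per-iteration times. The following is a minimal sketch, not part of acceleration_examples: the helper names (`iteration_times_ms`, `speedup`) are hypothetical, and it assumes the launch output uses exactly the `vadd iteration: N → X ms` line format shown above.

```python
import re
from statistics import mean

# Matches lines such as "vadd iteration: 3 → 450.908674 ms"
# (format assumed from the launch output quoted in this issue).
ITERATION_RE = re.compile(r"vadd iteration:\s*(\d+)\s*→\s*([0-9.]+)\s*ms")


def iteration_times_ms(log_text):
    """Return the per-iteration times (in ms) found in the log text."""
    return [float(m.group(2)) for m in ITERATION_RE.finditer(log_text)]


def speedup(baseline_log, accelerated_log):
    """Ratio of mean baseline time to mean accelerated time."""
    baseline = iteration_times_ms(baseline_log)
    accelerated = iteration_times_ms(accelerated_log)
    if not baseline or not accelerated:
        raise ValueError("no 'vadd iteration' lines found in one of the logs")
    return mean(baseline) / mean(accelerated)


# Output of doublevadd_publisher, copied from the run above.
BASELINE_LOG = """
vadd iteration: 0 → 451.126394 ms
vadd iteration: 1 → 451.058473 ms
vadd iteration: 2 → 451.093805 ms
vadd iteration: 3 → 450.908674 ms
vadd iteration: 4 → 451.054086 ms
vadd iteration: 5 → 451.061446 ms
vadd iteration: 6 → 451.002277 ms
vadd iteration: 7 → 450.974767 ms
"""

# Output of faster_doublevadd_publisher, copied from the run above.
ACCELERATED_LOG = """
vadd iteration: 0 → 88.182015 ms
vadd iteration: 1 → 88.063425 ms
vadd iteration: 2 → 88.067774 ms
vadd iteration: 3 → 88.091864 ms
vadd iteration: 4 → 88.052144 ms
vadd iteration: 5 → 88.015094 ms
vadd iteration: 6 → 88.073275 ms
vadd iteration: 7 → 88.081755 ms
"""

if __name__ == "__main__":
    print(f"baseline mean:    {mean(iteration_times_ms(BASELINE_LOG)):.3f} ms")
    print(f"accelerated mean: {mean(iteration_times_ms(ACCELERATED_LOG)):.3f} ms")
    print(f"speedup:          {speedup(BASELINE_LOG, ACCELERATED_LOG):.2f}x")
```

On the eight iterations quoted here, the means are roughly 451.03 ms and 88.08 ms, a speedup of about 5.1x. This covers only the time dimension; power consumption would need its own measurements.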
Disclosed and contributed adaptive_component, a simple composable container for Adaptive ROS 2 Node computations. Select between FPGA, CPU or GPU at run-time.
Examples of using adaptive_component:
Pushed publicly ros2_kria, a collection of ROS 2 packages that facilitate the integration of hardware acceleration capabilities for Xilinx's Kria SOM portfolio, aligned with the HAWG benchmarking approach.
Benchmarking methodology further discussed at #20 (comment)