Push publicly testing and benchmarking methodology for acceleration kernels



vmayoral opened this issue 3 years ago

Víctor Mayoral Vilches commented 3 years ago

Task list:

Describe the benchmarking architecture in a formal manner https://github.com/ros-infrastructure/rep/pull/324/files#diff-f230b6aa06d86bf594d8e431300e453ad7343e8f4b1932252b6d36c62a8b5e0aR203-R267
Create tracepoint examples for hardware acceleration as a fork of tracetools (the fork might be integrated into tracetools in the future, if appropriate) https://github.com/ros-acceleration/tracetools_acceleration
Facilitate testing environments that allow benchmarking accelerators, with a special focus on power consumption and time.
Instrument some of the examples in acceleration_examples:
- doublevadd_publisher
- faster_doublevadd_publisher
Contributed and disclosed adaptive_component
Document it step-by-step in a comprehensive manner (pushed to future documentation tickets)
Provide guidelines and recommendations to further instrument computations involving FPGAs and GPUs (stretch goal, might be pushed to future tickets)
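The timing side of this methodology essentially pairs a "kernel started" trace event with the matching "kernel finished" event and takes the difference; the power side combines such a latency with an average power reading. The snippet below is a minimal sketch of that idea, assuming a single thread of execution and nanosecond timestamps. The event names `vadd_start`/`vadd_end`, the `TraceEvent` type and both helper functions are hypothetical and are not the API of tracetools or tracetools_acceleration; the average power value would have to come from an external measurement.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable, List, Optional

# Hypothetical tracepoint names; the events actually emitted by
# tracetools_acceleration may be named differently.
START_EVENT = "vadd_start"
END_EVENT = "vadd_end"


@dataclass
class TraceEvent:
    name: str
    timestamp_ns: int  # monotonic clock timestamp, in nanoseconds


def kernel_latencies_ms(events: Iterable[TraceEvent]) -> List[float]:
    """Pair each START_EVENT with the next END_EVENT and return latencies in ms.

    Assumes a single thread of execution, i.e. kernel invocations never overlap.
    """
    latencies: List[float] = []
    pending_start: Optional[int] = None
    for event in sorted(events, key=lambda e: e.timestamp_ns):
        if event.name == START_EVENT:
            pending_start = event.timestamp_ns
        elif event.name == END_EVENT and pending_start is not None:
            latencies.append((event.timestamp_ns - pending_start) / 1e6)
            pending_start = None
    return latencies


def energy_per_run_mj(avg_power_w: float, latency_ms: float) -> float:
    """Energy of one run: watts multiplied by milliseconds gives millijoules."""
    return avg_power_w * latency_ms


if __name__ == "__main__":
    # Synthetic events for illustration only, not measured data.
    events = [
        TraceEvent(START_EVENT, 0),
        TraceEvent(END_EVENT, 2_000_000),
        TraceEvent(START_EVENT, 5_000_000),
        TraceEvent(END_EVENT, 8_000_000),
    ]
    latencies = kernel_latencies_ms(events)
    print(latencies, mean(latencies))
```

Sorting by timestamp first keeps the pairing correct even if events arrive out of order from different trace buffers; overlapping invocations (e.g. multiple threads) would need an extra key such as a thread or request ID.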

Víctor Mayoral Vilches · Answer 1 · Wed Oct 27 2021 23:46:35 GMT+0800 (China Standard Time)

Results from instrumenting acceleration_examples for benchmarking purposes. Some results from analyzing the traces of acceleration_examples:

ros2 launch doublevadd_publisher vadd_analyse.launch.py
[INFO] [launch]: All log files can be found below /home/xilinx/.ros/log/2021-10-25-16-41-58-551903-xilinx-276809
[INFO] [launch]: Default logging verbosity is set to INFO
vadd iteration: 0 → 451.126394 ms
vadd iteration: 1 → 451.058473 ms
vadd iteration: 2 → 451.093805 ms
vadd iteration: 3 → 450.908674 ms
vadd iteration: 4 → 451.054086 ms
vadd iteration: 5 → 451.061446 ms
vadd iteration: 6 → 451.002277 ms
vadd iteration: 7 → 450.974767 ms

ros2 launch faster_doublevadd_publisher vadd_analyse.launch.py
[INFO] [launch]: All log files can be found below /home/xilinx/.ros/log/2021-10-25-16-42-18-406076-xilinx-276811
[INFO] [launch]: Default logging verbosity is set to INFO
vadd iteration: 0 → 88.182015 ms
vadd iteration: 1 → 88.063425 ms
vadd iteration: 2 → 88.067774 ms
vadd iteration: 3 → 88.091864 ms
vadd iteration: 4 → 88.052144 ms
vadd iteration: 5 → 88.015094 ms
vadd iteration: 6 → 88.073275 ms
vadd iteration: 7 → 88.081755 ms
...
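To turn the per-iteration numbers above into a single comparison, one can parse the `vadd iteration: N → X ms` lines and compare mean latencies. The script below is a small sketch that does this for the eight iterations printed for each package; the regular expression, the helper names and the choice of mean plus sample standard deviation are assumptions of this example, not part of the acceleration_examples tooling.

```python
import re
from statistics import mean, stdev
from typing import Dict, List

# Matches lines such as "vadd iteration: 3 → 450.908674 ms".
ITERATION_RE = re.compile(r"vadd iteration:\s*(\d+)\s*→\s*([0-9.]+)\s*ms")

# Iteration lines copied from the two launch outputs above.
DOUBLEVADD_LOG = """
vadd iteration: 0 → 451.126394 ms
vadd iteration: 1 → 451.058473 ms
vadd iteration: 2 → 451.093805 ms
vadd iteration: 3 → 450.908674 ms
vadd iteration: 4 → 451.054086 ms
vadd iteration: 5 → 451.061446 ms
vadd iteration: 6 → 451.002277 ms
vadd iteration: 7 → 450.974767 ms
"""

FASTER_DOUBLEVADD_LOG = """
vadd iteration: 0 → 88.182015 ms
vadd iteration: 1 → 88.063425 ms
vadd iteration: 2 → 88.067774 ms
vadd iteration: 3 → 88.091864 ms
vadd iteration: 4 → 88.052144 ms
vadd iteration: 5 → 88.015094 ms
vadd iteration: 6 → 88.073275 ms
vadd iteration: 7 → 88.081755 ms
"""


def parse_latencies(log_text: str) -> List[float]:
    """Return the per-iteration latencies (ms) found in a launch log."""
    return [float(m.group(2)) for m in ITERATION_RE.finditer(log_text)]


def summarize(latencies: List[float]) -> Dict[str, float]:
    """Mean and sample standard deviation of a list of latencies (ms)."""
    return {"mean_ms": mean(latencies), "stdev_ms": stdev(latencies)}


if __name__ == "__main__":
    base = summarize(parse_latencies(DOUBLEVADD_LOG))
    fast = summarize(parse_latencies(FASTER_DOUBLEVADD_LOG))
    print(f"doublevadd_publisher:        {base['mean_ms']:.3f} ms (stdev {base['stdev_ms']:.3f})")
    print(f"faster_doublevadd_publisher: {fast['mean_ms']:.3f} ms (stdev {fast['stdev_ms']:.3f})")
    print(f"speedup: {base['mean_ms'] / fast['mean_ms']:.2f}x")
```

For the eight iterations shown, the mean latencies work out to roughly 451.0 ms and 88.1 ms, i.e. a speedup of about 5.12x for faster_doublevadd_publisher. The very small spread across iterations suggests eight samples are enough for a rough comparison, though a longer run would give a more robust estimate.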

Víctor Mayoral Vilches · Answer 2 · Mon Nov 22 2021 13:40:08 GMT+0800 (China Standard Time)

Disclosed and contributed adaptive_component, a simple composable container for Adaptive ROS 2 Node computations. Select between FPGA, CPU or GPU at run-time.
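The core idea of an adaptive computation is that one logical unit holds several implementations of the same operation (CPU, FPGA, GPU) and dispatches to whichever one is selected at run-time. The sketch below mirrors only that dispatch idea in plain Python, without ROS 2. `Backend`, `AdaptiveComputation`, `select()` and the numeric encoding of the backends are hypothetical names chosen for illustration and do not reflect adaptive_component's actual API or parameters.

```python
from enum import IntEnum
from typing import Callable, Dict, Optional


class Backend(IntEnum):
    # Hypothetical encoding of the compute targets; adaptive_component's
    # real parameter values may differ.
    CPU = 0
    FPGA = 1
    GPU = 2


class AdaptiveComputation:
    """Hold one implementation per backend and dispatch to the active one."""

    def __init__(self, cpu: Callable, fpga: Optional[Callable] = None,
                 gpu: Optional[Callable] = None) -> None:
        # The CPU implementation is mandatory and acts as the default.
        self._impls: Dict[Backend, Callable] = {Backend.CPU: cpu}
        if fpga is not None:
            self._impls[Backend.FPGA] = fpga
        if gpu is not None:
            self._impls[Backend.GPU] = gpu
        self._active = Backend.CPU

    @property
    def active(self) -> Backend:
        return self._active

    def select(self, backend: Backend) -> None:
        """Switch the compute target at run-time."""
        if backend not in self._impls:
            raise ValueError(f"no implementation registered for {backend.name}")
        self._active = backend

    def __call__(self, *args, **kwargs):
        return self._impls[self._active](*args, **kwargs)


def vadd_cpu(a, b):
    return [x + y for x, y in zip(a, b)]


def vadd_fpga_placeholder(a, b):
    # Hypothetical stand-in: a real FPGA path would offload to a kernel instead.
    return [x + y for x, y in zip(a, b)]


if __name__ == "__main__":
    vadd = AdaptiveComputation(cpu=vadd_cpu, fpga=vadd_fpga_placeholder)
    print(vadd.active.name, vadd([1, 2], [3, 4]))
    vadd.select(Backend.FPGA)
    print(vadd.active.name, vadd([1, 2], [3, 4]))
```

Keeping a mandatory CPU implementation as the default means the computation still runs on platforms without an accelerator, and rejecting unregistered backends in `select()` surfaces configuration errors at switch time instead of at the next call.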

Examples of using adaptive_component:

Víctor Mayoral Vilches · Answer 3 · Tue Jan 25 2022 16:19:01 GMT+0800 (China Standard Time)

Pushed ros2_kria publicly: a collection of ROS 2 packages that facilitate the integration of hardware acceleration capabilities for Xilinx's Kria SOM portfolio while aligning with the HAWG benchmarking approach.

Víctor Mayoral Vilches · Answer 4 · Tue Feb 22 2022 17:04:25 GMT+0800 (China Standard Time)

Benchmarking methodology further discussed at #20 (comment)