Roofline is a tool that gives you performance insights into a target program in the context of the machine it is executing on.
For background on this tool, please read:
- Roofline Modeling and Analysis
- Intel Advisor Roofline
- Roofline: An Insightful Visual Performance Model for Multicore Architectures
- Cache-aware Roofline model: Upgrading the loft
The tool outputs a roofline plot for your application.
Please note that this tool is provided as a proof of concept, without any warranty or support.
The roofline tool is structured similarly to the well-known "perf" tool, with "record" and "report" phases: if you are familiar with perf, you already know how to use this tool!
After installing the tool (see the INSTALL.md file in this folder), we suggest setting up an alias in your shell, for instance in your .bashrc file:
alias roofline='path/to/roofline/directory/roofline.py'
Roofline uses the open source ERT (Empirical Roofline Tool) project to measure the target machine's peak floating point performance and memory bandwidth.
To ask ERT to run on the given machine using the specified floating point precision:
roofline record_ert --precision [FP64/FP32]
To see instead which roofline-like lines ERT has already gathered:
roofline show_ert
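As background for reading these lines, the classic roofline model bounds attainable performance by the lower of the compute roof and the memory roof. The sketch below illustrates that relation only. The function names and the machine numbers are hypothetical and are not taken from ERT or from this tool.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    """Roofline ceiling: the lower of the compute roof and the memory roof.

    arithmetic_intensity is in FLOP/byte, bandwidth_gbs in GB/s.
    """
    return min(peak_gflops, arithmetic_intensity * bandwidth_gbs)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity at which the two roofs meet."""
    return peak_gflops / bandwidth_gbs


# Hypothetical machine, for illustration only.
PEAK_GFLOPS = 40.0
BANDWIDTH_GBS = 20.0

for ai in (0.25, 2.0, 8.0):
    print(ai, attainable_gflops(ai, PEAK_GFLOPS, BANDWIDTH_GBS))
```

Kernels whose arithmetic intensity falls to the left of the ridge point are limited by memory bandwidth, and those to the right by peak compute.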
Roofline lets you define a target Region of Interest (ROI) in your application in several ways.
Before using the tool, you need to define the regions of interest for your target application.
To do this, include the header file roi_api.h in your application and call the API:
Roi_Start("label");
...
Code which is interesting for you
...
Roi_End("label");
"label" is the mnemonic for the specified region of interest: choose a meaningful name for your use case. When you specify a label at the start of a region of interest, use the same label to delimit its end.
Roofline gives you the capability to specify functions already present in the application as Region of Interest delimiters:
roofline record --roi_start <Symbol Name> --roi_end <Symbol Name> ./my_app
Make sure the request is meaningful: every time the starting symbol is executed, the ending symbol must eventually be executed as well, in a 1:1 ratio.
Pay attention: if the specified symbols are executed multiple times, the tool will record multiple regions of interest.
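To make the 1:1 requirement concrete, here is a small sketch that pairs start and end events in execution order and reports one region per pair. It is an illustration only and not the tool's implementation. The function name and the event encoding are hypothetical, and the sketch assumes regions do not nest, which may differ from how the tool behaves.

```python
def pair_regions(events):
    """Pair "start"/"end" events, given in execution order.

    Returns one (start_index, end_index) tuple per region of interest and
    raises ValueError when the 1:1 ratio between starts and ends is broken.
    """
    regions = []
    open_start = None  # index of the start event still waiting for its end
    for index, kind in enumerate(events):
        if kind == "start":
            if open_start is not None:
                raise ValueError(f"start at {index} while region from {open_start} is open")
            open_start = index
        elif kind == "end":
            if open_start is None:
                raise ValueError(f"end at {index} without a matching start")
            regions.append((open_start, index))
            open_start = None
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    if open_start is not None:
        raise ValueError(f"region started at {open_start} never ended")
    return regions


# Symbols executed twice produce two separate regions.
print(pair_regions(["start", "end", "start", "end"]))
```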
The tool also provides the capability of defining a single function as a region of interest:
roofline record --trace_f <Symbol Name> ./my_app
In order to use the tool for recording:
roofline record -o <output_folder_to_store_the_results> -- <target_application> <target_application_flags>
If you are interested in a more granular recording, the tool supports the '--[read/write]_bytes_only' flag which, if specified, makes the instrumentation client gather only the bytes read or written, respectively.
The tool will create two different files in the specified output directory reporting all the information gathered:
- roofline.xml - This file contains information about bytes accessed by all the bits of code falling into the specified regions of interest.
- roofline_time.xml - This file contains timing information about all the bits of code falling into the specified regions of interest.
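As a rough illustration of how byte counts and timings combine into a point on a roofline plot, the sketch below computes arithmetic intensity and achieved performance for one region of interest. It assumes the FLOP count, byte count and elapsed time have already been extracted. The layout of the XML files is not described here, so no parsing is attempted, and the function name and numbers are hypothetical.

```python
def roofline_point(flops, bytes_accessed, seconds):
    """Return (arithmetic intensity in FLOP/byte, performance in GFLOP/s)."""
    arithmetic_intensity = flops / bytes_accessed
    gflops_per_second = flops / seconds / 1e9
    return arithmetic_intensity, gflops_per_second


# Made-up measurements for a single region of interest.
ai, perf = roofline_point(flops=4e9, bytes_accessed=8e9, seconds=2.0)
print(f"arithmetic intensity: {ai} FLOP/byte, performance: {perf} GFLOP/s")
```

The first value is the region's position on the horizontal axis and the second its position on the vertical axis.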
Once you've recorded your application, you'll want to see the plot:
roofline report -i <folder_created_on_roofline_record> --line <hostname>_<PRECISION>
-i corresponds to the recorded application performance, while --line is the ERT-recorded roofline-like line representing the machine capabilities.
Multiple -i or --line targets can be specified, so you can compare different regions of interest under different configurations, potentially run on different machines.
This command uses the previously generated roofline.xml and roofline_time.xml files to draw the plot. If you are working on a remote machine or server without graphics packages, the tool also prints a quick report on the command line right after the command completes.
This command also provides another output called roofline.gnu which you can use to get a better plot:
gnuplot roofline.gnu
This will produce a roofline.ps file in the current directory, which is a PostScript file for the final plot.
A suggested way to be able to see the actual plot is to use Okular:
okular roofline.ps
- When using the tool (especially on x86_64), please check your target application and make sure the floating point assembly instructions are counted correctly in the client file client/count_fp.hpp
- If you specify functions already present in the code base as start and stop, please make sure that they have not been inlined by the compiler; otherwise they won't be 'officially' executed and DynamoRIO (and gdb as well) won't be able to detect the function execution.
- Unfortunately, instrumenting the target application to gather the FLOP and byte counts significantly slows it down, increasing execution time. This could be addressed by improving the instrumentation client to avoid 'clean calls' and to clean the code cache as soon as a region of interest start marker is met at runtime. If you have time or resources to improve the tool and want to know more about this, please let us know by raising an issue on the project.
- For this beta version, the target application is assumed to be single-threaded. The DynamoRIO client already has support for tracing multi-threaded applications, but this still has to be finalized and tested properly.
- The tool has been designed to support Arm and x86_64. While it can precisely count the floating point operations actually executed on the CPU, precise counting has been implemented only for normal floating point operations and NEON vector instructions. For other Arm FP instruction extensions and for x86_64, please check client/count_fp.hpp and make sure the tool is counting correctly. The same file also needs some improvement to better spot SIMD instructions.
- The tool trusts ERT to provide the correct information for the roofline chart. However, to search for the maximum FLOPS value, the ERT benchmarking code issues multiple scalar fadd operations sequentially into the pipeline. This can be improved and, for future development of the tool, it may be worth considering different data sources or improving ERT itself.
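To give a feel for why a measurement based on scalar additions may sit below what the hardware could reach, the sketch below uses a deliberately simplified throughput model: frequency times FP pipes times SIMD lanes times FLOPs per instruction. The function name and every machine parameter are hypothetical. The sketch describes neither a real CPU nor ERT's actual measurement.

```python
def peak_gflops(freq_ghz, fp_pipes, lanes, flops_per_instruction):
    """Simplified per-core peak: one instruction per pipe per cycle."""
    return freq_ghz * fp_pipes * lanes * flops_per_instruction


# Hypothetical core: 2.0 GHz, two FP pipes, 128-bit vectors (2 x FP64 lanes).
scalar_add = peak_gflops(2.0, fp_pipes=2, lanes=1, flops_per_instruction=1)
simd_fma = peak_gflops(2.0, fp_pipes=2, lanes=2, flops_per_instruction=2)

print(f"scalar add roof: {scalar_add} GFLOP/s")
print(f"SIMD FMA roof:   {simd_fma} GFLOP/s")
print(f"ratio:           {simd_fma / scalar_add}x")
```

Under these assumptions, the compute roof derived from scalar additions is a quarter of the SIMD fused multiply-add roof, so a vectorized region could appear above the plotted line.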
If the current documentation omits something important, or you run into issues running the tool or understanding parts of the code, please feel free to raise an issue on the GitHub project.