mperf is a modular micro-benchmark/toolkit for kernel performance analysis.
- Investigate the basic micro-architectural (uarch) parameters of the target CPU/GPU.
- Draw hierarchical roofline model graphs for evaluating performance.
- Collect CPU/GPU PMU event data.
- Analyze CPU/GPU PMU event data (TMA methodology and customized metrics) to identify performance bottlenecks.
- OpenCL Linter, used to guide manual OpenCL kernel optimization [TBD].
- C++ Project
- Supported platforms: ARM CPUs, Mali GPUs, Adreno 6xx GPUs
- Lightweight and embeddable library
- The iOS platform is not yet fully functional.
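As a rough illustration of what a hierarchical roofline expresses, the sketch below computes the attainable performance at one memory level as the minimum of the peak compute rate and the product of that level's bandwidth and the kernel's arithmetic intensity. This is the general roofline formula, not mperf's code; the function names and all numeric values are assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Attainable GFLOP/s under the roofline model for one memory level:
// min(peak compute, bandwidth * arithmetic intensity).
// Generic formula only; this is not part of the mperf API.
double attainable_gflops(double peak_gflops, double bandwidth_gbps,
                         double flops_per_byte) {
    assert(peak_gflops >= 0.0 && bandwidth_gbps >= 0.0 && flops_per_byte >= 0.0);
    return std::min(peak_gflops, bandwidth_gbps * flops_per_byte);
}

// True when the kernel is limited by memory bandwidth rather than compute
// at this level, i.e. it sits on the sloped part of the roof.
bool is_memory_bound(double peak_gflops, double bandwidth_gbps,
                     double flops_per_byte) {
    return bandwidth_gbps * flops_per_byte < peak_gflops;
}
```

For example, with a hypothetical 16 GFLOP/s peak and a kernel at 0.5 FLOPs per byte, a level delivering 64 GB/s leaves the kernel compute-bound at 16 GFLOP/s, while a level delivering 8 GB/s caps it at 4 GFLOP/s. Evaluating the same kernel against each level (L1, L2, DRAM, and so on) is what makes the model hierarchical.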
mperf uses the CMake build system and requires a CMake version higher than 3.15.2. To compile mperf, follow these steps:
- clone or download the project
  git clone https://github.com/MegEngine/mperf.git
  cd mperf
  git submodule update --init --recursive
- choose a test platform
  - if you will test an ARM processor on Android
    - an NDK is required
      - download the NDK and extract it on the host machine
      - set the NDK_ROOT environment variable to the path of the extracted NDK directory
  - if you will test an x86 processor on Linux
    - a gcc or clang compiler must be discoverable by CMake through the PATH environment variable
- if your target OS is Android, run android_build.sh to build it
  - print the usage of android_build.sh
    ./android_build.sh -h
  - build for armv7 CPU
    ./android_build.sh -m armeabi-v7a
  - build for arm64 CPU
    ./android_build.sh [-m arm64-v8a]  # default march is arm64-v8a
  - build with Mali mobile GPU
    ./android_build.sh -g mali [arm64-v8a, armeabi-v7a]
  - build with Adreno mobile GPU
    ./android_build.sh -g adreno [arm64-v8a, armeabi-v7a]
  - build with pfm
    ./android_build.sh -p [arm64-v8a, armeabi-v7a]
  - build in debug mode
    ./android_build.sh -d [arm64-v8a, armeabi-v7a]
  - build with your custom install directory
    ./android_build.sh -i /your/custom/cmake/install/prefix [arm64-v8a, armeabi-v7a]
    e.g.: ./android_build.sh -i ~/mperf_install [-m arm64-v8a]  # default march is arm64-v8a
- if your target OS is Linux, build with CMake directly; to enable pfm, add -DMPERF_ENABLE_PFM=ON to the cmake command
  cmake -S . -B "build-x86" -DMPERF_ENABLE_PFM=ON
  cmake --build "build-x86" --config Release
- after the build, the executables are stored in the <mperf_build_dir>/apps directory. You can install mperf to your system path or to your custom install directory with
  cmake --build <mperf_build_dir> --target install
  e.g.: cmake --build ./build-arm64-v8a/ --target install
- you can now use the find_package command to import the installed mperf, for example:
  # Note: mperf_DIR is the directory that contains mperfConfig.cmake,
  # e.g. set(mperf_DIR ~/mperf_install/lib/cmake/mperf/)
  set(mperf_DIR /path/to/your/installed/mperfConfig.cmake)
  find_package(mperf REQUIRED)
  target_link_libraries(your_target mperf::mperf)
- alternatively, add_subdirectory(mperf) will incorporate the library directly into your CMake project.
- basic usage of the mperf xpmu module:
  please see cpu_pmu / mali_pmu / adreno_pmu for more details.
  mperf::CpuCounterSet cpuset = "CYCLES,INSTRUCTIONS,...";
  mperf::XPMU xpmu(cpuset);
  xpmu.run();
  ... // add your function to be measured
  xpmu.sample();
  xpmu.stop();
- basic usage of the mperf tma module:
  please see arm_cpu_tma for more details.
  mperf::tma::MPFTMA mpf_tma(mperf::MPFXPUType::A55);
  mpf_tma.init(
          {"Frontend_Bound", "Bad_Speculation", "Backend_Bound", "Retiring", ...});
  size_t gn = mpf_tma.group_num();
  for (size_t i = 0; i < gn; ++i) {
      mpf_tma.start(i);
      for (size_t j = 0; j < iter_num; ++j) {
          ... // add your function to be measured
      }
      mpf_tma.sample_and_stop(iter_num);
  }
  mpf_tma.deinit();
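The TMA snippet runs the measured function once per group. One plausible reason, stated here as an assumption rather than a description of mperf internals, is that a PMU exposes only a limited number of hardware counters, so the events needed for all requested metrics are split into groups and each group is measured in its own pass. The sketch below shows that idea in isolation; `split_into_groups`, `per_iteration`, and the counter limit are hypothetical names and values, not mperf APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Split event names into consecutive groups of at most `counters_per_group`
// events, preserving order. Each group would be measured in a separate pass.
// Hypothetical helper; mperf's real grouping logic may differ.
std::vector<std::vector<std::string>> split_into_groups(
        const std::vector<std::string>& events,
        std::size_t counters_per_group) {
    assert(counters_per_group > 0);
    std::vector<std::vector<std::string>> groups;
    for (std::size_t i = 0; i < events.size(); i += counters_per_group) {
        const std::size_t end = std::min(events.size(), i + counters_per_group);
        groups.emplace_back(
                events.begin() + static_cast<std::ptrdiff_t>(i),
                events.begin() + static_cast<std::ptrdiff_t>(end));
    }
    return groups;
}

// Average a raw counter total over the iterations of one pass, so that
// values from different passes are comparable.
double per_iteration(double raw_total, std::size_t iter_num) {
    assert(iter_num > 0);
    return raw_total / static_cast<double>(iter_num);
}
```

Under this assumption, five events on a PMU with two usable counters would need three passes, which matches the shape of the outer loop over `group_num()`. Passing `iter_num` to `sample_and_stop` would likewise fit a per-iteration normalization step. Because each group is measured on a different run of the workload, the measured function should behave the same from pass to pass.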
- apps: Various user examples; please see the apps doc for more details.
- eca: A module for collecting and analyzing PMU event data (including TMA analysis).
- uarch: A set of low-level micro-benchmarks to investigate the basic micro-architectural (uarch) parameters of the target CPU/GPU.
- doc: Documents about roofline and TMA usage; please see the index for the list.
- cmake: CMake-related files.
- common: Common helper functions.
- third_party: Third-party dependencies.
- linter: OpenCL Linter [TBD].
- A tutorial on optimizing matmul to reach peak performance on an ARM A55 core illustrates the basic workflow of using mperf in your optimization work; please refer to "optimize the matmul with the help of mperf".
mperf is licensed under the Apache-2.0 license.