cea-hpc / HARP

Small tool for profiling the performance of hardware-accelerated Rust code using OpenCL and CUDA

HARP - Hardware-Accelerated Rust Profiling

About

HARP is a simple profiler for evaluating the performance of hardware-accelerated Rust code. It aims to gauge the capabilities of Rust as a first-class language for GPGPU computing, especially in the field of High Performance Computing (HPC).

Currently, HARP can profile the following GPU-accelerated kernels (targeting OpenCL C and NVIDIA CUDA C++ implementations):

  • AXPY (general vector-vector addition)
  • GEMM (general dense matrix-matrix multiplication)
  • Reduce (32-bit integer sum reduction)
  • Scan (32-bit integer sum exclusive scan)
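
For readers unfamiliar with these operations, here is a minimal CPU-side sketch of what AXPY, reduce and scan compute. It is not HARP's code: the function names (`axpy`, `reduce_sum`, `exclusive_scan_sum`) are hypothetical, AXPY follows the usual BLAS definition `y <- alpha * x + y`, and wrapping arithmetic on overflow is an assumption made for the illustration.

```rust
/// AXPY (BLAS definition): y <- alpha * x + y, element-wise.
fn axpy(alpha: f64, x: &[f64], y: &mut [f64]) {
    assert_eq!(x.len(), y.len(), "x and y must have the same length");
    for (yi, xi) in y.iter_mut().zip(x) {
        *yi += alpha * xi;
    }
}

/// Sum reduction over 32-bit signed integers.
/// Overflow wraps around (an assumption of this sketch).
fn reduce_sum(data: &[i32]) -> i32 {
    data.iter().fold(0i32, |acc, &v| acc.wrapping_add(v))
}

/// Exclusive sum scan: out[i] = data[0] + ... + data[i - 1], with out[0] = 0.
fn exclusive_scan_sum(data: &[i32]) -> Vec<i32> {
    let mut acc = 0i32;
    data.iter()
        .map(|&v| {
            let prev = acc; // value emitted excludes the current element
            acc = acc.wrapping_add(v);
            prev
        })
        .collect()
}
```

Sequential versions like these are typically useful as a correctness reference when checking the output of a GPU kernel.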

AXPY and GEMM can be profiled in both single-precision and double-precision floating-point formats (see IEEE 754). The reduce and scan kernels currently support 32-bit signed integers only.
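
To illustrate how one routine can serve both precisions, here is a sketch of a naive CPU GEMM that is generic over `f32` and `f64`. It assumes square, row-major matrices and the usual BLAS form `C <- alpha * A * B + beta * C`; the `gemm` function and its signature are hypothetical and do not reflect HARP's internals.

```rust
use std::ops::{Add, Mul};

/// Naive row-major GEMM on square n x n matrices:
/// C <- alpha * A * B + beta * C.
/// Works for any type meeting the bounds below, including f32 and f64.
fn gemm<T>(n: usize, alpha: T, a: &[T], b: &[T], beta: T, c: &mut [T])
where
    T: Copy + Default + Add<Output = T> + Mul<Output = T>,
{
    assert!(a.len() == n * n && b.len() == n * n && c.len() == n * n);
    for i in 0..n {
        for j in 0..n {
            // T::default() is 0.0 for both f32 and f64.
            let mut acc = T::default();
            for k in 0..n {
                acc = acc + a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = alpha * acc + beta * c[i * n + j];
        }
    }
}
```

Calling `gemm` with `f32` slices gives the single-precision variant (SGEMM) and with `f64` slices the double-precision one (DGEMM).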

Quickstart

Pre-requisites

Before starting, make sure the following software is installed on your machine:

  • Rust 1.68.0+
  • OpenCL 2.0+
  • NVIDIA CUDA Toolkit 11.2+ (12.0 recommended) and the appropriate drivers
    • ensure the libnvvm library is installed and that its path is in the LD_LIBRARY_PATH environment variable
    • libnvvm specifically requires LLVM 7.x (7.0 to 7.4), which you can get here
  • Python 3.7+ (only needed for plot generation)
    • depends on the pandas, plotly and kaleido Python packages

Build

First, clone this repository locally:

git clone https://github.com/cea-hpc/HARP
cd HARP

Like any Rust-based project, HARP is built with cargo:

cargo build --release

Run

See HARP's documentation for the full list of supported flags, or use the help subcommand.

For example, to profile a DGEMM over multiple matrix sizes, run:

cargo run --release -- dgemm --sizes 32 64 128 256 512 1024 2048 4096
# Or with shorthand aliases
cargo r -r -- dgemm -s 32 64 128 256 512 1024 2048 4096
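
As a rough idea of what profiling a DGEMM over several sizes involves, the sketch below times a naive CPU DGEMM for each size and converts the result to GFLOP/s using the standard 2n³ operation count. This is only an assumption-laden illustration: the kernel is a CPU stand-in for a GPU launch, all names (`dgemm_naive`, `time_dgemm`, `gflops`, `profile_sizes`) are hypothetical, and HARP's actual measurement method may differ.

```rust
use std::time::{Duration, Instant};

/// Naive row-major DGEMM, C <- A * B, used here as a CPU stand-in for a GPU kernel.
fn dgemm_naive(n: usize, a: &[f64], b: &[f64], c: &mut [f64]) {
    for i in 0..n {
        for j in 0..n {
            let mut acc = 0.0;
            for k in 0..n {
                acc += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = acc;
        }
    }
}

/// Run the kernel `reps` times (at least once) for one size and keep the fastest run.
fn time_dgemm(n: usize, reps: usize) -> Duration {
    let a = vec![1.0f64; n * n];
    let b = vec![2.0f64; n * n];
    let mut c = vec![0.0f64; n * n];
    let mut best = Duration::MAX;
    for _ in 0..reps.max(1) {
        let start = Instant::now();
        dgemm_naive(n, &a, &b, &mut c);
        best = best.min(start.elapsed());
    }
    // Keep the result observable so the compiler cannot discard the work.
    std::hint::black_box(&c);
    best
}

/// Convert a timing to GFLOP/s, counting 2 * n^3 floating-point operations per GEMM.
fn gflops(n: usize, elapsed: Duration) -> f64 {
    2.0 * (n as f64).powi(3) / elapsed.as_secs_f64() / 1e9
}

/// Profile every requested size and return (size, best time, GFLOP/s) triples.
fn profile_sizes(sizes: &[usize], reps: usize) -> Vec<(usize, Duration, f64)> {
    sizes
        .iter()
        .map(|&n| {
            let t = time_dgemm(n, reps);
            (n, t, gflops(n, t))
        })
        .collect()
}
```

Keeping the fastest of several repetitions is one common way to reduce noise from warm-up effects; reporting a mean or median instead would be an equally reasonable design choice.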

Documentation

The crate's documentation can be built and opened with cargo:

cargo doc --open

Contributing

Contributions are welcome and accepted as pull requests on GitHub.

You may also ask questions or file bug reports on the issue tracker.

License

Licensed under either of:

  • Apache License, Version 2.0
  • MIT license

at your option.

The SPDX license identifier for this project is MIT OR Apache-2.0.

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
