VictorEijkhout / diff2d_benchmark

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Diff2d benchmark

This is a benchmark for parallel computation of a "power method": stencil-based matrix-vector product, norm, scaling, and repeat that.

So far there are C++ implementations based on OpenMP, Kokkos, Sycl, MPI. Contributed implementations welcome.

Compilation

Prerequisites

This software uses the package cxxopts and mdspan.

You can take the easy way out: the CMake installation will fetch these packages. However, if you want to install them yourself:

  • add the .pc files from cxxopts to the PKG_CONFIG_PATH
  • add the mdspan installation directory to the CMAKE_PREFIX_PATH.

CMake compilation

Go into code/diff2d. Calling make without arguments tells you all available rules, and the available variants.

Drive the cmake installation with the makefile:

make cmake VARIANTS="kokkos sycl" ## or other variants

You can of course run cmake outside of make:

variant=span
mkdir build
ln cmake/CMakeLists.txt $variant
cmake -B build -S $variant -D VARIANT=$variant

You can let CMake fetch prerequisite packages or install them yourself; see above.

Makefile compilation

Go into code/diff2d. Calling make without arguments tells you all the make rules. For compilation use make or cmake:

make compile VARIANTS="seq oned"

Set the variable TACC_MDSPAN_INC to the location of the header files.

Code variants

The following code variants are available:

  • oned : traditional C-style OpenMP implementation.
  • clps : OpenMP with collapse(2) directive.
  • span : iterating over a ranges::view::cartesian_product.
  • iota : double loop over iota_views.
  • range : Using range execution policies. Not Working Yet!
  • kokkos : based on Kokkos.
  • sycl : using Sycl; only tested with Intel's Sycl, not with AdaptiveCPP or other implementation.
  • dist : MPI implementation through MPL.

Not yet compilable with CMake: kokkos, sycl, dist.

Running

single program run

Run bin/oned or any other variant. Commandline options:

  • -m 100 -n 200 : domain size;
  • -t : trace output;
  • -i 5 : run only five iterations, not until full convergence which can take forever.

scaling study

Run ./compare_models.py with arguments as below. This is somewhat TACC-dependent; for instance it interrogates SLRUM_CPUS_ON_NODE to find the core count. If you're not on a SLURM cluster, set this by hand.

Other options:

  • -c cpuname : only for outputfile naming;
  • -n 123456 : the -m -n options above;
  • -t : iteration trace output;
  • variant1,variant2,variant3 or all to run specific codes.

About

License:MIT License


Languages

Language:TeX 34.5%Language:C++ 31.8%Language:Makefile 24.4%Language:Python 4.6%Language:Shell 3.4%Language:CMake 0.8%Language:Fortran 0.5%