rajesh-s / tp-parsec-openmp-tasks

Profiling open mp tasks on tp-parsec benchmark suite


Profiling OpenMP tasks on TP-PARSEC benchmark suite

1. Progress so far

2. Background concepts

  • OpenMP is an API for writing multithreaded applications

    • A set of compiler directives (for C/C++) and library routines for parallel application programmers
    • omp_set_num_threads(4): runtime function that sets the number of threads
    • omp_get_thread_num(): returns the calling thread's ID (similar to thread indices in CUDA)
    • #pragma omp parallel marks a parallel region of code
      • Each thread executes a copy of the code within the structured block on the same data (SPMD)
    • Synchronization prevents race conditions when threads share variables (for thread communication)
    • pfor
    • Resource
  • Task parallel programming model: a task is a logical unit of concurrency that can be created at any point and is dynamically scheduled on the available cores. Because tasks can be created anywhere, a parallel for (pfor) can be expressed as tasks. Dynamic, automatic load balancing simplifies scheduling.

  • tp-parsec replaces "parallel for" with "pfor" (task version)

    • pfor recursively divides the for loop iterations into two halves and creates two tasks to execute them at each level of recursion, using create_task and wait_tasks. The grain size indicates the point at which the recursive division stops and the leaf computation is executed
    • Blackscholes
      • Nested loop of 100 x 10^7
      • Fine grained tasks with grain size 10k
    • Freqmine
      • 7 parallel for loops in PARSEC
        • Step 1: 1 for loop
        • Step 2: 4 for loops
        • Step 3: 2 for loops
      • In the task version, the pfor primitive replaces parallel for. The grain size is set to 1 for all 7 pfor loops.
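
The recursive splitting that pfor performs can be sketched in plain OpenMP as follows. This is a minimal illustration, not the tp-parsec implementation: the names pfor, pfor_rec, pfor_body and square_range are hypothetical, and it assumes that create_task and wait_tasks correspond to #pragma omp task and #pragma omp taskwait in the task_omp configuration.

```c
#include <assert.h>
#include <stddef.h>

/* Leaf computation applied to the half-open index range [lo, hi).
 * (Hypothetical callback type, chosen for this sketch.) */
typedef void (*pfor_body)(size_t lo, size_t hi, void *ctx);

/* Split [lo, hi) into two halves until a range fits within `grain`,
 * creating one task per half at every level of the recursion. */
static void pfor_rec(size_t lo, size_t hi, size_t grain,
                     pfor_body body, void *ctx)
{
    if (hi - lo <= grain) {
        body(lo, hi, ctx);          /* leaf: run these iterations serially */
        return;
    }
    size_t mid = lo + (hi - lo) / 2;

    /* Assumed equivalent of create_task: one task per half. */
    #pragma omp task firstprivate(lo, mid, grain, body, ctx)
    pfor_rec(lo, mid, grain, body, ctx);

    #pragma omp task firstprivate(mid, hi, grain, body, ctx)
    pfor_rec(mid, hi, grain, body, ctx);

    /* Assumed equivalent of wait_tasks: wait for both child tasks. */
    #pragma omp taskwait
}

/* Entry point: one thread of the team seeds the recursion and the
 * other threads pick up the tasks it generates. */
void pfor(size_t first, size_t last, size_t grain,
          pfor_body body, void *ctx)
{
    assert(grain > 0);
    if (first >= last)
        return;                     /* empty range: nothing to do */

    #pragma omp parallel
    #pragma omp single
    pfor_rec(first, last, grain, body, ctx);
}

/* Example leaf: out[i] = i * i for every i in [lo, hi). */
void square_range(size_t lo, size_t hi, void *ctx)
{
    double *out = (double *)ctx;
    for (size_t i = lo; i < hi; ++i)
        out[i] = (double)i * (double)i;
}
```

A call such as pfor(0, n, 10000, square_range, out) would mirror the grain size of 10k mentioned for Blackscholes, while a grain size of 1 (as in Freqmine) creates one leaf per iteration. Compile with gcc -fopenmp to enable tasking; without that flag the pragmas are ignored and the same code runs serially.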

  • Tasks: execution based on scheduling

  • Threads: execution based on fixed threads (fork/join)
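
For contrast with the task model, the fork/join style with a fixed team of threads might look like the sketch below. It uses the OpenMP routines listed in the background section (omp_set_num_threads, omp_get_thread_num, #pragma omp parallel) plus omp_get_num_threads; the function name spmd_sum and the block partitioning are assumptions made for illustration and are not taken from tp-parsec.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Sum data[0..n) with a fixed team of threads (SPMD, fork/join).
 * Every thread runs the same block and picks its own slice by thread ID. */
double spmd_sum(const double *data, size_t n, int num_threads)
{
    assert(num_threads > 0);
    double total = 0.0;             /* shared by all threads in the region */

#ifdef _OPENMP
    omp_set_num_threads(num_threads);
#endif

    #pragma omp parallel
    {
        size_t tid = 0, team = 1;   /* serial fallback values */
#ifdef _OPENMP
        tid = (size_t)omp_get_thread_num();
        team = (size_t)omp_get_num_threads();
#endif
        /* Static partition: each thread owns one contiguous chunk. */
        size_t chunk = (n + team - 1) / team;
        size_t lo = tid * chunk < n ? tid * chunk : n;
        size_t hi = lo + chunk < n ? lo + chunk : n;

        double local = 0.0;
        for (size_t i = lo; i < hi; ++i)
            local += data[i];

        /* Synchronization: only one thread at a time updates the shared sum. */
        #pragma omp critical
        total += local;
    }
    return total;
}
```

Here the work assigned to each thread is fixed when the region starts, so an uneven workload leaves some threads idle. The task version avoids this by letting the runtime schedule many small tasks dynamically. Compile with gcc -fopenmp; without it the code falls back to a single thread.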

3. Profiling OpenMP tasks on tp-parsec benchmark suite

  • The available inputs for a specific workload are listed in its inputs directory, for example: tp-parsec/pkgs/apps/blackscholes/inputs

    • Inputs available: native, simlarge, simsmall, simmedium, simdev. Native is meant to be used on real machines
  • The configuration of specific interest is task_omp, the task-based version of OpenMP

  • Build and run the blackscholes workload:

    /home/rajesh/tp-parsec/tp-parsec/bin/parsecmgmt2 -a build run -p blackscholes -c gcc-task_omp -i simlarge -n 8

  • Build and run the blackscholes workload with the DAG recorder:

    /home/rajesh/tp-parsec/tp-parsec/bin/parsecmgmt2 -a build run -p blackscholes -c gcc-task_omp-dr -i simlarge -n 8

    • Install DAGViz, which can open the *.dag files in the tp-parsec output directory (e.g. pkgs/apps/freqmine/run/amd64-linux.gcc-task_omp-dr)
  • For the ICC compiler, be sure to update tp-parsec/config/icc.bldconf with the install paths

  • Possible profiler options:

    • Intel VTune
    • HPCToolkit
    • Score-P
    • Omp-Whip

3.1. DAGViz

Example with freqmine:

/home/rajesh/tp-parsec/tp-parsec/bin/parsecmgmt2 -a build run -p freqmine -c gcc-task_omp-dr -i simlarge -n 8

Note: with the DAG recorder enabled, native inputs can take extremely long and may crash in DAGViz. Try using the sim inputs instead.


Parallelism profile: the GUI works better with PyQt. VTune can also be used for this.


4. Results

  • The benchmarks were run on the following configuration:

    Architecture:                    x86_64
    Byte Order:                      Little Endian
    Address sizes:                   39 bits physical, 48 bits virtual
    Thread(s) per core:              2
    Core(s) per socket:              4
    Socket(s):                       1
    Model name:                      Intel(R) Core(TM) i7-4700MQ CPU @ 2.40GHz
    CPU MHz:                         3192.259
    CPU max MHz:                     3400.0000
    CPU min MHz:                     800.0000
  • tp-parsec logs are here

  • Metrics are captured in the xls file in the repository

4.1. VTune

[Figure: VTune, spawned threads]

5. Appendix

5.1. Installing VTune

# Install the libraries needed by the VTune GUI
sudo apt-get install libgtk-3-0 libasound2 libxss1 libnss3
# Install guide: https://software.intel.com/content/www/us/en/develop/documentation/vtune-install-guide-linux/top/user-interface-install.html#user-interface-install
# Set up the environment for the current shell
source <install-dir>/env/vars.sh
# Sampling drivers: https://software.intel.com/content/www/us/en/develop/documentation/vtune-help/top/set-up-analysis-target/linux-targets/building-and-installing-the-sampling-drivers-for-linux-targets.html
# Verify the installation: https://software.intel.com/content/www/us/en/develop/documentation/vtune-help/top/installation/verify-linux-installation.html
# Relax the ptrace restriction so the profiler can attach to processes
echo "0"|sudo tee /proc/sys/kernel/yama/ptrace_scope

5.2. HPC Toolkit

git clone https://github.com/spack/spack.git
cd spack/bin/
./spack install hpctoolkit
./spack install hpcviewer ^jdk@1.8.0_141-b15
  • Flow
# HPCRun
~/tools/hpctoolkit/spack/opt/spack/linux-ubuntu20.04-haswell/gcc-9.3.0/hpctoolkit-2020.08.03-gxrlef3hst5bjnlmmnny4ypagzdv55cd/bin/hpcrun /home/rajesh/tp-parsec/tp-parsec/bin/parsecmgmt2 -a build run -p bodytrack -c gcc-task_omp -i native -n 2

# HPCStruct
~/tools/hpctoolkit/spack/opt/spack/linux-ubuntu20.04-haswell/gcc-9.3.0/hpctoolkit-2020.08.03-gxrlef3hst5bjnlmmnny4ypagzdv55cd/bin/hpcstruct /home/rajesh/tp-parsec/tp-parsec/pkgs/apps/bodytrack/inst/amd64-linux.gcc-task_omp/bin/bodytrack

# HPCProf
~/tools/hpctoolkit/spack/opt/spack/linux-ubuntu20.04-haswell/gcc-9.3.0/hpctoolkit-2020.08.03-gxrlef3hst5bjnlmmnny4ypagzdv55cd/bin/hpcprof --structure bodytrack.hpcstruct -I /home/rajesh/tp-parsec/tp-parsec/pkgs/apps/bodytrack/inst/amd64-linux.gcc-task_omp/bin/bodytrack/+ hpctoolkit-basename-measurements hpctoolkit-bash-measurements hpctoolkit-cat-measurements hpctoolkit-date-measurements hpctoolkit-dirname-measurements hpctoolkit-expr-measurements hpctoolkit-mkdir-measurements hpctoolkit-rm-measurements hpctoolkit-tee-measurements

# HPCViewer
~/tools/hpctoolkit/spack/opt/spack/linux-ubuntu20.04-haswell/gcc-9.3.0/hpcviewer-2020.07-5jhzze7aivxmalacdujvlgpqz662r5yz/bin/hpcviewer hpctoolkit-basename-database

5.3. Installing Score-P on Ubuntu 20.04.1 LTS

sudo apt install python2
mkdir _build
cd _build/
../configure --prefix=/home/rajesh/tools/scorep-6.0 --with-mpi=openmpi
make
make install

5.4. Omp-Whip

  • This did not work due to memory issues during the build. Not sure if it's an issue with my OS
