NOELLE Gym

This repository is designed to test NOELLE and its transformations against well-established benchmark suites. The open-sourced version of this repository is here.

This repository also includes the evaluation materials for the NOELLE CGO 2022 paper: "NOELLE Offers Empowering LLVM Extensions" that can be found here.

Artifact

This artifact generates two sets of results, MINIMAL and FINAL. Adding SPEC CPU2017 to each set of results is optional (see the section "Experiments and results" for more details).

  • MINIMAL: Data that supports the version of the paper submitted in September, excluding the few SPEC CPU2017 benchmarks that each require several days (6 days when SPEC CPU2017 is included without the five benchmarks listed under MINIMAL below, 2 days otherwise).

  • FINAL: New results that were not included in the submitted version of the paper (extra 5 days).

Next you can find the instructions to reproduce all the above results.

Prerequisites

The artifact is available as a docker image. The artifact generates the results when the script ./bin/compileAndRun is invoked. The set of results that is generated depends on the environment variables that are set (see below).

We open sourced NOELLE, VIRGIL, and the SCAF alias analysis framework in 2020. We also open sourced the infrastructure we built to evaluate NOELLE on several benchmark suites (e.g., PARSEC, MiBench, SPEC CPU2017, PolyBench). Hence, this artifact downloads everything it needs by cloning the open-sourced repositories. This means the script bin/compileAndRun will clone the open-source git repositories (from GitHub) that are not included within the docker image, so please make sure you have a network connection when you run the artifact.

Experiments and results

Next we describe the two sets of experiments and results that can be generated with this artifact. Timing-based results might differ slightly from the plots shown in the submitted paper (the claims made in the paper are still valid).

SPEC CPU2017

Because SPEC CPU2017 cannot be shared, this artifact enables or disables this suite based on whether the environment variable NOELLE_SPEC is set. This environment variable is not set by default, and therefore the SPEC benchmarks do not run by default. To include the SPEC CPU2017 benchmarks, you need to (example commands follow this list):

  • copy the SPEC CPU2017 suite, archived and compressed with gzip, into the file benchmarkSuites/SPEC2017.tar.gz
  • set the environment variable NOELLE_SPEC to an integer value (e.g., export NOELLE_SPEC=1)
  • Run the other steps as described next (e.g., MINIMAL)
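
For example, assuming your SPEC CPU2017 installation lives in ~/speccpu2017 (a hypothetical path; the directory layout expected inside the archive is not documented here, so adjust as needed):

tar -czf benchmarkSuites/SPEC2017.tar.gz -C ~/speccpu2017 . ;
export NOELLE_SPEC=1 ;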

Results

Next is the mapping between the results generated by this artifact and the figures included in the paper:

  • Figure 3: results/current_machine/dependences/*/relative_values.txt
  • Figure 4: results/current_machine/loops/*/invariants.txt
  • Figure 5: results/current_machine/time/*/DOALL.txt for DOALL speedups, results/current_machine/time/*/HELIX.txt for HELIX speedups, and results/current_machine/time/*/DSWP.txt for DSWP speedups.

Time results are generated by running each benchmark 5 times by default. The median is computed from these runs. You can customize how many runs you want to generate by setting the environment variable NOELLE_RUNS. For example, if you want to run each time-experiment 11 times, then run

export NOELLE_RUNS=11;

and then generate one of the two sets of results (see below). If you do not set NOELLE_RUNS, then each time-sensitive experiment is run 5 times.
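
As a rough illustration only (this assumes the per-run time files contain one execution time per line, which is an assumption about their format; the path below is the one described in the Speedups section), the median of an odd number of runs could be extracted with:

sort -n results/current_machine/time/PARSEC3/baseline/blackscholes.txt | awk '{ t[NR] = $1 } END { print t[int((NR + 1) / 2)] }'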

Execution times might vary depending on the platform. We tuned the parallelization techniques for our platform, the one described in the NOELLE paper. We noticed that execution times vary significantly for HELIX depending on the core-to-core latencies. Also, execution times vary significantly for DSWP depending on the core-to-core bandwidth.

Results need to be generated on a platform equivalent to the one described in the NOELLE paper. Turbo boost and hyper-threading need to be disabled (they only impact the execution times). Furthermore, because HELIX and DSWP are sensitive to either latency or bandwidth between cores, it is important to keep all threads running on the same NUMA zone. Also, all the experiments need to be contained within the same CPU socket. Finally, the Intel-based multicore needs to have at least 12 physical cores in the CPU where the experiments run.
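
For example, one way to keep every thread on a single NUMA zone (and therefore within one socket) is to launch the artifact under numactl; node 0 is an assumption here, and the artifact's own scripts may already handle thread placement:

numactl --cpunodebind=0 --membind=0 ./bin/compileAndRun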

MINIMAL

This set of experiments and results covers all benchmarks included in the submitted version of the paper, with the only exception of five SPEC CPU2017 benchmarks (omnetpp_r, perlbench_r, x264_r, blender_r, parest_r). These five benchmarks require a significant amount of time, so we decided to keep them separate from the minimal set; they are included in the FINAL set. Also, DSWP and HELIX are disabled as they take a significant amount of time.

To generate the MINIMAL results, do the following:

cd ~ ;
unset NOELLE_FINAL ;
./bin/compileAndRun

Please look at the output of the script to learn how to check its current state. The results will be stored in results/current_machine.

FINAL

To generate the FINAL results, first generate MINIMAL, and then do the following:

cd ~ ; 
export NOELLE_FINAL=1 ;
./bin/compileAndRun ;

Data organization

All the generated data can be found under results. Data we generated on our machine can be found under results/authors_machine. Data that is generated by running the artifact can be found under results/current_machine.

Both results/authors_machine and results/current_machine have the same structure. Each contains one sub-directory per kind of data, and each of those contains one sub-directory per benchmark suite; for example, results/current_machine/time/PARSEC3 includes the execution times collected for the PARSEC-3.0 benchmarks. The kinds of data are the following:

  • dependences: this sub-directory includes information about the dependences of the benchmark. This data is used to generate Figure 3.
  • loops: this sub-directory includes information about loops, such as their induction variables and their loop invariants. This data is used to generate Figure 4. This data also includes new results that we will add to the final version of the paper. In more detail, we will add a new figure in the final version of the paper that compares the number of induction variables (per benchmark) detected by LLVM with the number detected by NOELLE.
  • time: this sub-directory includes execution times collected by running the benchmarks when compiled using the unmodified middle-end of clang and when compiled using NOELLE transformations. This data is used to generate Figure 5.
  • IR: this sub-directory includes all the IR files generated by the different NOELLE configurations (e.g., DOALL, HELIX, DSWP). These files only serve to cache compilation results, letting the user of this artifact avoid re-compiling benchmarks (see the example after this list).
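
For instance, to see which IR files have been cached for blackscholes of PARSEC (this path matches the one used in the invariants example later in this document):

ls results/current_machine/IR/PARSEC3/benchmarks/blackscholes/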

Speedups

Each benchmark is run using the vanilla clang compilation pipeline (called baseline) as well as using DOALL, HELIX, and DSWP included in NOELLE.

Baseline results can be found in results/current_machine/time/BENCHMARK_SUITE/baseline. For example, the execution times of blackscholes from PARSEC can be found in results/current_machine/time/PARSEC3/baseline/blackscholes.txt.

DOALL results can be found in results/current_machine/time/BENCHMARK_SUITE/DOALL.

HELIX results can be found in results/current_machine/time/BENCHMARK_SUITE/HELIX.

Finally, DSWP results can be found in results/current_machine/time/BENCHMARK_SUITE/DSWP.

The speedups of DOALL over the baseline can be found in results/current_machine/time/BENCHMARK_SUITE/DOALL.txt. The speedups of HELIX over the baseline can be found in results/current_machine/time/BENCHMARK_SUITE/HELIX.txt. The speedups of DSWP over the baseline can be found in results/current_machine/time/BENCHMARK_SUITE/DSWP.txt. These speedups (DOALL, HELIX, DSWP) are used to generate Figure 5 of the paper.
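
For example, to skim the DOALL speedups collected for the PARSEC suite:

cat results/current_machine/time/PARSEC3/DOALL.txt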

Invariants

The invariants of all loops of all benchmarks of a benchmark suite can be found in results/current_machine/loops/BENCHMARK_SUITE/invariants.txt. For example, the invariants of PARSEC benchmarks can be found in the file results/current_machine/loops/PARSEC3/invariants.txt.

Each benchmark has one line in the related invariants.txt. This file is organized in three columns. The first column is the name of the benchmark. The second column is the number of invariants accumulated over all loops of a given benchmark that are detected by LLVM. The third column is the number of invariants accumulated over all loops of a given benchmark that are detected by NOELLE. The IR that is analyzed to generate this information is the result of all NOELLE transformations that run before a parallelization scheme. This is the IR file baseline_with_metadata.bc of a given benchmark, which can be found in results/current_machine/IR/BENCHMARK_SUITE/benchmarks/BENCHMARK. For example, for blackscholes of PARSEC, the IR file that is analyzed to generate the invariants is

results/current_machine/IR/PARSEC3/benchmarks/blackscholes/baseline_with_metadata.bc
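
As an example of how invariants.txt can be queried (assuming whitespace-separated columns, which the three-column description above suggests but does not guarantee), the following lists the PARSEC benchmarks for which NOELLE detects more invariants than LLVM:

awk '$3 > $2 { print $1 }' results/current_machine/loops/PARSEC3/invariants.txt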

Dependences

The number of memory dependences in the PDG computed using only the LLVM alias analyses, and the number computed by adding the other alias analyses included in NOELLE, can be found in results/current_machine/dependences/BENCHMARK_SUITE/absolute_values.txt and results/current_machine/dependences/BENCHMARK_SUITE/relative_values.txt. The first file contains the absolute numbers of dependences, and the second contains the fraction of dependences declared by LLVM and by NOELLE. The second file is the one used to create Figure 3 of the paper.

The file absolute_values.txt has the following structure: one row per benchmark and four columns per row. The first column is the name of the benchmark. The second column is the number of dependences computed with NOELLE. The third column is the number of dependences computed with LLVM alone. The fourth column is the total number of memory dependences computed assuming all memory instructions depend on each other.

The file relative_values.txt has the following structure: one row per benchmark and three columns per row. The first column is the name of the benchmark. The second column is the fraction for NOELLE. The third column is the fraction for LLVM.
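
If the fractions in relative_values.txt are computed with respect to the total reported in the fourth column of absolute_values.txt (an assumption; the exact definition is not spelled out here), they could be re-derived from the absolute numbers with something like:

awk '{ printf "%s %.4f %.4f\n", $1, $2 / $4, $3 / $4 }' results/current_machine/dependences/PARSEC3/absolute_values.txt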

Loops parallelized

This artifact reports the number of loops parallelized with each technique. This information can be found in the directory results/current_machine/parallelization. Here, each configuration has one text file reporting the number of loops parallelized with each technique.

Each file in the directory results/current_machine/parallelization has the same structure. The first column is the name of the benchmark. The second column is the number of loops parallelized using DOALL. The third column is the number of loops parallelized using HELIX. The fourth column is the number of loops parallelized using DSWP.
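
For instance, the total number of parallelized loops per benchmark could be computed as follows (CONFIGURATION.txt stands for one of the per-configuration files, whose exact names are not listed here; columns are assumed to be whitespace-separated):

awk '{ print $1, $2 + $3 + $4 }' results/current_machine/parallelization/CONFIGURATION.txt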

Induction variables

The induction variables of all loops of all benchmarks of a benchmark suite can be found in results/current_machine/loops/BENCHMARK_SUITE/induction_variables.txt. For example, the induction variables of PARSEC benchmarks can be found in the file results/current_machine/loops/PARSEC3/induction_variables.txt.

Each benchmark has one line in the related induction_variables.txt. This file is organized in three columns. The first column is the name of the benchmark. The second column is the number of induction variables accumulated over all loops of a given benchmark that are detected by LLVM. The third column is the number of induction variables accumulated over all loops of a given benchmark that are detected by NOELLE. The IR that is analyzed to generate this information is the result of all NOELLE transformations that run before a parallelization scheme (this code is the IR file baseline_with_metadata.bc of a given benchmark).

About

License: MIT License

