cdl-saarland / uniana-artifact

POPL'21 Research Artifact


Artifacts for "An Abstract Interpretation for SPMD Divergence on Reducible Control Flow Graphs"

Our paper is accompanied by two artifacts:

  1. A Coq mechanization of most of the paper's definitions and theorems.
  2. An implementation of our analysis within the LLVM compiler and a quantitative evaluation of its precision and compile-time impact.

Mechanized Proofs

The mechanized proofs have their own GitHub repository. This repository includes a README explaining how to compile the Coq proofs. The CoqDoc of the development is generated by the included Makefile, but can also be investigated here.

Quantitative Evaluation

List of Claims

  1. The per-kernel results shown in Figure 6 and Figure 7 in Appendix A are reproducible.
  2. We compare against an implementation of Strong-Post Dominance as described by Wasserrab et al. (setting spd).
  3. We compare against the Divergence Analysis algorithm of LLVM (setting llvm).

We use Docker to provide a reproducible build and evaluation environment. The following assumes that you are running Docker on an x86_64 host machine. We provide a Docker image with additional scripts to build all binaries and run the evaluation. There are three main steps once you have the Docker image running:

  1. Build LLVM, the OpenCL driver, the AnyDSL compiler stack and the OpenCL benchmarks. The OpenCL driver (pocl) and Thorin (AnyDSL) are modified to dump the compute kernel LLVM IR whenever an OpenCL application is run (or an AnyDSL application gets compiled).
  2. Use the compilers and OpenCL benchmarks from the first step to extract the compute kernels for later analysis.
  3. Run the divergence analysis tool (dacomp). The last script invocation will end with a rendering of the evaluation tables (Figure 6 and Figure 7).

Claim 1: Reproducibility of Figures 6 and 7

The following steps build all compiler components and benchmarks from source. The final script ./evaluate.sh will perform the evaluation on your system and render result tables akin to Figure 6 and Figure 7 in the paper. Note that the runtime measurements reported in the paper were taken on an Intel(R) Core(TM) i7-8565U CPU with 24GB of RAM running ArchLinux. We disabled hyper-threading and turbo boost and made sure that no other processes were interfering with the measurement process. You will not be able to reproduce the runtime measurements when running inside a VM, but you can still reproduce the instruction, branch, and loop statistics.

Build the Docker image

Download the workspace with the Dockerfile from Google Drive. Extract the archive and go to the extracted folder popl_release.

You need to supply two datasets to continue: the Hetero-Mark data folder and the LuxMark scene folder. Go to benchmarks/Hetero-Mark and follow the instructions in download_data.sh. Then, download the LuxMark Scene Archive and extract it into benchmarks/LuxMark/.

You are now ready to build the Docker image. Run:

bash ./docker_build.sh

This process will build the Docker image and install some of the necessary Ubuntu packages. It should take about 15min, depending on your system and internet connection. You only need to do this once (unless the workspace folder changes on the host).

Launch the Docker image

The Docker container needs to be attached to your X server (required for LuxMark). The following command will also open a bash shell inside the Docker container. Keep it open for the next steps.

bash ./docker_run.sh

[Optional] Copy your ISO file of SPEC ACCEL 1.3

Due to licensing issues, we cannot provide SPEC ACCEL 1.3 with the Docker image. However, there is a script to plug in your own copy of the benchmark suite. If you have the SPEC ACCEL 1.3 ISO file, run the following commands outside the Docker container to load and install it inside the running Docker container.

Mount the ISO file (accel-1.3.iso) on your host system as a loop device.

mkdir -p ~/iso
sudo mount accel-1.3.iso ~/iso -o loop

Query the ID of the running Docker container (from a shell that is running outside the container):

docker ps

Then, copy the contents of the ISO into the container (at ~/iso).

docker cp ~/iso <container_id>:/home/theuser/iso/

Now, in the running container shell (still at ~/workspace), run the following command to integrate SPEC ACCEL into the evaluation system:

bash install_spec.sh

Build Compilers and OpenCL host applications

Inside the Docker shell, run (takes 1-2 hours):

bash ./build.sh

Extract the Kernels

This command runs all OpenCL benchmarks and compiles all RV SPMD kernels (takes ~15min). The kernels are extracted to ~/workspace/extracted_kernels/.

bash ./extract_kernels.sh

Extract the LuxMark Kernels

We provide the extracted LLVM IR for the LuxMark kernels in our image. To copy them to the dump folder, call:

bash ./copy_luxmark_kernels.sh

(We provide instructions below on how to extract the LuxMark OpenCL IR kernels yourself.)

Transfer the dumped kernels to the analysis folder

This will copy the extracted SPMD kernels (from ~/workspace/extracted_kernels) into the input folder (~/workspace/dacomp/kernels) of dacomp, the Divergence Analysis tool.

bash ./transfer_kernels.sh

Evaluate the Divergence Analysis Configurations

Finally, run the evaluation script to perform the actual evaluation. This will evaluate all divergence analysis configurations on the transferred kernel files in ~/workspace/dacomp/kernels (takes about 5 minutes with NUM_SAMPLES=3, the out-of-the-box setting):

bash ./evaluate.sh

When the script has finished, it will print a result table similar to Figures 6 and 7 in the paper. The table is also stored in ~/workspace/result_tables.txt. The variable NUM_SAMPLES configures the number of sample runs in this script. Out of the box, NUM_SAMPLES is set to three, which is too low for proper runtime measurements but allows you to quickly check that your setup works correctly. To get more robust runtime measurements, turn off hyper-threading and turbo boost and change the value of NUM_SAMPLES to at least 30 (expected script runtime ~30 minutes).
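
If evaluate.sh picks NUM_SAMPLES up from the process environment (an assumption; if your copy does not, edit the NUM_SAMPLES assignment inside the script instead), a higher-sample run could look like this:

# Assumption: evaluate.sh honors an environment override for NUM_SAMPLES.
# If it does not, change the NUM_SAMPLES assignment inside evaluate.sh instead.
NUM_SAMPLES=30 bash ./evaluate.sh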

[Optional] LuxMark kernel extraction

We provide pre-extracted OpenCL kernels (as LLVM IR) for LuxMark since its build prerequisites and setup blow up the image substantially. In case you want to extract the kernels yourself, first make sure that you can run Docker OpenGL applications with your setup (the steps depend on your host system and GPU). LuxMark is a GUI application and needs your input to launch the OpenCL driver.

First, build LuxMark with the command:

bash ./build_luxmark.sh

When you are prompted for the root password, enter theuser. LuxMark requires a lot more packages to build and run than the rest of the setup. To run the LuxMark binary, launch the following script, which will already set it up with our OpenCL driver for kernel extraction:

bash ./extract_luxmark_kernels.sh

Then, in the GUI, configure LuxMark to use OpenCL on the CPU and start it. After a while, all kernel modules will have been exported. You can end LuxMark as soon as it starts rendering.

Claims 2 & 3: The Implementation

The spd, llvm, and new configurations (see Section 7.1, Baselines, in the paper) correspond to different Control-Divergence Analyses plugged into the Divergence Analysis of RV. You can find the source code of RV's Divergence Analysis at llvm-project/llvm/tools/rv/src/analysis/VectorizationAnalysis.cpp. The new Control-Divergence Analysis is implemented in llvm-project/llvm/lib/Analysis/SyncDependenceAnalysis.cpp. We use environment variables to select which of the three configurations RV will use: when the environment defines NEW_DA=1 and defines neither OLD_DA nor SPD, RV (and thus dacomp) runs with the new algorithm presented in the paper. The tool dacomp in dacomp/dacomp.cpp is the command-line frontend that computes the evaluation results.
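
For example, to pin RV and dacomp to the new configuration in an interactive shell (a sketch, assuming the variables are read from the process environment as described above):

# Select the new Control-Divergence Analysis (the paper's algorithm).
# NEW_DA must be defined; OLD_DA and SPD must not be set at all.
unset OLD_DA SPD
export NEW_DA=1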

Claim 2: SPD implementation

We implemented the Strong-Post Dominance technique by Wasserrab et al. in llvm-project/llvm/tools/rv/src/analysis/SPDA.cpp. The implementation modifies the CFG and uses the Dominance Frontier Graph implementation at llvm-project/llvm/tools/rv/src/analysis/DFG.cpp to compute the dependence relation. Set the environment variables OLD_DA=1 and SPD=1 to run RV and dacomp with this configuration (NEW_DA must not be set in the environment!).
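
A sketch of the corresponding environment, under the same assumption as above:

# Select the Strong-Post Dominance (spd) configuration.
# OLD_DA and SPD must both be defined; NEW_DA must not be set.
unset NEW_DA
export OLD_DA=1 SPD=1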

Claim 3: LLVM Divergence Analysis implementation

The old Divergence Analysis of LLVM is available at llvm-project/llvm/lib/Analysis/LegacyDivergenceAnalysis.cpp. We ported that analysis into RV to use it as a pluggable Control-Divergence Analysis for RV's Divergence Analysis. That port is at llvm-project/llvm/tools/rv/src/analysis/LegacySDA.cpp (e.g., a diff of those two files shows the few changes that were made: outlining and indentation). Set the environment variable OLD_DA=1 to run RV and dacomp with this configuration (neither SPD nor NEW_DA must be set to any value!).
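
Again assuming the variables are read from the process environment, a sketch:

# Select the legacy LLVM Divergence Analysis (llvm) configuration.
# Only OLD_DA may be defined; SPD and NEW_DA must not be set.
unset NEW_DA SPD
export OLD_DA=1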
