res876ttb / sparkler_sc

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Sparkler

Overview

The Sparkler miniapp computes a specialized dense matrix-matrix product C = A^T A for small integer elements of the matrix A. This operation mimics the matrix product operation used to compute the Custom Corellation Coefficient (CCC) in the CoMet computational genomics code.

Building

The build requires MPI and make. The default build requires CUDA 9.2 or higher for NVIDIA GPUs. An alternative build path for CPU-only execution requires an installed BLAS library, preferably multithreaded if the runs use more than one core per MPI rank.

To build for a cluster, modify the Makefile to reflect your MPI and CUDA installs and then type "make" (GPU case) or "env USE_GPU=NO make" (CPU-only case).

Running

Running the GPU executable requires one or more NVIDIA GPUs. Volta V100 or later (compute capability 7.0 or higher) GPUs are preferred; older GPUs will run much slower due to lack of tensor core hardware.

A run is composed of a series of iterations, each representing a global dense matrix-matrix product. A single iteration is composed of steps, each corresponding to a single GEMM executed on each GPU.

Command-line options:

    --num_vector - number of vectors (half the number of columns of matrix A)

    --num_field - number of fields (the number of rows of A)

    --num_iterations - number of (global) matrix products done

Example:

mpirun -n 2 ./exec.cpu --num_vector 1000 --num_field 2000 --num_iterations 2

Reported values are:

TF - teraflops, total number of GEMM floating point operations

GEMM sec - total time spent in GPU GEMM operations

GEMM TF/sec - GEMM teraflop rate, ratio of TF to GEMM sec

total sec - total runtime

hash - a hash of the results computed, for evaluating correctness

Competition Test Cases:

There are 9 test cases, and total 30 scores.

Score num_vector num_field Iteration
2 6000 1250 512
2 2000 7500 768
3 3200 5500 1024
3 2000 57600 200
2 1000 192000 300
1 1000 32000 600
1 800 50000 100
6 1000 12800 2560
10 8000 21600 100

Sample output:

summit-batch4$ mpirun -np 4 ./exec.cpu --num_vector 4000 --num_field 90000 --num_iterations 400
num_vector 4000 num_field 90000 num_iterations 400 num_proc 1
Iteration 1 of 400, step 1 of 1, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 2 of 400, step 1 of 1, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 4 of 400, step 1 of 1, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 8 of 400, step 1 of 1, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 16 of 400, step 1 of 1, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 32 of 400, step 1 of 1, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 64 of 400, step 1 of 1, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 128 of 400, step 1 of 1, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 256 of 400, step 1 of 1, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 400 of 400, step 1 of 1, elapsed sec XXXXXX: setup... GEMM... check...
TF 4608.000 GEMM sec XXXXXX GEMM TF/sec XXXXXX total sec XXXXXX hash 435999930709XXXXXX

About


Languages

Language:Cuda 100.0%