NedaTavakoli / psac

Parallel Suffix Array, LCP Array, and Suffix Tree Construction

Parallel Suffix Array and Tree Construction

This library implements a distributed-memory parallel algorithm for the construction of suffix arrays, LCP arrays, and suffix trees. The algorithm is implemented in C++11 and MPI.

The algorithm implemented by this codebase is described in the following peer-reviewed publication. Please cite this paper when using our code for academic purposes:

Flick, Patrick, and Srinivas Aluru. "Parallel distributed memory construction of suffix and longest common prefix arrays." Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, ACM, 2015. dx.doi.org/10.1145/2807591.2807609
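For readers unfamiliar with the data structures named above, the following is a minimal sequential sketch of what a suffix array and an LCP array contain. It is not the distributed algorithm from the paper and does not use this library's API: the function names `naive_suffix_array` and `kasai_lcp` are hypothetical, the suffix array is built by plain comparison sorting, and the LCP array uses Kasai's algorithm. The convention that `lcp[0]` is 0 is an assumption made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical helper (not part of psac): builds the suffix array by sorting
// the suffix start positions lexicographically. Simple but slow; for
// illustration only.
std::vector<std::size_t> naive_suffix_array(const std::string& text) {
    std::vector<std::size_t> sa(text.size());
    std::iota(sa.begin(), sa.end(), 0);  // 0, 1, ..., n-1
    std::sort(sa.begin(), sa.end(), [&text](std::size_t a, std::size_t b) {
        // Compare the suffix starting at a with the suffix starting at b.
        return text.compare(a, std::string::npos, text, b, std::string::npos) < 0;
    });
    return sa;
}

// Hypothetical helper (not part of psac): Kasai's linear-time LCP construction.
// lcp[i] is the length of the longest common prefix of the suffixes at
// sa[i-1] and sa[i]; lcp[0] is set to 0 by assumption.
std::vector<std::size_t> kasai_lcp(const std::string& text,
                                   const std::vector<std::size_t>& sa) {
    assert(sa.size() == text.size());
    const std::size_t n = text.size();
    std::vector<std::size_t> rank(n), lcp(n, 0);
    for (std::size_t i = 0; i < n; ++i) rank[sa[i]] = i;  // inverse of sa
    std::size_t h = 0;  // current match length, reused across iterations
    for (std::size_t i = 0; i < n; ++i) {
        if (rank[i] == 0) { h = 0; continue; }  // smallest suffix has no predecessor
        const std::size_t j = sa[rank[i] - 1];  // suffix preceding i in sorted order
        while (i + h < n && j + h < n && text[i + h] == text[j + h]) ++h;
        lcp[rank[i]] = h;
        if (h > 0) --h;  // the next suffix shares at least h-1 characters
    }
    return lcp;
}
```

For the text `banana`, the sorted suffixes are `a`, `ana`, `anana`, `banana`, `na`, `nana`, so the suffix array is `5 3 1 0 4 2` and the LCP array under the convention above is `0 1 3 0 0 2`.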

Code organization

  • include/ contains the implementation of our algorithms in form of C++ template header files (a header-only library).
  • src/ contains the sources for binaries, which make use of the implementations in include/.
  • test/ contains unit tests for the components of the library.
  • ext/ contains external, third-party dependencies/libraries. See the README for details on the third-party libraries used.

Dependencies

  • cmake version >= 2.6
  • C++11 compatible compiler (tested with gcc and clang)
  • an MPI implementation supporting MPI-2 or MPI-3.
  • external (third-party) dependencies are included in the ext/ directory

Compiling

To compile the executables and tests with cmake, run the following:

mkdir build
cd build
cmake -D CMAKE_BUILD_TYPE=Release ../
make

Running

After compiling, multiple binaries will be available in the build/bin folder. Running any of them with --help prints more detailed usage information. Here is a short overview of the binaries and what they do:

  • psac is our main executable. This will construct the suffix and LCP array of a given input file. Run with mpirun for parallel execution.
  • benchmark_sac benchmarks several of our methods. Run with mpirun.
  • dss is a wrapper around libdivsufsort that follows the same command line usage as our other binaries. This is a sequential program, so mpirun is not needed.
  • psac-vs-dss runs both our suffix array construction and libdivsufsort, verifies the results against each other, and outputs the run-times of both.
  • test_* are test executables that exercise a variety of our internal methods.
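To illustrate what verifying a suffix array involves, here is a minimal sequential sketch of a correctness check. It is not how psac-vs-dss is implemented (that tool compares two results against each other); the function `is_valid_suffix_array` is a hypothetical helper that checks a candidate directly against the input text by confirming it is a permutation of all positions and that adjacent suffixes are in strictly increasing order.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of psac): returns true if sa is the suffix
// array of text, i.e. a permutation of 0..n-1 whose suffixes are sorted.
bool is_valid_suffix_array(const std::string& text,
                           const std::vector<std::size_t>& sa) {
    const std::size_t n = text.size();
    if (sa.size() != n) return false;

    // Every position must appear exactly once.
    std::vector<bool> seen(n, false);
    for (std::size_t pos : sa) {
        if (pos >= n || seen[pos]) return false;
        seen[pos] = true;
    }

    // Adjacent suffixes must be strictly increasing. Two distinct suffixes
    // of the same text are never equal because their lengths differ.
    for (std::size_t i = 1; i < n; ++i) {
        if (text.compare(sa[i - 1], std::string::npos,
                         text, sa[i], std::string::npos) >= 0) {
            return false;
        }
    }
    return true;
}

// Example use: the suffix array of "banana" passes, the identity
// permutation does not.
inline void example_check() {
    assert(is_valid_suffix_array("banana", {5, 3, 1, 0, 4, 2}));
    assert(!is_valid_suffix_array("banana", {0, 1, 2, 3, 4, 5}));
}
```

This direct check compares full suffixes and is therefore quadratic in the worst case, which is acceptable for small test inputs but not for large ones.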

Licensing

Our code is licensed under the Apache License 2.0 (see LICENSE). This license does not apply to the ext folder, which contains external dependencies that are under their own licensing terms.
