bamartos / gpu_graph_algorithms

Term Project, Parallel Algorithms, Summer 2020, The University of Texas, Austin

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Accelerating Bellman-Ford Single Source Shortest Path Algorithm on GPU using CUDA

Term Project, Parallel Algorithms, Dr.Vijay Garg, Summer 2020, The University of Texas, Austin

https://towardsdatascience.com/bellman-ford-single-source-shortest-path-algorithm-on-gpu-using-cuda-a358da20144b

Compiling the project

  1. Connect to maverick2.tacc.utexas.edu
  2. Upload the files
  3. mkdir build/
  4. cd build
  5. cmake ../.
  6. make - Builds executable bellman
  7. $ ./bellman seq ../input/simple.gr - Runs sequential version
  8. $./bellman cuda ../input/USA-road-d.NY.gr 196 1024 0 - Runs cuda implementation version 1
  9. $./bellman cuda-stride ../input/USA-road-d.NY.gr 196 1024 0 - Runs cuda implementation version 2
  10. $./bellman cuda-v3 ../input/USA-road-d.NY.gr 196 1024 0 - Runs cuda implementation version 3

Implementation Details

  • Folder sequential contains base sequential implementation of BellmanFord.
  • Folder gpu contains cuda implementation of BellmanFord
  • Folder utilities contains common utility functions used by both the implementations
  • Folder parsercontains the parser utility to covert DIMACS graph files (http://users.diag.uniroma1.it/challenge9/format.shtml#ss) into CSR format. Run parser using $ ./parser ./input/USA-road-d.NY.gr command
  • Folder input contains graph files in CSR format
  • main.cpp - Runs both sequential and cuda versions
Sample Graph:
CSR Representation of the Graph:
  • V : array of vertices of size |V|
  • I : array of starting index of the adjacency list of edges in E array. Size |V+1|. The last element stores |E|
  • E : array of edges |E|
  • W : array of weights |W|
Shortest Path : 1 -> 4 -> 3 -> 2 -> 5
  • from 1 to 1 = 0, predecessor = 0
  • from 1 to 4 = 7, predecessor = 1
  • from 1 to 3 = 4, predecessor = 4
  • from 1 to 2 = 2, predecessor = 3
  • from 1 to 5 = -2, predecessor =

Input Data

  • input folder contains random and USA road networks graphs from DIMACS in CSR format
  • Each array in the CSR format is stored in separate CSV files which are read by the CUDA program
  • parser/parser.go converts the DIMACS files into CSR format and stores in individual csv files
  • Example: USA-road-d.NY.gr file from http://users.diag.uniroma1.it/challenge9/download.shtml has been transformed into the below ones
    • USA-road-d.NY.gr_V.csv - Contains V array (as depicted in figure above)
    • USA-road-d.NY.gr_I.csv - Contains I array
    • USA-road-d.NY.gr_E.csv - Contains E array
    • USA-road-d.NY.gr_W.csv - Contains W array
    • USA-road-d.NY.gr_FROM.csv & USA-road-d.NY.gr_TO.csv- Contains all edges of the graph where source is in FROM and destination vertex is in TO (Will be useful for version 2 stated below)

Bellman Ford GPU Implementation

Implement and study the performance in three different flavors of the algorithm

  • Version 1 - One thread to each vertex (to relax all outgoing edges of each vertex) - Number of Blocks is determined based on input nodes. More threads & each thread doing less work
  • Version 2 - Introduce stride inside kernel. Fixed number of blocks. Less threads and each thread doing more work
  • Version 3 - Do relaxation for each V[i] only if Flag[i] is set to true. i.e. if V[i] has shorter distance than the previous iteration.

Results

Performance Analysis
Graph_results

References

About

Term Project, Parallel Algorithms, Summer 2020, The University of Texas, Austin


Languages

Language:Cuda 64.9%Language:C++ 15.5%Language:Go 10.9%Language:Shell 7.9%Language:CMake 0.8%