timdecode / gpu-prefix-sum

CUDA implementation of exclusive prefix sum via Blelloch's algorithm

GPU Prefix Sum

  • Uses Blelloch's Algorithm (exclusive scan)
  • Not limited to 2048 items (a restriction of the initial implementation, imposed by the maximum number of threads that can run in a thread block on current GPUs)
  • Not limited to input sizes that are powers of 2 (a former restriction due to the inherent binary-tree approach of the algorithm)
  • Free of shared memory bank conflicts using the index padding method in this paper.
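
The features above can be illustrated with a sequential CPU sketch in C++. This is not the repository's CUDA kernel; it is a minimal reference under stated assumptions. It shows the up-sweep and down-sweep phases of Blelloch's exclusive scan, zero-padding to the next power of two, a block-sum pass for inputs larger than one block, and a padded index of the kind commonly used to avoid bank conflicts. The names `padded_index`, `scan_block`, `exclusive_scan`, `kLogNumBanks`, and `kItemsPerBlock` are hypothetical, and the values 32 banks and 2048 items per block (1024 threads, two items each) are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kLogNumBanks = 5;       // assumption: 32 shared-memory banks
constexpr std::size_t kItemsPerBlock = 2048;  // assumption: 1024 threads x 2 items each

// Skip one slot after every 32 elements. On a GPU this spreads strided
// accesses across banks; on the CPU it only changes the buffer layout.
inline std::size_t padded_index(std::size_t i) { return i + (i >> kLogNumBanks); }

// Smallest power of two that is >= n (for n >= 1).
inline std::size_t next_pow2(std::size_t n) {
    std::size_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

// Exclusive Blelloch scan of data[begin, begin + count), in place.
// Returns the sum of the chunk so the caller can build block offsets.
unsigned scan_block(std::vector<unsigned>& data, std::size_t begin, std::size_t count) {
    assert(count > 0 && count <= kItemsPerBlock);
    const std::size_t m = next_pow2(count);  // zero-pad to a power of two
    std::vector<unsigned> buf(padded_index(m - 1) + 1, 0u);
    for (std::size_t i = 0; i < count; ++i) buf[padded_index(i)] = data[begin + i];

    // Up-sweep (reduce): build partial sums up the implicit binary tree.
    for (std::size_t stride = 1; stride < m; stride <<= 1)
        for (std::size_t k = 2 * stride - 1; k < m; k += 2 * stride)
            buf[padded_index(k)] += buf[padded_index(k - stride)];

    const unsigned total = buf[padded_index(m - 1)];
    buf[padded_index(m - 1)] = 0;  // clear the root before the down-sweep

    // Down-sweep: push prefixes back down the tree.
    for (std::size_t stride = m / 2; stride >= 1; stride >>= 1)
        for (std::size_t k = 2 * stride - 1; k < m; k += 2 * stride) {
            const unsigned left = buf[padded_index(k - stride)];
            buf[padded_index(k - stride)] = buf[padded_index(k)];
            buf[padded_index(k)] += left;
        }

    for (std::size_t i = 0; i < count; ++i) data[begin + i] = buf[padded_index(i)];
    return total;
}

// Exclusive scan of arbitrary length: scan each block, scan the block
// sums recursively, then add each block's offset to its elements.
std::vector<unsigned> exclusive_scan(std::vector<unsigned> data) {
    const std::size_t n = data.size();
    if (n == 0) return data;
    const std::size_t blocks = (n + kItemsPerBlock - 1) / kItemsPerBlock;
    std::vector<unsigned> block_sums(blocks);
    for (std::size_t b = 0; b < blocks; ++b) {
        const std::size_t begin = b * kItemsPerBlock;
        const std::size_t count = std::min(kItemsPerBlock, n - begin);
        block_sums[b] = scan_block(data, begin, count);
    }
    if (blocks > 1) {
        const std::vector<unsigned> offsets = exclusive_scan(block_sums);
        for (std::size_t i = 0; i < n; ++i) data[i] += offsets[i / kItemsPerBlock];
    }
    return data;
}
```

For example, calling `exclusive_scan` on `{3, 1, 7, 0, 4, 1, 6, 3}` yields `{0, 3, 4, 11, 11, 15, 16, 22}`. In a GPU version, the two inner loops would run in parallel across the threads of a block, with a synchronization barrier between tree levels.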

Languages

Cuda 77.4%, C++ 19.4%, C 1.9%, Makefile 1.3%