bryancatanzaro / inplace

CUDA and OpenMP implementations of C2R/R2C inplace transposition

Inplace

CUDA and OpenMP implementations of the C2R and R2C inplace transposition algorithms. These algorithms are described in our PPoPP paper.

We have included a specialization for very tall, skinny matrices that yields good performance for in-place conversions between Arrays of Structures and Structures of Arrays.
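The link between transposition and these layout conversions can be illustrated with a short sketch. An Array of Structures with `count` records of `fields` values each is a tall, skinny `count` x `fields` row-major matrix, and its transpose is the corresponding Structure of Arrays. The helper `aos_to_soa` and its parameter names are hypothetical and not part of this library; the sketch copies out of place purely to show the index mapping, whereas the library performs the conversion in place.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of the inplace library.
// Input (AoS):  x0 y0 z0 x1 y1 z1 ...  viewed as a count x fields row-major matrix.
// Output (SoA): x0 x1 ... y0 y1 ... z0 z1 ...  which is its fields x count transpose.
std::vector<float> aos_to_soa(const std::vector<float>& aos,
                              std::size_t count, std::size_t fields) {
    assert(aos.size() == count * fields);
    std::vector<float> soa(aos.size());
    for (std::size_t i = 0; i < count; ++i) {        // record index (matrix row)
        for (std::size_t f = 0; f < fields; ++f) {   // field index (matrix column)
            soa[f * count + i] = aos[i * fields + f];
        }
    }
    return soa;
}
```

For two records with three fields each, `{1, 2, 3, 4, 5, 6}` becomes `{1, 4, 2, 5, 3, 6}`.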

The OpenMP implementation is declared in <inplace/openmp.h> and the CUDA implementation in <inplace/transpose.h>. The CUDA implementation carries the following signatures:

namespace inplace {

void transpose(bool row_major, float* data, int m, int n);
void transpose(bool row_major, double* data, int m, int n);

}
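To check results from a call with these parameters, a small CPU reference can be handy. The sketch below is a minimal one under stated assumptions: `m` is taken to be the number of rows and `n` the number of columns of the input, and `row_major` is taken to select the storage order; none of this is confirmed by the text above. The name `reference_transpose` is hypothetical. It uses plain cycle following with a visited bitmap, not the C2R/R2C algorithms, so it needs extra memory proportional to the matrix size.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical CPU reference, not part of the inplace library.
// Assumption: data holds an m x n matrix (m rows, n columns) in the given order.
void reference_transpose(bool row_major, float* data, int m, int n) {
    assert(m >= 0 && n >= 0);
    // A column-major m x n matrix has the same memory layout as a
    // row-major n x m matrix, so only the row-major case is handled.
    if (!row_major) std::swap(m, n);
    const std::size_t rows = static_cast<std::size_t>(m);
    const std::size_t size = rows * static_cast<std::size_t>(n);
    if (size < 2) return;

    // Element at index k = i*n + j moves to j*m + i = (k * m) mod (size - 1).
    // The first and last elements never move.
    std::vector<bool> visited(size, false);
    for (std::size_t start = 1; start + 1 < size; ++start) {
        if (visited[start]) continue;
        float carried = data[start];
        std::size_t k = start;
        do {
            const std::size_t dest = (k * rows) % (size - 1);
            std::swap(carried, data[dest]);  // drop the carried value, pick up the displaced one
            visited[dest] = true;
            k = dest;
        } while (k != start);
    }
}
```

Under these assumptions, a 2 x 3 row-major input `{1, 2, 3, 4, 5, 6}` becomes `{1, 4, 2, 5, 3, 6}`, the 3 x 2 row-major transpose.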

About

License: GNU General Public License v2.0


Languages

Cuda 36.7%, Python 35.7%, C++ 27.3%, Makefile 0.3%