IIR Filter implementation
Implementation of IIR Filter with thrust. The
parallelization enables fully coalesced memory access by transposing the
data during the addition of the causal and anti-causal pass with
cuBLAS. Furthermore, the
implementation can also be used without CUDA
enabled devices using the
OpenMP backend of
thrust. A comparison
between different parallelization approaches can be found in the
presentation slides (and the different approaches' implementations can be
found in the commit history).
Usage
The repository contains two examples, one test and two programs for timing
purposes. Calling make
builds all of them with the CUDA
backend and
make omp
with the OpenMP
backend (make sure to call make clean
if you
want to __re__build).
Executables
-
example_thrust_deriche
: Performs Gaussian image blurring using the thrust implementation. -
example_seq_deriche
: Same as above with sequential reference implementation. -
test_thrust_deriche
: Compares the results of Gaussian image blurring between the thrust implementation and the sequential implementation and gives a warning if the error on any pixel is larger than 1e-4. -
time_thrust_deriche
: Performs IIR filter on square data with specified number of rows and prints four columns:- Number of rows
- Time for setup (memory allocation and host to device transfer)
- Time for horizontal pass
- Time for vertical pass
- Time for finalizing (device to host transfer)
Notice that the final version does not use a horizontal pass and columns 3 and 4 should simply be added together and treated as the compute time.
-
time_seq_deriche
: Same as above with sequential implementation.
Other Files
coefficients.h
: Datastructure and computation of Deriche coefficients used to approximate Gaussian blurring with IIR filter.seq_deriche.h
: Sequential implementation.thrust_deriche.h
: Thrust implementation.timer.h
: Timer that can be used with and without CUDA, makes sure to synchronize with CUDA and returns time in seconds.utils.h
: Utilities used for debugging and testing.collect_sequential.sh
: Script to collect timings of sequential implementation for different sizes of input data. Output can be easily read by pandas.collect_thrust.sh
: Same as above for thrust implementation.alternatives/
: More alternatives for parallelization to be explored.images/
: Some example images to be used with the examples.png++/
: PNG++ - a C++ interface tolibpng
to read and writepng
images.presentation/
: Presentation slides of this work.
Contact
- Judith Massa j.massa@mail.de
- Patrick Esser p.esser@stud.uni-heidelberg.de