A collection of scripts I wrote when learning CUDA C++ programming.
Performs 1D convolution. TODO: add support for stride, dilation, and arbitrary input sizes.
TODO
Basic version in here
Applies a 'stencil' operation (similar to convolution) to a 1D integer array.
Vector addition parallelized across both threads and blocks.
Vector addition parallelized across threads.
Vector addition parallelized across blocks.
Launches a CUDA kernel that does practically nothing.
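
For reference, here is a rough sketch of how the three vector addition variants listed above differ. It is not the code from the scripts themselves: the kernel names, the `Mode` enum, the `vector_add` helper, and the block size of 256 are all made up for illustration, and error checking on the CUDA calls is left out to keep it short.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Variant 1: one element per block. Launch as <<<n, 1>>>.
__global__ void add_blocks(const int* a, const int* b, int* c) {
    c[blockIdx.x] = a[blockIdx.x] + b[blockIdx.x];
}

// Variant 2: one element per thread, all in a single block. Launch as <<<1, n>>>.
// Only works while n fits in one block (at most 1024 threads on current GPUs).
__global__ void add_threads(const int* a, const int* b, int* c) {
    c[threadIdx.x] = a[threadIdx.x] + b[threadIdx.x];
}

// Variant 3: threads and blocks combined. The bounds check covers the case
// where n is not a multiple of the block size.
__global__ void add_threads_blocks(const int* a, const int* b, int* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}

// Hypothetical selector for the three variants (illustration only).
enum class Mode { Blocks, Threads, Both };

// Hypothetical host helper: copies the inputs to the GPU, runs the chosen
// kernel, and copies the result back. Assumes a.size() == b.size().
std::vector<int> vector_add(const std::vector<int>& a,
                            const std::vector<int>& b, Mode mode) {
    const int n = static_cast<int>(a.size());
    std::vector<int> c(n);
    if (n == 0) {
        return c;
    }
    const size_t bytes = static_cast<size_t>(n) * sizeof(int);

    int* d_a = nullptr;
    int* d_b = nullptr;
    int* d_c = nullptr;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice);

    const int block_size = 256;  // arbitrary choice for this sketch
    switch (mode) {
        case Mode::Blocks:
            add_blocks<<<n, 1>>>(d_a, d_b, d_c);
            break;
        case Mode::Threads:
            add_threads<<<1, n>>>(d_a, d_b, d_c);
            break;
        case Mode::Both:
            add_threads_blocks<<<(n + block_size - 1) / block_size, block_size>>>(
                d_a, d_b, d_c, n);
            break;
    }

    // This blocking copy on the default stream waits for the kernel to finish.
    cudaMemcpy(c.data(), d_c, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return c;
}
```

To try it, call `vector_add(a, b, Mode::Both)` from a `main` of your own and compile the file with `nvcc`. The combined variant is the only one of the three that scales to large arrays, since the single-block version is capped by the per-block thread limit and the one-thread-per-block version leaves most of each warp idle.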