by Mert SIDE, Ghazanfar ALI, and Yong CHEN
November 1st, 2022
In this lecture, we write a program to add the elements of two arrays.
We begin by examining the code in C++ running on the CPU.
Then, we write the CUDA version of that code to run on the GPU.
Taking full advantage of the GPU, however, requires some fine-tuning.
To guide that tuning, we profile successive versions of the code and compare how fast each one runs.
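
As a point of reference, here is a minimal sketch of what the CPU starting point might look like in C++. The function names `add` and `max_error` are our own choices for illustration and are not prescribed by the lecture; the sketch assumes single-precision arrays and adds the elements of `x` into `y` in place.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Add each element of x to the matching element of y, sequentially on the CPU.
// (Illustrative helper; the name and signature are assumptions.)
void add(std::size_t n, const float* x, float* y) {
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = x[i] + y[i];
    }
}

// Largest absolute deviation of any element of y from the expected value.
// Useful as a quick correctness check after the addition.
float max_error(const std::vector<float>& y, float expected) {
    float err = 0.0f;
    for (float v : y) {
        err = std::fmax(err, std::fabs(v - expected));
    }
    return err;
}
```

To try it, fill two `std::vector<float>` objects of equal length (say, `x` with 1.0f and `y` with 2.0f), call `add(x.size(), x.data(), y.data())`, and check that `max_error(y, 3.0f)` is zero. The single loop in `add` is the part that a GPU version would spread across many threads.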