zalo / matmul_cuda

A simple learning example for CUDA

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

matmul_cuda

A simple learning example for CUDA.

Take in a .bin file specifying a series of int32 rows, int32 cols, int64_t* matrixContents matrices and multiply them together as quickly as possible. Save the resultant matrix as output.bin in the same format.

On Windows, this application compiles with:

nvcc -o matmul_cuda_v2 matmul_cuda.cu -ccbin "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\bin\Hostx64\x64" -O 3 -arch=all --extra-device-vectorization

On Linux, this application compiles with:

nvcc -o matmul_cuda_v2 matmul_cuda.cu -O 3 -arch=all --extra-device-vectorization

About

A simple learning example for CUDA

License:MIT License


Languages

Language:Cuda 100.0%