a-mos / Cuda_Mnist_Denoising

MNIST denoising auto-encoder on CUDA from scratch

CUDA MNIST denoising

Low (28x28) and high (256x256) resolution examples:


Requirements

Tested with CUDA 11.6, NVCC 11.6, Ubuntu 20.04

Compile

cd project_folder
nvcc -O2 kernel.cu main.cpp -o denoiser

Run

./denoiser {input_img_path} [optional -benchmark N]
If N > 0, the forward pass is run N times and the total and average GPU time are reported.
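The repository's timing code is not reproduced in this README. One common way to produce figures like "Total GPU time" and "AVG one forward pass GPU time" is to bracket N kernel launches with CUDA events. The sketch below assumes that approach. `dummy_forward` is a hypothetical stand-in for the real forward pass, and `benchmark_ms` is an illustrative name, not a function from the project.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "CUDA error %s at %s:%d\n",            \
                         cudaGetErrorString(err_), __FILE__, __LINE__); \
            std::abort();                                               \
        }                                                               \
    } while (0)

// Hypothetical stand-in for one forward pass: scales a 28x28 image in place.
__global__ void dummy_forward(float* img, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) img[i] *= 0.5f;
}

// Launches the stand-in forward pass num_runs times and returns the
// total elapsed GPU time in milliseconds, measured with CUDA events.
float benchmark_ms(int num_runs) {
    assert(num_runs > 0);
    const int n = 28 * 28;
    float* d_img = nullptr;
    CUDA_CHECK(cudaMalloc(&d_img, n * sizeof(float)));
    CUDA_CHECK(cudaMemset(d_img, 0, n * sizeof(float)));

    cudaEvent_t start, stop;
    CUDA_CHECK(cudaEventCreate(&start));
    CUDA_CHECK(cudaEventCreate(&stop));

    CUDA_CHECK(cudaEventRecord(start));
    for (int r = 0; r < num_runs; ++r) {
        dummy_forward<<<(n + 255) / 256, 256>>>(d_img, n);
    }
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaEventRecord(stop));
    CUDA_CHECK(cudaEventSynchronize(stop));  // wait until all launches finish

    float total_ms = 0.0f;
    CUDA_CHECK(cudaEventElapsedTime(&total_ms, start, stop));

    CUDA_CHECK(cudaEventDestroy(start));
    CUDA_CHECK(cudaEventDestroy(stop));
    CUDA_CHECK(cudaFree(d_img));
    return total_ms;
}
```

Dividing the returned total by `num_runs` gives the per-pass average; for example, the baseline total of 6034.27 ms over 1000 runs corresponds to the reported 6.03427 ms average.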

Model training

Training in "Model training.ipynb" follows https://keras.io/examples/vision/autoencoder/, with an added step that exports the weights to a binary file.
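The exact layout of the exported binary file is not described here. A minimal sketch, assuming each weight tensor is written as raw float32 values in native byte order with no header (as NumPy's `ndarray.tofile` would produce for a float32 array), reads one tensor back on the host and copies it to the GPU. The helper names `read_floats` and `upload_to_device` are hypothetical.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "CUDA error %s at %s:%d\n",            \
                         cudaGetErrorString(err_), __FILE__, __LINE__); \
            std::abort();                                               \
        }                                                               \
    } while (0)

// Reads `count` raw float32 values from `path` (assumed layout: no header,
// native byte order). Returns an empty vector if the file cannot be opened
// or holds fewer than `count` values.
std::vector<float> read_floats(const char* path, size_t count) {
    std::vector<float> host(count);
    FILE* f = std::fopen(path, "rb");
    if (!f) return {};
    size_t got = std::fread(host.data(), sizeof(float), count, f);
    std::fclose(f);
    if (got != count) return {};
    return host;
}

// Copies host weights into a newly allocated device buffer.
// The caller owns the returned pointer and must release it with cudaFree.
float* upload_to_device(const std::vector<float>& host) {
    assert(!host.empty());
    float* d_ptr = nullptr;
    CUDA_CHECK(cudaMalloc(&d_ptr, host.size() * sizeof(float)));
    CUDA_CHECK(cudaMemcpy(d_ptr, host.data(), host.size() * sizeof(float),
                          cudaMemcpyHostToDevice));
    return d_ptr;
}
```

If the notebook writes a different dtype or concatenates several tensors into one file, the element size and the read offsets would need to match that layout.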

Optimizations

Baseline

Img shape: 28x28x1
Num runs: 1000
Total GPU time: 6034.27 ms
AVG one forward pass GPU time: 6.03427 ms

Merging activations with layers

Img shape: 28x28x1
Num runs: 1000
Total GPU time: 5998.17 ms
AVG one forward pass GPU time: 5.99817 ms
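Merging an activation into its layer means applying the activation to each output value inside the layer's kernel, instead of launching a separate activation kernel that reads the layer output back from global memory. The sketch below illustrates the idea for a single-channel 3x3 convolution with "same" zero padding followed by ReLU; the kernel name `conv3x3_relu`, the single-channel layout and the helper `run_conv3x3_relu` are illustrative assumptions, not the repository's kernels.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "CUDA error %s at %s:%d\n",            \
                         cudaGetErrorString(err_), __FILE__, __LINE__); \
            std::abort();                                               \
        }                                                               \
    } while (0)

// One thread per output pixel: 3x3 cross-correlation (no kernel flip),
// "same" zero padding, bias add and ReLU, all in a single kernel.
__global__ void conv3x3_relu(const float* in, const float* w, float bias,
                             float* out, int h, int width) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= h) return;
    float acc = bias;
    for (int ky = -1; ky <= 1; ++ky) {
        for (int kx = -1; kx <= 1; ++kx) {
            int iy = y + ky, ix = x + kx;
            if (iy >= 0 && iy < h && ix >= 0 && ix < width)
                acc += in[iy * width + ix] * w[(ky + 1) * 3 + (kx + 1)];
        }
    }
    out[y * width + x] = fmaxf(acc, 0.0f);  // fused ReLU, no second pass
}

// Hypothetical host helper: runs the fused kernel on an h x width image.
std::vector<float> run_conv3x3_relu(const std::vector<float>& img,
                                    const float (&w)[9], float bias,
                                    int h, int width) {
    assert(img.size() == static_cast<size_t>(h) * width);
    size_t bytes = img.size() * sizeof(float);
    float *d_in = nullptr, *d_w = nullptr, *d_out = nullptr;
    CUDA_CHECK(cudaMalloc(&d_in, bytes));
    CUDA_CHECK(cudaMalloc(&d_w, sizeof(w)));
    CUDA_CHECK(cudaMalloc(&d_out, bytes));
    CUDA_CHECK(cudaMemcpy(d_in, img.data(), bytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(d_w, w, sizeof(w), cudaMemcpyHostToDevice));
    dim3 block(16, 16);
    dim3 grid((width + 15) / 16, (h + 15) / 16);
    conv3x3_relu<<<grid, block>>>(d_in, d_w, bias, d_out, h, width);
    CUDA_CHECK(cudaGetLastError());
    std::vector<float> out(img.size());
    CUDA_CHECK(cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_in));
    CUDA_CHECK(cudaFree(d_w));
    CUDA_CHECK(cudaFree(d_out));
    return out;
}
```

In the numbers above, this step saves about 0.036 ms per forward pass (roughly 0.6% of the baseline), so the launch and memory traffic removed by fusing a cheap elementwise activation is small compared with the convolution itself.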

Using shared memory to cache the input tensor

Img shape: 28x28x1
Num runs: 1000
Total GPU time: 3673.7 ms
AVG one forward pass GPU time: 3.6737 ms
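Caching the input in shared memory lets each thread block load its tile of the input, plus a one-pixel halo for a 3x3 kernel, from global memory once; every thread then reads its neighbours from on-chip memory. Compared with the previous step, the average time per pass above drops from 5.99817 ms to 3.6737 ms, roughly a 1.63x speedup. The sketch below is a minimal tiled version of a fused 3x3 convolution + ReLU, assuming a single channel, 16x16 tiles and zero padding; `TILE`, `conv3x3_relu_shared` and `run_conv3x3_relu_shared` are illustrative names, not the repository's implementation.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "CUDA error %s at %s:%d\n",            \
                         cudaGetErrorString(err_), __FILE__, __LINE__); \
            std::abort();                                               \
        }                                                               \
    } while (0)

#define TILE 16  // assumed tile edge; launch with TILE x TILE threads

__global__ void conv3x3_relu_shared(const float* in, const float* w, float bias,
                                    float* out, int h, int width) {
    // Input tile plus a one-pixel halo on every side.
    __shared__ float tile[TILE + 2][TILE + 2];
    int x0 = blockIdx.x * TILE, y0 = blockIdx.y * TILE;
    // Cooperative load: the block's threads stride over the (TILE+2)^2 tile.
    for (int i = threadIdx.y * TILE + threadIdx.x; i < (TILE + 2) * (TILE + 2);
         i += TILE * TILE) {
        int ty = i / (TILE + 2), tx = i % (TILE + 2);
        int gy = y0 + ty - 1, gx = x0 + tx - 1;
        tile[ty][tx] = (gy >= 0 && gy < h && gx >= 0 && gx < width)
                           ? in[gy * width + gx] : 0.0f;  // zero padding
    }
    __syncthreads();  // the whole tile must be loaded before any thread reads it

    int x = x0 + threadIdx.x, y = y0 + threadIdx.y;
    if (x >= width || y >= h) return;
    float acc = bias;
    for (int ky = 0; ky < 3; ++ky)
        for (int kx = 0; kx < 3; ++kx)
            acc += tile[threadIdx.y + ky][threadIdx.x + kx] * w[ky * 3 + kx];
    out[y * width + x] = fmaxf(acc, 0.0f);  // fused ReLU
}

// Hypothetical host helper: runs the tiled kernel on an h x width image.
std::vector<float> run_conv3x3_relu_shared(const std::vector<float>& img,
                                           const float (&w)[9], float bias,
                                           int h, int width) {
    assert(img.size() == static_cast<size_t>(h) * width);
    size_t bytes = img.size() * sizeof(float);
    float *d_in = nullptr, *d_w = nullptr, *d_out = nullptr;
    CUDA_CHECK(cudaMalloc(&d_in, bytes));
    CUDA_CHECK(cudaMalloc(&d_w, sizeof(w)));
    CUDA_CHECK(cudaMalloc(&d_out, bytes));
    CUDA_CHECK(cudaMemcpy(d_in, img.data(), bytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(d_w, w, sizeof(w), cudaMemcpyHostToDevice));
    dim3 block(TILE, TILE);
    dim3 grid((width + TILE - 1) / TILE, (h + TILE - 1) / TILE);
    conv3x3_relu_shared<<<grid, block>>>(d_in, d_w, bias, d_out, h, width);
    CUDA_CHECK(cudaGetLastError());
    std::vector<float> out(img.size());
    CUDA_CHECK(cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_in));
    CUDA_CHECK(cudaFree(d_w));
    CUDA_CHECK(cudaFree(d_out));
    return out;
}
```

Without the tile, each input pixel is fetched from global memory by up to nine neighbouring threads; with it, each block reads its (TILE+2)^2 region once. Threads outside the image return only after `__syncthreads()`, so every thread in the block still reaches the barrier.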


License

MIT License

