xuqiantong / CUDA-Winograd

Fast CUDA Kernels for ResNet Inference.

Introduction

This code implements fast CUDA kernels for DNN inference, in particular for the convolution layers / residual blocks in ResNet. Each kernel fuses three operations into one:

  • Convolution
  • Batch Normalization (BN + Scale)
  • Activation (ReLU)
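Fusing BN (and scale) into the convolution is possible because, at inference time, the per-channel normalization can be folded into the filter weights and a bias. A minimal NumPy sketch of that folding (shapes and values are illustrative, not taken from the repo):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative shapes: C input channels, K output channels, 3x3 filters.
C, K, H, W = 4, 8, 6, 6
x = rng.standard_normal((C, H, W))
w = rng.standard_normal((K, C, 3, 3))

# Per-output-channel BN parameters (BN + Scale).
gamma = rng.standard_normal(K)
beta = rng.standard_normal(K)
mean = rng.standard_normal(K)
var = rng.random(K) + 0.5
eps = 1e-5

def conv3x3(x, w):
    """Naive valid 3x3 cross-correlation, summed over input channels."""
    C, H, W = x.shape
    out = np.zeros((w.shape[0], H - 2, W - 2))
    for k in range(w.shape[0]):
        for i in range(H - 2):
            for j in range(W - 2):
                out[k, i, j] = np.sum(x[:, i:i+3, j:j+3] * w[k])
    return out

# Unfused path: conv, then BN, then ReLU.
y = conv3x3(x, w)
y_bn = (gamma[:, None, None] * (y - mean[:, None, None])
        / np.sqrt(var[:, None, None] + eps) + beta[:, None, None])
ref = np.maximum(y_bn, 0.0)

# Fused path: fold BN into the weights and a bias, then one conv + ReLU.
scale = gamma / np.sqrt(var + eps)
w_folded = w * scale[:, None, None, None]
b_folded = beta - scale * mean
fused = np.maximum(conv3x3(x, w_folded) + b_folded[:, None, None], 0.0)

assert np.allclose(ref, fused)
```

The CUDA kernels in this repo go further than this algebraic folding, since they also avoid the extra global-memory round trips between the three stages.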

For implementation details, please refer to the technical report included in this repo. The Winograd algorithm is used for the 3 * 3 convolution kernels.
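The idea behind Winograd convolution is to trade multiplications for additions via fixed input/filter/output transforms. The repo applies it to 2D 3 * 3 filters; the 1D F(2,3) case below shows the principle with the standard transform matrices (this sketch is illustrative and not code from the repo):

```python
import numpy as np

# Standard F(2,3) Winograd transform matrices.
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)   # input transform
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])                # filter transform
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)    # output transform

d = np.array([1.0, 2.0, 3.0, 4.0])  # 4 input samples
g = np.array([0.5, -1.0, 2.0])      # 3-tap filter

# Two outputs with 4 elementwise multiplies instead of 6.
y = AT @ ((G @ g) * (BT @ d))

# Direct 1D correlation for comparison.
direct = np.array([sum(d[i + j] * g[j] for j in range(3)) for i in range(2)])
assert np.allclose(y, direct)
```

For 2D 3 * 3 filters the same transforms are applied along both axes, and the filter transform can be precomputed once per layer.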

Usage

mkdir data
python data_generator.py
make
./Test 0
  • Set parameters in data_generator.py
  • Run the 6 test cases by changing the number after ./Test from 0 to 5

Results

3 * 3 Kernels

| Kernels | Operations | 128 / 128 | 256 / 256 |
| --- | --- | --- | --- |
| cuDNN | GEMM + BN + ReLU | 214 us | 384 us |
| cuDNN | Winograd + BN + ReLU | 95 us | 155 us |
| Our Kernel | Winograd + BN + ReLU | 59 us | 117 us |

1 * 1 Kernels [BUGGY NUMBERS]

| Kernels | 512 / 128 | 128 / 512 | 1024 / 256 | 256 / 1024 |
| --- | --- | --- | --- | --- |
| Operations | GEMM + BN + ReLU | GEMM + BN | GEMM + BN + ReLU | GEMM + BN + ReLU |
| cuDNN | 119 us | 115 us | 219 us | 214 us |
| Our Kernel | 58 us | 55 us | 186 us | 181 us |
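As a quick sanity check, the speedups implied by the 3 * 3 table can be computed directly from the timings above:

```python
# Timings in microseconds, copied from the 3 * 3 results table.
timings = {
    "cudnn_gemm":     {"128/128": 214, "256/256": 384},
    "cudnn_winograd": {"128/128": 95,  "256/256": 155},
    "ours":           {"128/128": 59,  "256/256": 117},
}

for cfg in ("128/128", "256/256"):
    vs_wino = timings["cudnn_winograd"][cfg] / timings["ours"][cfg]
    vs_gemm = timings["cudnn_gemm"][cfg] / timings["ours"][cfg]
    print(f"{cfg}: {vs_wino:.2f}x vs cuDNN Winograd, {vs_gemm:.2f}x vs cuDNN GEMM")
```

So the fused kernel is roughly 1.3-1.6x faster than cuDNN's Winograd path and over 3x faster than cuDNN's GEMM path on these configurations.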

Languages

CUDA 83.7%, C 8.8%, Python 6.7%, Makefile 0.9%