MicroZHY / CUDATutorial

A CUDA tutorial for learning CUDA programming from scratch

test environment

Turing T4 GPU

remark

  • performance data for each example is recorded at the top of the corresponding code file.
  • performance varies across GPU platforms and NVCC compiler versions, so some counter-intuitive results are normal; treat them as something to investigate and debug.
  • comments and pull requests are welcome.
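
Because the numbers depend on the GPU and the compiler, it can help to re-measure a kernel on your own machine. The following is a minimal sketch of timing with CUDA events. It is not taken from this repository: the `saxpy` kernel, the `time_saxpy_ms` helper, the block size of 256, and the iteration count are all illustrative assumptions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical example kernel (not from this repo): y[i] = a * x[i] + y[i].
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Hypothetical helper: returns the average time per launch in milliseconds,
// or -1.0f if any CUDA call fails (for example, when no GPU is available).
float time_saxpy_ms(int n, int iters) {
    float *x = nullptr, *y = nullptr;
    if (cudaMalloc(&x, n * sizeof(float)) != cudaSuccess) return -1.0f;
    if (cudaMalloc(&y, n * sizeof(float)) != cudaSuccess) {
        cudaFree(x);
        return -1.0f;
    }
    cudaMemset(x, 0, n * sizeof(float));
    cudaMemset(y, 0, n * sizeof(float));

    const int block = 256;                      // assumed block size
    const int grid = (n + block - 1) / block;   // enough blocks to cover n

    // Warm-up launch so one-time setup cost is not included in the timing.
    saxpy<<<grid, block>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    for (int it = 0; it < iters; ++it) {
        saxpy<<<grid, block>>>(n, 2.0f, x, y);
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);                 // wait until all launches finish

    float ms = 0.0f;
    cudaError_t err = cudaEventElapsedTime(&ms, start, stop);
    if (err == cudaSuccess) err = cudaGetLastError();

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(x);
    cudaFree(y);
    return err == cudaSuccess ? ms / iters : -1.0f;
}

int main() {
    float ms = time_saxpy_ms(1 << 20, 100);
    if (ms < 0.0f) {
        printf("CUDA error or no device available\n");
        return 1;
    }
    printf("average kernel time: %f ms\n", ms);
    return 0;
}
```

Compile with `nvcc` and run it several times; averaging over many launches after a warm-up usually gives more stable numbers than timing a single launch.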

update notes

v2.0

  • add CUDA streams
  • add quantization

v2.1

  • add fp32/fp16 gemv
