Aaryan0404 / CUDA


About

Accelerating inference and training for transformer-based models by building CUDA kernels that optimally saturate the memory bandwidth and arithmetic throughput of NVIDIA Hopper H100 GPUs.
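
The kernels themselves are not shown on this page, but the "saturate memory bandwidth" goal can be illustrated with a minimal sketch: a grid-stride, float4-vectorized elementwise kernel of the kind used to check how close a memory-bound op gets to HBM bandwidth. The kernel name scaled_add_f4 and all launch parameters below are hypothetical assumptions, not code taken from this repository.

// Minimal, hypothetical sketch (not from this repo): a memory-bound
// elementwise kernel using wide 16-byte loads/stores and a grid-stride loop.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void scaled_add_f4(const float4* __restrict__ a,
                              const float4* __restrict__ b,
                              float4* __restrict__ out,
                              float alpha, size_t n4) {
    // Grid-stride loop: each thread issues vectorized loads/stores, keeping
    // many memory requests in flight, which is what saturates DRAM bandwidth.
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x;
         i < n4;
         i += (size_t)gridDim.x * blockDim.x) {
        float4 x = a[i];
        float4 y = b[i];
        out[i] = make_float4(alpha * x.x + y.x, alpha * x.y + y.y,
                             alpha * x.z + y.z, alpha * x.w + y.w);
    }
}

int main() {
    const size_t n  = 1ull << 26;          // 64M floats (divisible by 4)
    const size_t n4 = n / 4;

    float4 *a, *b, *out;
    cudaMalloc((void**)&a,   n4 * sizeof(float4));
    cudaMalloc((void**)&b,   n4 * sizeof(float4));
    cudaMalloc((void**)&out, n4 * sizeof(float4));

    // Launch enough blocks to occupy all SMs; exact numbers would be tuned.
    scaled_add_f4<<<1024, 256>>>(a, b, out, 2.0f, n4);
    cudaDeviceSynchronize();

    // Effective bandwidth ~= (2 reads + 1 write) * n * sizeof(float) / time;
    // cudaEvent timing is omitted to keep the sketch short.
    printf("kernel status: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaFree(a); cudaFree(b); cudaFree(out);
    return 0;
}

Elementwise ops like this sit on the bandwidth-bound side of the roofline; the arithmetic side (matmuls, attention) would instead lean on the H100's tensor cores.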


Languages

Cuda: 37.1%
C: 32.5%
C++: 24.1%
Makefile: 4.6%
Cool: 0.8%
Shell: 0.6%
Python: 0.3%
CMake: 0.1%