Aaryan0404 / CUDA

Accelerating inference and training for transformer-based models by building CUDA kernels that saturate the memory bandwidth and arithmetic throughput of Hopper H100 GPUs.

Languages

Cuda 37.1%, C 32.5%, C++ 24.1%, Makefile 4.6%, Cool 0.8%, Shell 0.6%, Python 0.3%, CMake 0.1%