t-vi / resource-stream

CUDA related news and material links

CUDA MODE Resource Stream

Here you will find a collection of CUDA-related material (books, papers, blog posts, YouTube videos, tweets, implementations, etc.). We also collect information on higher-level tools for performance optimization and kernel development, such as Triton and torch.compile() ... whatever makes the GPUs go brrrr.

Do you know a great resource we should add? Please see How to contribute.

1st Contact with CUDA

2nd Contact

Papers, Case Studies

Books

CUDA Grandmasters

Tri Dao

Tim Dettmers

Practice

PyTorch Performance Optimization

PyTorch Internals & Debugging

Code / Libs

Essentials

Profiling

Python GPU Computing

News

Technical Blog Posts

Hardware Architecture

pscan project

Triton Kernels / Examples

  • unsloth, which implements custom Triton kernels for faster QLoRA training
  • Custom implementation of relative position attention (link)
  • Tri Dao's Triton implementation of Flash Attention: flash_attn_triton.py

How to contribute

To share interesting CUDA-related links, please create a pull request for this file. See "editing files" in the GitHub documentation.

Or contact us on the CUDA MODE discord server: https://discord.gg/XsdDHGtk9N

About

License: MIT License