Here you find a collection of CUDA related material (books, papers, blog-post, youtube videos, tweets, implementations etc.). We also collect information to higher level tools for performance optimization and kernel development like Triton and torch.compile()
... whatever makes the GPUs go brrrr.
You know a great resource we should add? Please see How to contribute.
- An Easy Introduction to CUDA C and C++
- CUDA Toolkit Documentation
- Basic terminology: Thread block, Warp, Streaming Multiprocessor: Wiki: Thread Block, A tour of CUDA
- Accelerating Generative AI with PyTorch: Segment Anything, Fast
- Accelerating Generative AI with PyTorch II: GPT, Fast
- PyTorch Compiler Troubleshooting
To share interesting CUDA related links please create a pull request for this file. See editing files in the github documentation.
Or contact us on the CUDA MODE discord server: invitation link TBD