Material for cuda-mode lectures

Supplementary Material for Lectures

Discord: discord.gg/cudamode
The PMPP Book: Programming Massively Parallel Processors: A Hands-on Approach (Amazon link)
YouTube Channel

Lecture 1: Profiling and Integrating CUDA kernels in PyTorch

Video
Date: 2024-01-13, Speaker: Mark Saroufim
Notebook and slides in the lecture_001 folder

Lecture 2: Recap Ch. 1-3 from the PMPP book

Video
Date: 2024-01-20, Speaker: Andreas Koepf
Slides: The PowerPoint file is at lecture_002/cuda_mode_lecture2.pptx in this repository. It is also available as a Google Docs presentation.

Lecture 3: Getting Started With CUDA

Video
Date: 2024-01-27, Speaker: Jeremy Howard
Notebook: See the lecture_003 folder, or run the Colab version

Lecture 4: Intro to Compute and Memory Architecture

Video
Date: 2024-02-03, Speaker: Thomas Viehmann
Notebook and slides in the lecture_004 folder.

Lecture 5: Going Further with CUDA for Python Programmers

Video
Date: 2024-02-10, Speaker: Jeremy Howard
Notebook in the lecture_005 folder.

Lecture 6: Optimizing PyTorch Optimizers

Video
Date: 2024-02-17, Speaker: Jane Xu
Slides

Lecture 7: Advanced Quantization

Video
Date: 2024-02-25, Speaker: Charles Hernandez
Slides

Lecture 8: CUDA Performance Checklist

Video
Date: 2024-03-09, Speaker: Mark Saroufim
Code in the lecture_008 folder
Slides

Lecture 9: Reductions

Video
Date: 2024-03-09, Speaker: Mark Saroufim
Code in the lecture_009 folder
Slides

Lecture 10: Build a Prod Ready CUDA Library

Video
Date: 2024-03-16, Speaker: Oscar Amoros Huguet
Slides

Lecture 11: Sparsity

Video
Date: 2024-03-23, Speaker: Jesse Cai
Slides

Lecture 12: Flash Attention

Video
Date: 2024-03-30, Speaker: Thomas Viehmann

Lecture 13: Ring Attention

Video
Date: 2024-04-06, Speaker: Andreas Koepf
Slides

Lecture 14: Practitioner's Guide to Triton

Video
Date: 2024-04-13, Speaker: Umer Adil
Notebook

Lecture 15: CUTLASS

Date: 2024-04-20, Speaker: Eric Auld

Lecture 16: Hands-On Profiling

Date: 2024-04-27, Speaker: Taylor Robie

Bonus Lecture: CUDA C++ llm.cpp

Date: 2024-04-27, Speakers: Jake Hemstad & Georgii Evtushenko
Slides

Lecture 17: GPU Collective Communication (NCCL)

Date: 2024-05-04, Speaker: Dan Johnson
Code in the lecture_017 folder

Lecture 18: Fused Kernels

Date: 2024-05-11, Speaker: Kapil Sharma
Code in the lecture_018 folder

Lecture 19: Data Processing on GPUs

Date: 2024-05-18, Speaker: Devavret Makkar

Lecture 20: Scan Algorithm

Date: 2024-05-25, Speaker: Izzat El Hajj
Slides

Lecture 21: Scan Algorithm Part 2

Date: 2024-05-31, Speaker: Izzat El Hajj
Slides

Lecture 22: Hacker's Guide to Speculative Decoding in VLLM

Date: 2024-06-01, Speaker: Cade Daniel
Slides

Lecture 23: Tensor Cores

Date: 2024-06-07, Speakers: Vijay Thakkar & Pradeep Ramani
Slides

Lecture 24: Scan at the Speed of Light

Date: 2024-06-08, Speakers: Jake Hemstad & Georgii Evtushenko
Slides

About

Apache License 2.0

Languages

Jupyter Notebook 87.4%, Python 10.1%, Cuda 2.3%, C++ 0.1%, CMake 0.1%, Makefile 0.0%