Henry Hyeonmok Ko's starred repositories
LLMs-from-scratch
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
tuning_playbook
A playbook for systematically maximizing the performance of deep learning models.
GPU-Puzzles
Solve puzzles. Learn CUDA.
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
Liger-Kernel
Efficient Triton Kernels for LLM Training
TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
ThunderKittens
Tile primitives for speedy kernels
awesome-jax
JAX - A curated list of resources https://github.com/google/jax
lightning-thunder
Make PyTorch models up to 40% faster! Thunder is a source-to-source compiler for PyTorch. It enables using different hardware executors at once, across one or thousands of GPUs.
Triton-Puzzles
Puzzles for learning Triton
awesome-mixture-of-experts
A collection of AWESOME things about mixture-of-experts
melange-nvim
🗡️ Warm color scheme for Neovim and beyond
Awesome-GPU
Awesome resources for GPUs
hardware-effects-gpu
Demonstration of various hardware effects on CUDA GPUs.
gpu-benches
Collection of benchmarks to measure basic GPU capabilities
Awesome-Triton-Kernels
Collection of kernels written in the Triton language