Henry Hyeonmok Ko (henryhmko)

Location: UC Berkeley

Home Page: https://henryhmko.github.io/

Henry Hyeonmok Ko's starred repositories

DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Language: Python | License: Apache-2.0 | Stargazers: 35416 | Issues: 345 | Issues: 2819

LLMs-from-scratch

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 32335 | Issues: 349 | Issues: 102

tuning_playbook

A playbook for systematically maximizing the performance of deep learning models.

swarm

Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.

Language: Python | License: MIT | Stargazers: 15898 | Issues: 262 | Issues: 11

triton

Development repository for the Triton language and compiler

GPU-Puzzles

Solve puzzles. Learn CUDA.

Language: Jupyter Notebook | License: MIT | Stargazers: 9869 | Issues: 193 | Issues: 32

TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

Language: C++ | License: Apache-2.0 | Stargazers: 8622 | Issues: 93 | Issues: 1948

sglang

SGLang is a fast serving framework for large language models and vision language models.

Language: Python | License: Apache-2.0 | Stargazers: 5992 | Issues: 57 | Issues: 629

Liger-Kernel

Efficient Triton Kernels for LLM Training

Language: Python | License: BSD-2-Clause | Stargazers: 3407 | Issues: 39 | Issues: 98

equinox

Elegant easy-to-use neural networks + scientific computing in JAX. https://docs.kidger.site/equinox/

Language: Python | License: Apache-2.0 | Stargazers: 2105 | Issues: 24 | Issues: 454

TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.

Language: Python | License: Apache-2.0 | Stargazers: 1958 | Issues: 35 | Issues: 349

ThunderKittens

Tile primitives for speedy kernels

Language: Cuda | License: MIT | Stargazers: 1645 | Issues: 29 | Issues: 27

ao

PyTorch native quantization and sparsity for training and inference

Language: Python | License: BSD-3-Clause | Stargazers: 1557 | Issues: 40 | Issues: 291

awesome-jax

JAX - A curated list of resources https://github.com/google/jax

ArcticDB

ArcticDB is a high performance, serverless DataFrame database built for the Python Data Science ecosystem.

Language: C++ | License: NOASSERTION | Stargazers: 1507 | Issues: 26 | Issues: 846
Language: Python | License: NOASSERTION | Stargazers: 1255 | Issues: 20 | Issues: 89

lightning-thunder

Make PyTorch models up to 40% faster! Thunder is a source to source compiler for PyTorch. It enables using different hardware executors at once; across one or thousands of GPUs.

Language: Python | License: Apache-2.0 | Stargazers: 1193 | Issues: 34 | Issues: 543

Triton-Puzzles

Puzzles for learning Triton

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 1111 | Issues: 10 | Issues: 13

awesome-mixture-of-experts

A collection of AWESOME things about mixture-of-experts

melange-nvim

🗡️ Warm color scheme for Neovim and beyond

Language: Lua | License: MIT | Stargazers: 723 | Issues: 3 | Issues: 38

MS-AMP

Microsoft Automatic Mixed Precision Library

Language: Python | License: MIT | Stargazers: 523 | Issues: 11 | Issues: 65
Language: Python | License: Apache-2.0 | Stargazers: 517 | Issues: 6 | Issues: 12

Awesome-GPU

Awesome resources for GPUs

License: BSD-3-Clause | Stargazers: 490 | Issues: 24 | Issues: 0

MS-SNSD

The Microsoft Scalable Noisy Speech Dataset (MS-SNSD) is a noisy speech dataset that can scale to arbitrary sizes depending on the number of speakers, noise types, and Speech to Noise Ratio (SNR) levels desired.

Language: HTML | License: MIT | Stargazers: 484 | Issues: 20 | Issues: 15

hardware-effects-gpu

Demonstration of various hardware effects on CUDA GPUs.

Language: C++ | License: MIT | Stargazers: 356 | Issues: 10 | Issues: 1

gpu-benches

Collection of benchmarks to measure basic GPU capabilities

Language: Jupyter Notebook | License: GPL-3.0 | Stargazers: 264 | Issues: 8 | Issues: 11

lolcats

Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"

Language: Python | License: Apache-2.0 | Stargazers: 171 | Issues: 20 | Issues: 1

Awesome-Triton-Kernels

Collection of kernels written in Triton language

License: MIT | Stargazers: 63 | Issues: 3 | Issues: 0

gpt-jax

Jax/Flax rewrite of Karpathy's nanoGPT

Language: Python | License: MIT | Stargazers: 49 | Issues: 3 | Issues: 4