Yeojoon's starred repositories
LLMs-from-scratch
Implementing a ChatGPT-like LLM in PyTorch from scratch, step by step
simple-local-rag
Build a RAG (Retrieval Augmented Generation) pipeline from scratch and have it all run locally.
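The retrieve-then-generate flow this repo builds can be sketched in a few lines. This is a toy illustration, not the repo's code: real pipelines use vector embeddings and a local LLM, while here word overlap stands in for retrieval and a string template stands in for generation.

```python
# Toy sketch of a RAG pipeline: retrieve relevant context, then
# condition the "generation" step on it. Illustrative only.
def retrieve(query, documents, k=1):
    """Rank documents by naive word overlap with the query."""
    q = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def generate(query, context):
    # Stand-in for a local LLM call: stitch prompt and context together.
    return f"Answer to '{query}' using: {context[0]}"

docs = ["RAG retrieves relevant text before generating.",
        "CUDA kernels run on the GPU."]
ctx = retrieve("what does RAG retrieve", docs)
answer = generate("what does RAG retrieve", ctx)
```

In a real local setup, `retrieve` would embed chunks and the query with a sentence-encoder and rank by cosine similarity, and `generate` would prompt a locally hosted model with the retrieved chunks.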
GPU-Puzzles
Solve puzzles. Learn CUDA.
bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
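The k-bit quantization idea behind this library can be shown with a minimal symmetric 8-bit round-trip. This is a conceptual sketch only; bitsandbytes itself uses more sophisticated block-wise and dynamic quantization schemes on GPU tensors.

```python
# Minimal sketch of symmetric 8-bit quantization: map floats to int8
# codes plus one scale factor, then reconstruct approximately.
def quantize_8bit(values):
    """Return int codes in [-127, 127] and a per-tensor scale."""
    scale = max(abs(v) for v in values) / 127 or 1.0
    codes = [round(v / scale) for v in values]
    return codes, scale

def dequantize_8bit(codes, scale):
    return [c * scale for c in codes]

weights = [0.5, -1.0, 0.25, 0.0]
codes, scale = quantize_8bit(weights)
restored = dequantize_8bit(codes, scale)
# Round-trip error is bounded by half a quantization step (scale / 2).
```

Storing one byte per value instead of four (fp32) is where the memory savings for large models come from; the accuracy cost is the bounded rounding error above.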
LLM-Adapters
Code for our EMNLP 2023 Paper: "LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models"
lm-evaluation-harness
A framework for few-shot evaluation of language models.
flash-attention
Fast and memory-efficient exact attention
TensorRT-LLM
An easy-to-use Python API for defining Large Language Models (LLMs) and building TensorRT engines with state-of-the-art optimizations for efficient inference on NVIDIA GPUs, plus components for creating Python and C++ runtimes that execute those engines.
SqueezeLLM
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
low-bit-optimizers
Low-bit optimizers for PyTorch
FedML
FedML - A unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FedML Launch, a cross-cloud scheduler, runs AI jobs on any GPU cloud or on-premise cluster. TensorOpera AI (https://TensorOpera.ai), built on this library, is a generative AI platform at scale.
Awesome-Federated-Learning
FedML - The Research and Production Integrated Federated Learning Library: https://fedml.ai
FedAc-NeurIPS20
Code for "Federated Accelerated Stochastic Gradient Descent" (NeurIPS 2020)
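The federated repos above all build on the basic federated-averaging loop: clients take local gradient steps, then a server averages their weights. A minimal sketch (illustrative; FedAc adds acceleration on top of this, and FedML generalizes it to production settings):

```python
# Minimal federated-averaging sketch: local SGD steps on each client,
# then element-wise averaging of client weights on the server.
def local_sgd_step(weights, gradient, lr=0.1):
    """One plain SGD update on a client's copy of the model."""
    return [w - lr * g for w, g in zip(weights, gradient)]

def federated_average(client_weights):
    """Server step: average model weights element-wise across clients."""
    n = len(client_weights)
    return [sum(ws) / n for ws in zip(*client_weights)]

global_w = [0.0, 0.0]
client_grads = [[1.0, 2.0], [3.0, 4.0]]  # hypothetical per-client gradients
clients = [local_sgd_step(global_w, g) for g in client_grads]
global_w = federated_average(clients)  # averaged global model
```

The key property is that raw data never leaves the clients; only model weights (or updates) are communicated, which is what makes the communication-efficiency and acceleration questions studied in these repos interesting.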