Chen Shen's repositories
awesome-tensor-compilers
A list of awesome compiler projects and papers for tensor computation and deep learning.
openmlsys-zh
"Machine Learning Systems: Design and Implementation" - Chinese edition
FasterTransformer
Transformer-related optimizations, including BERT and GPT
flash-attention
Fast and memory-efficient exact attention
flashinfer
FlashInfer: Kernel Library for LLM Serving
grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
how-to-optim-algorithm-in-cuda
How to optimize common algorithms in CUDA.
lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
Lightrails
Yet another distributed training/inference framework.
lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Megatron-LM
Ongoing research training transformer models at scale
mini-redis
Incomplete Redis client and server implementation using Tokio - for learning purposes only
nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs.
og-equity-compensation
Stock options, RSUs, taxes — read the latest edition: www.holloway.com/ec
r4cppp
Rust for C++ programmers
ScaleLLM
A high-performance inference system for large language models, designed for production environments.
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
The-Art-of-Linear-Algebra
Graphic notes on Gilbert Strang's "Linear Algebra for Everyone"
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs