Zeyu Li's repositories
Awesome_LLM_System-PaperList
Since the emergence of ChatGPT in 2022, accelerating Large Language Models has become increasingly important. Here is a list of papers on accelerating LLMs, currently focused mainly on inference acceleration; related work will be added over time. Contributions are welcome!
6000D-Project
This is the repo for the 6000D (Graph Processing and Analytics) final project at HKUST-GZ
CutlassHelloWorld
This is a repo for learning CUTLASS.
apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
CLIP
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
cuda-samples
Samples for CUDA developers demonstrating features in the CUDA Toolkit
cutlass
CUDA Templates for Linear Algebra Subroutines
DALLE-pytorch
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in PyTorch
DiT
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
dp-nblist
[WIP] Reusable and modular neighbor-list library
galeselee.github.io
GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes
JekyllHelloWorld
This repo is a tutorial for learning Jekyll.
lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
llama
Inference code for LLaMA models
llm.c
LLM training in simple, raw C/CUDA
MGG
Artifact for OSDI'23: MGG: Accelerating Graph Neural Networks with Fine-grained intra-kernel Communication-Computation Pipelining on Multi-GPU Platforms.
MLPerf_inference
Reference implementations of MLPerf™ inference benchmarks
molcpp
[WIP] C++ Kernel for MolPy
nccl
Optimized primitives for collective multi-GPU communication
nccl-tests
NCCL Tests
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
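As a rough illustration of the workflow described above, here is a minimal sketch using TensorRT-LLM's high-level Python LLM API. It assumes a recent TensorRT-LLM release that ships the `tensorrt_llm.LLM` and `SamplingParams` classes; the model name and output fields below are illustrative and may differ between versions.

```python
# Minimal sketch of the TensorRT-LLM high-level Python (LLM) API.
# Assumption: a recent TensorRT-LLM release with tensorrt_llm.LLM and
# SamplingParams; the model name is a placeholder.
from tensorrt_llm import LLM, SamplingParams

# Point the LLM at a Hugging Face checkpoint; TensorRT-LLM builds the
# TensorRT engine under the hood before serving requests.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

prompts = ["The capital of France is"]
params = SamplingParams(temperature=0.8, top_p=0.95)

# Run inference on the built engine and print the generated text
# (output structure mirrors the vLLM-style RequestOutput).
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```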
text-generation-inference
Large Language Model Text Generation Inference
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs