Zeyu Li's repositories
Awesome_LLM_System-PaperList
Since the emergence of ChatGPT in 2022, accelerating Large Language Models has become increasingly important. Here is a list of papers on accelerating LLMs, currently focused mainly on inference acceleration; related work will be added over time. Contributions are welcome!
6000D-Project
This is the repo for the 6000D (Graph Processing and Analytics) final project at HKUST-GZ
CutlassHelloWorld
This is a repo for learning CUTLASS.
apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
CLIP
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
cuda-samples
Samples for CUDA developers demonstrating features in the CUDA Toolkit
cutlass
CUDA Templates for Linear Algebra Subroutines
DALLE-pytorch
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in PyTorch
DiT
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
dp-nblist
[WIP] Reusable and modular neighbor-list library
galeselee.github.io
GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes
JekyllHelloWorld
This repo is a tutorial for learning Jekyll.
lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
llama
Inference code for LLaMA models
llm.c
LLM training in simple, raw C/CUDA
MGG
Artifact for OSDI'23: MGG: Accelerating Graph Neural Networks with Fine-grained intra-kernel Communication-Computation Pipelining on Multi-GPU Platforms.
MLPerf_inference
Reference implementations of MLPerf™ inference benchmarks
molcpp
[WIP] C++ Kernel for MolPy
nccl
Optimized primitives for collective multi-GPU communication
nccl-tests
NCCL Tests
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
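As a rough illustration of the workflow described above, here is a minimal sketch using TensorRT-LLM's high-level Python LLM API. It assumes a recent TensorRT-LLM release that ships the `tensorrt_llm.LLM` and `SamplingParams` classes; the model name and output fields below are illustrative and may differ between versions.

```python
# Minimal sketch of the TensorRT-LLM high-level Python (LLM) API.
# Assumption: a recent TensorRT-LLM release with tensorrt_llm.LLM and
# SamplingParams; the model name is a placeholder.
from tensorrt_llm import LLM, SamplingParams

# Point the LLM at a Hugging Face checkpoint; TensorRT-LLM builds the
# TensorRT engine under the hood before serving requests.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

prompts = ["The capital of France is"]
params = SamplingParams(temperature=0.8, top_p=0.95)

# Run inference on the built engine and print the generated text
# (output structure mirrors the vLLM-style RequestOutput).
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```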
text-generation-inference
Large Language Model Text Generation Inference
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs