LiuXinyu's starred repositories
TensorRT-Model-Optimizer
TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.
tensorrt_backend
The Triton backend for TensorRT.
pytorch_backend
The Triton backend for PyTorch TorchScript models.
DeepSeek-V2
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including support for 8-bit floating point (FP8) precision on Hopper and Ada GPUs, providing better performance with lower memory utilization in both training and inference.
flash-linear-attention
Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
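The key idea behind linear attention is reordering the attention product: instead of the O(N²d) computation (QKᵀ)V, compute Q(KᵀV) in O(Nd²). A minimal pure-Python sketch of that reordering (illustrative only; it omits the feature map and normalization, and real implementations such as those in this repo use fused PyTorch/Triton kernels):

```python
def matmul(A, B):
    """Plain nested-list matrix multiply."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(M):
    return [list(col) for col in zip(*M)]

def attention_naive(Q, K, V):
    # Quadratic order: (Q K^T) V -- materializes the N x N score matrix.
    return matmul(matmul(Q, transpose(K)), V)

def attention_linear(Q, K, V):
    # Linear order: Q (K^T V) -- the d x d state K^T V is formed first,
    # so cost grows linearly in sequence length N.
    return matmul(Q, matmul(transpose(K), V))
```

Both orderings are algebraically identical; the speedup comes purely from the associativity of matrix multiplication.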
flash-attention
Fast and memory-efficient exact attention
Programming_Massively_Parallel_Processors
Code and notes for the six major CUDA parallel computing patterns
tensorrtllm_backend
The Triton TensorRT-LLM Backend
pytest-benchmark
pytest fixture for benchmarking code
PowerInfer
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
mistral-inference
Official inference library for Mistral models
database-system-readings
:yum: A curated reading list about database systems
magic-animate
[CVPR 2024] MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model