LiuXinyu's starred repositories
nvidia-docker
Build and run Docker containers leveraging NVIDIA GPUs
magic-animate
[CVPR 2024] MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
mistral-src
Reference implementation of the Mistral AI 7B v0.1 model.
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
PowerInfer
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
pytest-benchmark
py.test fixture for benchmarking code
smoothquant
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
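SmoothQuant's core trick can be sketched in a few lines (an illustration under my own toy numbers, not the repository's implementation): a per-channel scale s_j = max|X_j|^α / max|W_j|^(1-α) migrates quantization difficulty from outlier-heavy activations X to well-behaved weights W, while Y = (X / s)(s · W) leaves the matmul result unchanged.

```python
# Minimal sketch of SmoothQuant-style smoothing (illustrative only).
# Channel 0 has an activation outlier; after smoothing, the per-channel
# ranges of activations and weights are equalized, so both sides
# quantize well with simple per-tensor scales.

def smooth_scales(x_absmax, w_absmax, alpha=0.5):
    """s_j = max|X_j|**alpha / max|W_j|**(1 - alpha) per input channel j."""
    return [xa ** alpha / wa ** (1.0 - alpha)
            for xa, wa in zip(x_absmax, w_absmax)]

x_absmax = [80.0, 2.0, 1.5]   # per-channel activation |max| (channel 0 is an outlier)
w_absmax = [0.5, 0.4, 0.6]    # per-channel weight |max|

s = smooth_scales(x_absmax, w_absmax)            # alpha=0.5 balances both sides
smoothed_x = [xa / sj for xa, sj in zip(x_absmax, s)]
smoothed_w = [wa * sj for wa, sj in zip(w_absmax, s)]

# With alpha=0.5, |X_j|/s_j == |W_j|*s_j for every channel, and the
# activation outlier range shrinks from 80.0 to about 6.3.
```

The α hyperparameter trades off how much difficulty each side absorbs; the paper's default of 0.5 splits it evenly.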
onnxruntime-inference-examples
Examples for using ONNX Runtime for machine learning inference.
tensorrtllm_backend
The Triton TensorRT-LLM Backend
database-system-readings
A curated reading list about database systems
flash-attention
Fast and memory-efficient exact attention
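The "memory-efficient exact" part of FlashAttention rests on the online-softmax trick: scores are processed tile by tile while a running max and normalizer are maintained, so softmax stays exact without materializing the full score row at once. A pure-Python sketch of just that trick (my own toy version, not the repository's CUDA kernels):

```python
import math

# Streaming (online) softmax over tiles of scores, the building block
# FlashAttention uses to avoid materializing the full attention matrix.
# Note: this demo stores all exponentials to return the full softmax;
# the real kernel only keeps an O(d) output accumulator per row.

def online_softmax(scores, tile=2):
    m = float("-inf")   # running max seen so far
    l = 0.0             # running normalizer, scaled to current max
    exps = []
    for i in range(0, len(scores), tile):
        chunk = scores[i:i + tile]
        new_m = max(m, max(chunk))
        scale = math.exp(m - new_m)       # rescale old state to new max
        exps = [e * scale for e in exps]
        l *= scale
        for s in chunk:
            e = math.exp(s - new_m)
            exps.append(e)
            l += e
        m = new_m
    return [e / l for e in exps]

def softmax(scores):
    """Direct reference softmax for comparison."""
    mx = max(scores)
    es = [math.exp(s - mx) for s in scores]
    total = sum(es)
    return [e / total for e in es]

result = online_softmax([3.0, 1.0, 9.0, 2.0, 5.0])
```

The rescale-by-`exp(m - new_m)` step is what keeps the tiled computation numerically identical to the one-shot softmax.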
Programming_Massively_Parallel_Processors
Code and notes for the six major CUDA parallel computing patterns