Yulong Wang's starred repositories
the-algorithm
Source code for Twitter's Recommendation Algorithm
Chinese-LLaMA-Alpaca
Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment (Chinese LLaMA & Alpaca LLMs)
Awesome-LLM
Awesome-LLM: a curated list of Large Language Model resources
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
search_with_lepton
Building a quick conversation-based search demo with Lepton AI.
accelerate
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
FasterTransformer
Transformer-related optimizations, including BERT and GPT
AITemplate
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
Fengshenbang-LM
Fengshenbang-LM is an open-source large language model ecosystem led by the Cognitive Computing and Natural Language Research Center of the IDEA research institute, serving as infrastructure for Chinese AIGC and cognitive intelligence.
DeepSpeed-MII
MII makes low-latency, high-throughput inference possible, powered by DeepSpeed.
smoothquant
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
HolisticTraceAnalysis
A library to analyze PyTorch traces.