Weili17's starred repositories
giantpandacv.com
www.giantpandacv.com
smoothquant
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
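The core SmoothQuant trick can be sketched in a few lines: migrate quantization difficulty from activations to weights by dividing each activation channel by a per-channel factor and multiplying the matching weight row by it, leaving the matmul result unchanged. This is a toy pure-Python sketch assuming per-channel absolute-max statistics and migration strength alpha = 0.5; it is not the repository's actual implementation.

```python
# Toy sketch of the SmoothQuant smoothing step (illustrative, not the
# paper's code). s_j = max|X_j|^alpha / max|W_j|^(1-alpha) per channel j.

def smooth_scales(act_maxes, weight_maxes, alpha=0.5):
    """Per-input-channel smoothing factors."""
    return [a ** alpha / w ** (1 - alpha)
            for a, w in zip(act_maxes, weight_maxes)]

def migrate(activations, weights, scales):
    """Divide activation channel j by s_j and multiply weight row j by s_j.
    The product X @ W is mathematically unchanged, but activation outliers
    shrink, making activations easier to quantize."""
    smoothed_acts = [[x / s for x, s in zip(row, scales)]
                     for row in activations]
    smoothed_weights = [[w_ij * s for w_ij in w_row]
                        for w_row, s in zip(weights, scales)]
    return smoothed_acts, smoothed_weights
```

Because the scales cancel inside the matmul, smoothing can be folded offline into the weights and the preceding LayerNorm, adding no inference cost.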
TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
llama-cpp-python
Python bindings for llama.cpp
step_into_llm
MindSpore online courses: Step into LLM
CUDA_Programming
Code for the book *CUDA Programming: Basics and Practice* (《CUDA编程基础与实践》)
CUDA-Programming
Sample codes for my CUDA programming book
KuiperLLama
Build an LLM inference framework from scratch, hands-on
cuda-samples
Samples for CUDA developers demonstrating features in the CUDA Toolkit
Learn-CUDA-Programming
Learn CUDA Programming, published by Packt
baby-llama2-chinese_cybertron
Train an LLM from scratch using a single 24 GB GPU
DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
batch-prompting
[EMNLP 2023 Industry Track] A simple prompting approach that enables LLMs to run inference in batches.
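The idea behind batch prompting is to pack several questions into a single prompt so one LLM call answers them all, amortizing per-call overhead. This is a hedged sketch: the `Q[i]:`/`A[i]:` template and parsing logic below are illustrative assumptions, not the paper's exact format.

```python
# Illustrative batch-prompting helpers (template and parsing are
# assumptions for demonstration, not the repository's actual format).

def build_batch_prompt(questions):
    """Pack multiple questions into one prompt for a single LLM call."""
    lines = ["Answer each question on its own line, prefixed with 'A[i]:'."]
    for i, q in enumerate(questions, 1):
        lines.append(f"Q[{i}]: {q}")
    return "\n".join(lines)

def parse_batch_answers(completion, n):
    """Recover the n answers from a completion of 'A[i]: ...' lines."""
    answers = {}
    for line in completion.splitlines():
        if line.startswith("A[") and "]:" in line:
            head, _, text = line.partition("]:")
            answers[int(head[2:])] = text.strip()
    # Missing answers come back as empty strings rather than raising.
    return [answers.get(i, "") for i in range(1, n + 1)]
```

A single batched call replaces n separate calls; the trade-off is that accuracy can degrade as the batch grows, which is what the paper measures.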
SpeculativeDecodingPapers
📰 Must-read papers and blogs on Speculative Decoding ⚡️
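Speculative decoding, the topic of this paper list, can be sketched in its simplest greedy form: a cheap draft model proposes k tokens, the target model verifies them left to right, and the first mismatch is replaced with the target's own token. The toy below uses deterministic stand-in "models" (callables mapping a prefix to the next token); real systems use a small draft LM and a large target LM with probabilistic acceptance.

```python
# Toy greedy speculative decoding loop (an illustrative sketch, not any
# particular paper's algorithm). `draft` and `target` are stand-ins for
# small and large language models.

def speculative_decode(draft, target, prefix, k=4, steps=8):
    out = list(prefix)
    while len(out) - len(prefix) < steps:
        # 1. Draft model proposes k tokens autoregressively.
        proposed, cur = [], list(out)
        for _ in range(k):
            t = draft(cur)
            proposed.append(t)
            cur.append(t)
        # 2. Target model verifies proposals left to right, keeping matches.
        accepted = 0
        for t in proposed:
            if target(out) == t:
                out.append(t)
                accepted += 1
            else:
                break
        # 3. On a mismatch, emit the target's token instead, so the
        # output is identical to pure greedy decoding with `target`.
        if accepted < k:
            out.append(target(out))
    return out[len(prefix):len(prefix) + steps]
```

When the draft agrees with the target, each verification pass yields up to k tokens for one target evaluation, which is where the speedup comes from; the output sequence itself never changes.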
long-context-attention
Sequence Parallel Attention for Long Context LLM Model Training and Inference
TurboTransformers
A fast and user-friendly runtime for transformer inference (BERT, ALBERT, GPT-2, decoders, etc.) on CPU and GPU.