wm901115nwpu's repositories
apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
Awesome-LLM-Inference
📖A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.
DeepLearningSystem
An introduction to the core principles of deep learning systems.
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
DeepSpeedExamples
Example models using DeepSpeed
depyf
depyf is a tool to help you understand and adapt to the PyTorch compiler, torch.compile.
intel-extension-for-transformers
⚡ Build your chatbot within minutes on your favorite device; offers SOTA compression techniques for LLMs; runs LLMs efficiently on Intel platforms ⚡
LLaMA-Factory
Unified, efficient fine-tuning of 100+ LLMs
lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Megatron-LM
Ongoing research training transformer models at scale
mmcv
OpenMMLab Computer Vision Foundation
mmdeploy
OpenMMLab Model Deployment Framework
mmengine
OpenMMLab Foundational Library for Training Deep Learning Models
NeMo
NeMo: a framework for generative AI
neural-compressor
Provide unified APIs for SOTA model compression techniques, such as low precision (INT8/INT4/FP4/NF4) quantization, sparsity, pruning, and knowledge distillation on mainstream AI frameworks such as TensorFlow, PyTorch, and ONNX Runtime.
OnnxSlim
A toolkit to help optimize large ONNX models
pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
pytorch-image-models
PyTorch image models, scripts, pretrained weights -- ResNet, ResNeXt, EfficientNet, NFNet, Vision Transformer (ViT), MobileNet-V3/V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more
TensorRT
TensorRT is a C++ library for high performance inference on NVIDIA GPUs and deep learning accelerators.
TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper GPUs, to provide better performance with lower memory utilization in both training and inference.
triton
Development repository for the Triton language and compiler
tvm
Open deep learning compiler stack for cpu, gpu and specialized accelerators
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
wenet
Production-first and production-ready end-to-end speech recognition toolkit
workshops
A repository for all workshop-related materials.
xtuner
An efficient, flexible and full-featured toolkit for fine-tuning large models (InternLM, Llama, Baichuan, Qwen, ChatGLM)
zero_nlp
Chinese NLP solutions (large models, data, models, training, inference)