wm901115nwpu's repositories
Awesome-LLM
Awesome-LLM: a curated list of Large Language Model resources
Awesome-LLM-Compression
Awesome LLM compression research papers and tools.
Awesome-LLM-Inference
📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.
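Two of the techniques named above, continuous batching and PagedAttention, both rest on the same idea: the KV cache is carved into fixed-size blocks drawn from a shared pool, so sequences of different lengths can join and leave a batch without fragmenting memory. The following is a minimal pure-Python sketch of that block-table bookkeeping (names like `PagedKVCache` are invented for illustration; real engines such as vLLM manage GPU memory, not Python lists):

```python
class PagedKVCache:
    """Toy block table: a KV cache split into fixed-size blocks shared by all sequences."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))  # pool of free physical block ids
        self.block_tables = {}  # seq_id -> list of physical block ids (the "page table")
        self.lengths = {}       # seq_id -> number of tokens cached so far

    def append_token(self, seq_id):
        """Reserve cache space for one new token; allocate a block only on overflow."""
        n = self.lengths.get(seq_id, 0)
        if n % self.block_size == 0:  # current block full (or sequence is new)
            if not self.free_blocks:
                raise MemoryError("KV cache exhausted")
            self.block_tables.setdefault(seq_id, []).append(self.free_blocks.pop())
        self.lengths[seq_id] = n + 1

    def release(self, seq_id):
        """Return a finished sequence's blocks to the pool for reuse by new requests."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)
```

Because blocks are returned to a shared pool the moment a sequence finishes, a new request can be admitted immediately, which is the scheduling freedom continuous batching exploits.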
DeepLearningSystem
An introduction to the core principles of deep learning systems.
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
DeepSpeedExamples
Example models using DeepSpeed
depyf
depyf is a tool to help you understand and adapt to torch.compile, the PyTorch compiler.
intel-extension-for-transformers
⚡ Build your chatbot within minutes on your favorite device; offers SOTA compression techniques for LLMs; runs LLMs efficiently on Intel platforms ⚡
LLaMA-Factory
Unified, efficient fine-tuning of 100+ LLMs
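A technique commonly used by such fine-tuning toolkits (though not the only one) is LoRA: the frozen weight matrix W is augmented with a trainable low-rank product, so only a small fraction of parameters is updated. A dependency-free sketch of the forward pass, with all helper names (`matmul`, `lora_forward`, etc.) invented for illustration:

```python
def matmul(X, Y):
    """Naive matrix multiply over nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def madd(X, Y):
    """Element-wise matrix addition."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def mscale(X, s):
    """Multiply every element by scalar s."""
    return [[a * s for a in row] for row in X]

def lora_forward(x, W, A, B, alpha, r):
    """y = x @ (W + (alpha / r) * A @ B).

    x: [batch, d_in]; W: [d_in, d_out], frozen;
    A: [d_in, r] and B: [r, d_out] are the trainable low-rank adapters.
    """
    delta = mscale(matmul(A, B), alpha / r)
    return matmul(x, madd(W, delta))
```

With rank r much smaller than d_in and d_out, A and B hold far fewer parameters than W, which is why LoRA fine-tuning fits in modest GPU memory.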
lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Megatron-LM
Ongoing research training transformer models at scale
minisora
The Mini Sora project aims to explore the implementation path and future development direction of Sora.
NeMo
NeMo: a framework for generative AI
neural-compressor
Provides unified APIs for SOTA model compression techniques, including low-precision (INT8/INT4/FP4/NF4) quantization, sparsity, pruning, and knowledge distillation, on mainstream AI frameworks such as TensorFlow, PyTorch, and ONNX Runtime.
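The simplest of the quantization schemes listed above, symmetric per-tensor INT8, maps floats into [-128, 127] with a single scale factor. A minimal pure-Python sketch of the idea (not neural-compressor's actual API, which operates on framework tensors):

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8: q = round(x / scale), scale = max|x| / 127."""
    scale = max(abs(v) for v in values) / 127 or 1.0  # fall back to 1.0 for all-zero input
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats; error per element is at most scale / 2."""
    return [v * scale for v in q]
```

Real toolkits refine this with per-channel scales, calibration over sample data, and asymmetric zero-points, but the round-and-clamp core is the same.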
nndeploy
nndeploy is a cross-platform, high-performance, easy-to-use framework for end-to-end AI model deployment. It aims to abstract away the differences between inference frameworks, provide a consistent and user-friendly programming experience, and focus on performance across the whole deployment pipeline.
OnnxSlim
A toolkit to help optimize large ONNX models
pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
pytorch-image-models
PyTorch image models, scripts, pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNet-V3/V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more
TensorRT
TensorRT is a C++ library for high performance inference on NVIDIA GPUs and deep learning accelerators.
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper GPUs, to provide better performance with lower memory utilization in both training and inference.
triton
Development repository for the Triton language and compiler
tvm
Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
wenet
Production-first and production-ready end-to-end speech recognition toolkit
xtuner
An efficient, flexible and full-featured toolkit for fine-tuning large models (InternLM, Llama, Baichuan, Qwen, ChatGLM)
zero_nlp
Chinese NLP solutions (large models, data, models, training, inference)