wm901115nwpu's repositories
Awesome-LLM
Awesome-LLM: a curated list of Large Language Model resources
Awesome-LLM-Compression
Awesome LLM compression research papers and tools.
Awesome-LLM-Inference
📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.
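Two of the techniques named above, continuous batching and PagedAttention, both rest on the same idea: the KV cache is carved into fixed-size blocks drawn from a shared pool, so sequences of different lengths can join and leave a batch without fragmenting memory. The following is a minimal pure-Python sketch of that block-table bookkeeping (names like `PagedKVCache` are invented for illustration; real engines such as vLLM manage GPU memory, not Python lists):

```python
class PagedKVCache:
    """Toy block table: a KV cache split into fixed-size blocks shared by all sequences."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))  # pool of free physical block ids
        self.block_tables = {}  # seq_id -> list of physical block ids (the "page table")
        self.lengths = {}       # seq_id -> number of tokens cached so far

    def append_token(self, seq_id):
        """Reserve cache space for one new token; allocate a block only on overflow."""
        n = self.lengths.get(seq_id, 0)
        if n % self.block_size == 0:  # current block full (or sequence is new)
            if not self.free_blocks:
                raise MemoryError("KV cache exhausted")
            self.block_tables.setdefault(seq_id, []).append(self.free_blocks.pop())
        self.lengths[seq_id] = n + 1

    def release(self, seq_id):
        """Return a finished sequence's blocks to the pool for reuse by new requests."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)
```

Because blocks are returned to a shared pool the moment a sequence finishes, a new request can be admitted immediately, which is the scheduling freedom continuous batching exploits.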
DeepLearningSystem
An introduction to the core principles of deep learning systems.
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
DeepSpeedExamples
Example models using DeepSpeed
depyf
depyf is a tool to help you understand and adapt to torch.compile, the PyTorch compiler.
intel-extension-for-transformers
⚡ Build your chatbot within minutes on your favorite device; offers SOTA compression techniques for LLMs; runs LLMs efficiently on Intel platforms ⚡
LLaMA-Factory
Unified, efficient fine-tuning of 100+ LLMs
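A technique commonly used by such fine-tuning toolkits (though not the only one) is LoRA: the frozen weight matrix W is augmented with a trainable low-rank product, so only a small fraction of parameters is updated. A dependency-free sketch of the forward pass, with all helper names (`matmul`, `lora_forward`, etc.) invented for illustration:

```python
def matmul(X, Y):
    """Naive matrix multiply over nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def madd(X, Y):
    """Element-wise matrix addition."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def mscale(X, s):
    """Multiply every element by scalar s."""
    return [[a * s for a in row] for row in X]

def lora_forward(x, W, A, B, alpha, r):
    """y = x @ (W + (alpha / r) * A @ B).

    x: [batch, d_in]; W: [d_in, d_out], frozen;
    A: [d_in, r] and B: [r, d_out] are the trainable low-rank adapters.
    """
    delta = mscale(matmul(A, B), alpha / r)
    return matmul(x, madd(W, delta))
```

With rank r much smaller than d_in and d_out, A and B hold far fewer parameters than W, which is why LoRA fine-tuning fits in modest GPU memory.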
lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Megatron-LM
Ongoing research training transformer models at scale
minisora
The Mini Sora project aims to explore the implementation path and future development direction of Sora.
NeMo
NeMo: a framework for generative AI
neural-compressor
Provides unified APIs for SOTA model compression techniques, including low-precision (INT8/INT4/FP4/NF4) quantization, sparsity, pruning, and knowledge distillation, on mainstream AI frameworks such as TensorFlow, PyTorch, and ONNX Runtime.
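The simplest of the quantization schemes listed above, symmetric per-tensor INT8, maps floats into [-128, 127] with a single scale factor. A minimal pure-Python sketch of the idea (not neural-compressor's actual API, which operates on framework tensors):

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8: q = round(x / scale), scale = max|x| / 127."""
    scale = max(abs(v) for v in values) / 127 or 1.0  # fall back to 1.0 for all-zero input
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats; error per element is at most scale / 2."""
    return [v * scale for v in q]
```

Real toolkits refine this with per-channel scales, calibration over sample data, and asymmetric zero-points, but the round-and-clamp core is the same.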
nndeploy
nndeploy is a cross-platform, high-performance, easy-to-use framework for end-to-end AI model deployment. It aims to abstract away the differences between inference frameworks, provide a consistent and user-friendly programming experience, and focus on performance across the whole deployment pipeline.
OnnxSlim
A toolkit to help optimize large ONNX models
pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
pytorch-image-models
PyTorch image models, scripts, pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNet-V3/V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more
TensorRT
TensorRT is a C++ library for high performance inference on NVIDIA GPUs and deep learning accelerators.
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper GPUs, to provide better performance with lower memory utilization in both training and inference.
triton
Development repository for the Triton language and compiler
tvm
Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
wenet
Production-first and production-ready end-to-end speech recognition toolkit
xtuner
An efficient, flexible and full-featured toolkit for fine-tuning large models (InternLM, Llama, Baichuan, Qwen, ChatGLM)
zero_nlp
Chinese NLP solutions (large models, data, models, training, inference)