wm901115nwpu's repositories

CGraph

【A commonly used C++ DAG framework】A general-purpose, cross-platform, flow-graph-based parallel computing framework with no third-party dependencies, listed in awesome-cpp. Stars & forks welcome.

Language: C++ · License: MIT · Stargazers: 1 · Issues: 0

Awesome-LLM

Awesome-LLM: a curated list of Large Language Model resources

License: CC0-1.0 · Stargazers: 0 · Issues: 0

Awesome-LLM-Compression

Awesome LLM compression research papers and tools.

License: MIT · Stargazers: 0 · Issues: 0

Awesome-LLM-Inference

📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.

License: GPL-3.0 · Stargazers: 0 · Issues: 0

DeepLearningSystem

An introduction to the core principles of deep learning systems.

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 0 · Issues: 0

DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

DeepSpeed-MII

MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

DeepSpeedExamples

Example models using DeepSpeed

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

depyf

depyf is a tool to help you understand and adapt to the PyTorch compiler, torch.compile.

Language: Python · License: MIT · Stargazers: 0 · Issues: 0

intel-extension-for-transformers

⚡ Build your chatbot within minutes on your favorite device; offers SOTA compression techniques for LLMs; runs LLMs efficiently on Intel platforms. ⚡

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

LLaMA-Factory

Unified, efficient fine-tuning of 100+ LLMs

License: Apache-2.0 · Stargazers: 0 · Issues: 0

lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

License: Apache-2.0 · Stargazers: 0 · Issues: 0

Megatron-LM

Ongoing research training transformer models at scale

Language: Python · License: NOASSERTION · Stargazers: 0 · Issues: 0

minisora

The Mini Sora project aims to explore the implementation path and future development direction of Sora.

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

NeMo

NeMo: a framework for generative AI

License: Apache-2.0 · Stargazers: 0 · Issues: 0

neural-compressor

Provides unified APIs for SOTA model compression techniques, such as low-precision (INT8/INT4/FP4/NF4) quantization, sparsity, pruning, and knowledge distillation, on mainstream AI frameworks such as TensorFlow, PyTorch, and ONNX Runtime.

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

nndeploy

nndeploy is a cross-platform, high-performance, and easy-to-use end-to-end AI model deployment framework. It aims to abstract away the differences between inference frameworks in complex deployment environments, provide a consistent and user-friendly programming experience, and focus on the performance of the full deployment pipeline.

License: Apache-2.0 · Stargazers: 0 · Issues: 0

OnnxSlim

A toolkit to help optimize large ONNX models

Language: Python · License: MIT · Stargazers: 0 · Issues: 0

pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Language: Python · License: NOASSERTION · Stargazers: 0 · Issues: 0

pytorch-image-models

PyTorch image models, scripts, and pretrained weights: ResNet, ResNeXt, EfficientNet, NFNet, Vision Transformer (ViT), MobileNet-V3/V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

TensorRT

TensorRT is a C++ library for high-performance inference on NVIDIA GPUs and deep learning accelerators.

Language: C++ · License: Apache-2.0 · Stargazers: 0 · Issues: 0

TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

Language: C++ · License: Apache-2.0 · Stargazers: 0 · Issues: 0

TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including 8-bit floating-point (FP8) precision on Hopper GPUs, providing better performance with lower memory utilization in both training and inference.

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

triton

Development repository for the Triton language and compiler

Language: C++ · License: MIT · Stargazers: 0 · Issues: 0

tvm

Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

wenet

Production-first and production-ready end-to-end speech recognition toolkit

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

xtuner

An efficient, flexible, and full-featured toolkit for fine-tuning large models (InternLM, Llama, Baichuan, Qwen, ChatGLM)

License: Apache-2.0 · Stargazers: 0 · Issues: 0

zero_nlp

Chinese NLP solutions (large models, data, models, training, inference)

Language: Python · License: MIT · Stargazers: 0 · Issues: 0