ModelTC

Model Infra

ModelTC's repositories

lightllm

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Language: Python | License: Apache-2.0 | Stargazers: 1989 | Issues: 20 | Issues: 167

MQBench

Model Quantization Benchmark

Language: Shell | License: Apache-2.0 | Stargazers: 737 | Issues: 14 | Issues: 196

United-Perception

United Perception

Language: Python | License: Apache-2.0 | Stargazers: 425 | Issues: 20 | Issues: 65

llmc

The official PyTorch implementation of "LLM-QBench: A Benchmark Towards the Best Practice for Post-training Quantization of Large Language Models", and an efficient LLM compression tool that offers various advanced compression methods and supports multiple inference backends.

Language: Python | License: Apache-2.0 | Stargazers: 115 | Issues: 8 | Issues: 3

Dipoorlet

Offline quantization tools for deployment.

Language: Python | License: Apache-2.0 | Stargazers: 108 | Issues: 16 | Issues: 9

awesome-lm-system

A summary of system papers, frameworks, code, and tools for training or serving large models.

License: Apache-2.0 | Stargazers: 56 | Issues: 9 | Issues: 0

TFMQ-DM

[CVPR 2024 Highlight] This is the official PyTorch implementation of "TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models".

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 42 | Issues: 10 | Issues: 1

NART

NART = NART is not A RunTime, a deep learning inference framework.

Language: Python | License: Apache-2.0 | Stargazers: 37 | Issues: 10 | Issues: 1

Outlier_Suppression_Plus

Official implementation of the EMNLP 2023 paper "Outlier Suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling".

Language: Python | License: MIT | Stargazers: 34 | Issues: 8 | Issues: 6

Language: Python | License: Apache-2.0 | Stargazers: 33 | Issues: 2 | Issues: 8

EasyLLM

Built upon Megatron-Deepspeed and HuggingFace Trainer, EasyLLM reorganizes the code logic with a focus on usability while preserving training efficiency.

Language: Python | License: Apache-2.0 | Stargazers: 30 | Issues: 8 | Issues: 1

QLLM

[ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models"

Language: Python | License: Apache-2.0 | Stargazers: 26 | Issues: 8 | Issues: 0

pyvlova

Yet another polyhedral compiler for deep learning.

Language: Python | License: Apache-2.0 | Stargazers: 19 | Issues: 5 | Issues: 0

AAAI2023_EAMPD

AAAI2023 Efficient and Accurate Models towards Practical Deep Learning Baseline

Language: Python | License: Apache-2.0 | Stargazers: 12 | Issues: 6 | Issues: 0

Language: Python | License: Apache-2.0 | Stargazers: 11 | Issues: 7 | Issues: 1

Imagenet-S

Robustness for real-world system noise

msbench

A model sparsification tool based on torch.fx.

Language: Python | License: Apache-2.0 | Stargazers: 4 | Issues: 0 | Issues: 0

ChatGLM-6B

ChatGLM-6B: An Open Bilingual Dialogue Language Model

Language: Python | License: Apache-2.0 | Stargazers: 2 | Issues: 1 | Issues: 0

general-sam

A general suffix automaton implementation in Rust with Python bindings

Language: Rust | License: Apache-2.0 | Stargazers: 2 | Issues: 6 | Issues: 1

mtc-token-healing

Token healing implementation in Rust

Language: Rust | License: Apache-2.0 | Stargazers: 2 | Issues: 6 | Issues: 0

Language: Python | Stargazers: 1 | Issues: 0 | Issues: 0

general-sam-py

Python bindings for general-sam and some utilities

Language: Python | License: Apache-2.0 | Stargazers: 1 | Issues: 8 | Issues: 0

Language: Rust | License: Apache-2.0 | Stargazers: 1 | Issues: 0 | Issues: 0

DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

greedy-tokenizer

Greedily tokenize strings with the longest tokens iteratively.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

Language: HTML | Stargazers: 0 | Issues: 6 | Issues: 0

Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0