Beast code in Giters

Mengzhao Chen's starred repositories

LLaMA-Factory

A WebUI for Efficient Fine-Tuning of 100+ LLMs (ACL 2024)

Language: Python | Apache-2.0 | 29130 190 4586

llama3

The official Meta Llama 3 GitHub site

Language: Python | NOASSERTION | 25567 211 228

Open-Sora-Plan

This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.

Language: Python | MIT | 11128 161 252

FlexGen

Running large language models on a single GPU for throughput-oriented scenarios.

Language: Python | Apache-2.0 | 9107 110 81

OLMo

Modeling, training, eval, and inference code for OLMo

Language: Python | Apache-2.0 | 4297 43 184

kimi-free-api

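🚀 Reverse-engineered API for the KIMI AI long-text large language model, for free trial testing [specialty: interpreting and organizing long texts]. Supports high-speed streaming output, agent conversations, web search, long-document interpretation, image OCR, and multi-turn dialogue, with zero-configuration deployment, multi-token support, and automatic cleanup of session traces.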

Language: TypeScript | GPL-3.0 | 3558 30 111

LLMDataHub

A quick guide (especially) for trending instruction finetuning datasets

MIT | 2351 46 3

Lumina-T2X

Lumina-T2X is a unified framework for Text to Any Modality Generation

Language: Python | MIT | 1985 31 83

executorch

On-device AI across mobile, embedded and edge for PyTorch

Language: C++ | NOASSERTION | 1632 55 344

BitNet

Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch

Language: Python | MIT | 1499 38 36

GaLore

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Language: Python | Apache-2.0 | 1313 17 49

quanto

A PyTorch quantization toolkit

Language: Python | Apache-2.0 | 613 8 65

qserve

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

Language: Python | Apache-2.0 | 375 9 25

BitBLAS

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

Language: Python | MIT | 265 12 42

Atom

[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

Language: Cuda | 239 11 15

QuaRot

Code for QuaRot, an end-to-end 4-bit inference scheme for large language models.

Language: Python | Apache-2.0 | 236 11 34

llmc

This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".

Language: Python | Apache-2.0 | 196 9 8

FastV

[ECCV 2024] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

Language: Python | 184 3 21

LLaMA3-Quantization

A repository dedicated to evaluating the performance of quantized LLaMA3 using various quantization methods.

Language: Python | 145 5 10

lmquant

Language: Python | Apache-2.0 | 92 1 13

decoupleQ

A quantization algorithm for LLMs

Language: Cuda | Apache-2.0 | 89 2 11

MMT-Bench

ICML'2024 | MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI

Language: Python | 80 5 7

fast-hadamard-transform

Fast Hadamard transform in CUDA, with a PyTorch interface

Language: C | BSD-3-Clause | 79 3 5

bllama

1.58-bit LLaMa model

Language: Python | MIT | 77 110

LLaVA-PruMerge

LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models

Language: Python | Apache-2.0 | 74 2 13

BitDistiller

[ACL 2024] A novel QAT with Self-Distillation framework to enhance ultra low-bit LLMs.

Language: Python | MIT | 56 3 6

svit

Official implementation of "SViT: Revisiting Token Pruning for Object Detection and Instance Segmentation"

Language: Python | Apache-2.0 | 23 9 10

VTW

Language: Python | 1800

gptvq

Language: Shell | BSD-3-Clause-Clear | 1701

qat-pretrain

Language: Cuda | 4 10