Longzhi Wang's repositories
act
Run your GitHub Actions locally 🚀
AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
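A minimal quantization sketch following AutoGPTQ's documented workflow; the model checkpoint and the one-sentence calibration set are placeholders (real use needs a proper calibration corpus):

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_id = "facebook/opt-125m"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)

# GPTQ calibrates rounding against sample activations; one toy example here.
examples = [tokenizer("auto-gptq is an easy-to-use quantization library.")]

quantize_config = BaseQuantizeConfig(bits=4, group_size=128)  # 4-bit, group-wise scales
model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)
model.quantize(examples)
model.save_quantized("opt-125m-4bit-gptq")
```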
Awesome-LLM-Compression
Awesome LLM compression research papers and tools.
Awesome-LLM-Inference
📖 A curated list of awesome LLM inference papers with code, covering TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.
Camp
Training camp for the PaddlePaddle (飞桨) Escort Program.
DeepCache
[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free
EAGLE
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
flash-attention
Fast and memory-efficient exact attention
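FlashAttention ships as a drop-in kernel; a minimal sketch of the public `flash_attn_func` entry point (shapes and dtypes follow the library's convention, and a CUDA GPU is required):

```python
import torch
from flash_attn import flash_attn_func

# q, k, v are (batch, seqlen, nheads, headdim) in fp16/bf16 on a CUDA device.
q = torch.randn(2, 1024, 8, 64, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# Exact (not approximate) attention, computed tile-by-tile in on-chip SRAM,
# so the full seqlen x seqlen score matrix is never materialized in HBM.
out = flash_attn_func(q, k, v, causal=True)
```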
gemma.cpp
A lightweight, standalone C++ inference engine for Google's Gemma models.
gemma_pytorch
The official PyTorch implementation of Google's Gemma models
gligen-gui
An intuitive GUI for GLIGEN that uses ComfyUI as its backend
grok-1
Grok open release
KIVI
KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
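KIVI's exact layout (keys quantized per-channel, values per-token) is repo-specific, but the asymmetric quantization primitive it builds on is standard; a generic sketch with a hypothetical `asym_quantize` helper:

```python
import torch

def asym_quantize(x: torch.Tensor, bits: int = 2):
    """Generic asymmetric quantization: map [min, max] onto {0, ..., 2^bits - 1}.

    Illustrates the primitive only; KIVI's per-channel/per-token grouping
    and residual full-precision window are not shown here.
    """
    qmax = 2**bits - 1
    xmin, xmax = x.min(), x.max()
    scale = (xmax - xmin).clamp(min=1e-8) / qmax
    q = torch.round((x - xmin) / scale).clamp(0, qmax).to(torch.uint8)
    return q, scale, xmin

def asym_dequantize(q, scale, zero_point):
    return q.float() * scale + zero_point
```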
KVQuant
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
llm-awq
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
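The activation-aware idea is to rescale salient weight channels before quantization so their rounding error shrinks; a toy illustration with a hypothetical `awq_style_rescale` helper (the fixed `alpha` exponent stands in for AWQ's actual scale search):

```python
import torch

def awq_style_rescale(weight: torch.Tensor, act_absmean: torch.Tensor, alpha: float = 0.5):
    """Toy sketch of activation-aware scaling, not AWQ's implementation.

    Input channels that see large activations are scaled up before
    quantization; the inverse scale is folded into the preceding op so the
    network's output is mathematically unchanged.
    """
    s = act_absmean.clamp(min=1e-5) ** alpha  # per-input-channel scale
    w_scaled = weight * s                     # quantize this tensor instead
    return w_scaled, 1.0 / s                  # fold 1/s into the previous layer
```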
Medusa
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
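Medusa's tree attention and typical-acceptance scheme are repo-specific, but the draft-and-verify step shared by speculative decoding methods can be sketched; `accept_draft` is a hypothetical helper assuming greedy decoding, with the target model's logits already computed over the drafted positions:

```python
import torch

def accept_draft(target_logits: torch.Tensor, draft_tokens: torch.Tensor):
    """Greedy draft verification (generic speculative decoding, not Medusa's
    tree variant): keep the longest prefix of drafted tokens the target model
    would itself have produced, plus one free token from the target.

    target_logits: (k + 1, vocab) logits at the k drafted positions and one beyond.
    draft_tokens:  (k,) tokens proposed by the cheap decoding heads.
    """
    preds = target_logits.argmax(dim=-1)                    # target's greedy choices
    match = draft_tokens == preds[: draft_tokens.numel()]
    n = int(match.cumprod(dim=0).sum())                     # accepted prefix length
    return torch.cat([draft_tokens[:n], preds[n : n + 1]])  # accepted + 1 bonus token
```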
MegEngine
MegEngine is a fast, scalable, easy-to-use deep learning framework with automatic differentiation.
ml_dtypes
A stand-alone implementation of several NumPy dtype extensions used in machine learning.
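The extension dtypes plug directly into NumPy; a minimal sketch:

```python
import numpy as np
import ml_dtypes

# bfloat16 behaves like any other NumPy dtype.
x = np.array([0.1, 0.5, 1.5], dtype=ml_dtypes.bfloat16)
print(x.dtype)  # bfloat16

# Round-tripping shows the precision loss of the 8-bit float formats.
y = np.array([0.1234], dtype=np.float32).astype(ml_dtypes.float8_e4m3fn)
print(y.astype(np.float32))
```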
mlx
MLX: An array framework for Apple silicon
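MLX arrays live in Apple-silicon unified memory and are lazily evaluated; a minimal sketch:

```python
import mlx.core as mx

a = mx.random.normal((4, 4))
b = mx.random.normal((4, 4))
c = (a @ b).sum()  # builds a computation graph; nothing runs yet
mx.eval(c)         # lazy evaluation: the graph executes here
print(c.item())
```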
OmniQuant
[ICLR 2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
Open-Sora-Plan
This project aims to reproduce Sora (OpenAI's text-to-video model); we hope the open-source community will contribute to it.
Paddle
PArallel Distributed Deep LEarning: a machine learning framework from industrial practice (the core framework of PaddlePaddle『飞桨』: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)
PaddleNLP
👑 An easy-to-use and powerful NLP and LLM library with a 🤗 awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂 text classification, 🔍 neural search, ❓ question answering, ℹ️ information extraction, 📄 document intelligence, 💌 sentiment analysis, etc.
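PaddleNLP's `Taskflow` API wraps many of these tasks behind a single call; a minimal sketch using the documented sentiment-analysis preset (pretrained weights download on first use):

```python
from paddlenlp import Taskflow

# One-line access to a pretrained pipeline.
senta = Taskflow("sentiment_analysis")
print(senta("这家餐厅的菜品味道很不错"))  # "The food at this restaurant is quite good."
```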
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
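Recent releases also expose a high-level Python `LLM` API on top of the engine-building workflow; a minimal sketch, with the checkpoint name as a placeholder:

```python
from tensorrt_llm import LLM, SamplingParams

# Builds (or loads) a TensorRT engine for the model, then runs inference on it.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # placeholder checkpoint
outputs = llm.generate(
    ["The key advantage of TensorRT engines is"],
    SamplingParams(max_tokens=32),
)
for out in outputs:
    print(out.outputs[0].text)
```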
TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
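FP8 execution is opted into per region via an autocast context, following the library's quickstart pattern; a minimal sketch (requires a Hopper- or Ada-generation GPU):

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# HYBRID = E4M3 for the forward pass, E5M2 for gradients.
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.HYBRID)
layer = te.Linear(768, 768, bias=True).cuda()
x = torch.randn(16, 768, device="cuda")

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)  # the matmul runs in FP8; master weights stay higher precision
```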
triton
Development repository for the Triton language and compiler
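A minimal Triton kernel in the style of the project's first tutorial, a blocked vector add, to show the programming model:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the ragged final block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
```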
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
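vLLM's offline entry point is a two-class API; a minimal sketch, with the checkpoint name as a placeholder:

```python
from vllm import LLM, SamplingParams

# PagedAttention manages the KV cache in fixed-size blocks, so many requests
# can be continuously batched without fragmenting GPU memory.
llm = LLM(model="facebook/opt-125m")  # placeholder checkpoint
params = SamplingParams(temperature=0.8, max_tokens=64)
for out in llm.generate(["The capital of France is"], params):
    print(out.outputs[0].text)
```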