Longzhi Wang (Wanglongzhi2001)


Company: University of Electronic Science and Technology of China

Location: Chengdu


Longzhi Wang's repositories

act

Run your GitHub Actions locally 🚀

Language: Go · License: MIT · Stargazers: 0 · Issues: 0

AutoGPTQ

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

Language: Python · License: MIT · Stargazers: 0 · Issues: 0

Awesome-LLM-Compression

Awesome LLM compression research papers and tools.

License: MIT · Stargazers: 0 · Issues: 0

DeepCache

[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free

License: Apache-2.0 · Stargazers: 0 · Issues: 0

EAGLE

EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

flash-attention

Fast and memory-efficient exact attention

License: BSD-3-Clause · Stargazers: 0 · Issues: 0

flashinfer

FlashInfer: Kernel Library for LLM Serving

License: Apache-2.0 · Stargazers: 0 · Issues: 0

gemma.cpp

A lightweight, standalone C++ inference engine for Google's Gemma models.

License: Apache-2.0 · Stargazers: 0 · Issues: 0

gemma_pytorch

The official PyTorch implementation of Google's Gemma models

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

gligen-gui

An intuitive GUI for GLIGEN that uses ComfyUI in the backend

License: NOASSERTION · Stargazers: 0 · Issues: 0

grok-1

Grok open release

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. A commercially usable open-source multimodal chat model approaching GPT-4o performance.

License: MIT · Stargazers: 0 · Issues: 0

KIVI

KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

Language: Python · License: MIT · Stargazers: 0 · Issues: 0

KVQuant

KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

Language: Python · Stargazers: 0 · Issues: 0

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

llm-awq

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Language: Python · License: MIT · Stargazers: 0 · Issues: 0

lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

OmniQuant

[ICLR 2024 Spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.

License: MIT · Stargazers: 0 · Issues: 0

Open-Sora-Plan

This project aims to reproduce Sora (OpenAI's text-to-video model); we hope the open-source community will contribute to it.

Language: Python · License: MIT · Stargazers: 0 · Issues: 0

Paddle

PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the PaddlePaddle core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)

Language: C++ · License: Apache-2.0 · Stargazers: 0 · Issues: 0

PaddleNLP

👑 An easy-to-use and powerful NLP and LLM library with a 🤗 model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂 Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis, etc.

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

sglang

SGLang is yet another fast serving framework for large language models and vision language models.

License: Apache-2.0 · Stargazers: 0 · Issues: 0

SpeculativeDecodingPapers

📰 Must-read papers and blogs on Speculative Decoding ⚡️

License: Apache-2.0 · Stargazers: 0 · Issues: 0

stable-fast

An inference performance-optimization framework for HuggingFace Diffusers on NVIDIA GPUs.

License: MIT · Stargazers: 0 · Issues: 0

TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

triton

Development repository for the Triton language and compiler

Language: C++ · License: MIT · Stargazers: 0 · Issues: 0

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

xDiT

xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) on multi-GPU Clusters

License: Apache-2.0 · Stargazers: 0 · Issues: 0