Longzhi Wang (Wanglongzhi2001)



Company: University of Electronic Science and Technology of China

Location: Chengdu


Longzhi Wang's repositories

act

Run your GitHub Actions locally 🚀

Language: Go · License: MIT · Stargazers: 0 · Issues: 0

AutoGPTQ

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

License: MIT · Stargazers: 0 · Issues: 0

Awesome-LLM-Compression

Awesome LLM compression research papers and tools.

License: MIT · Stargazers: 0 · Issues: 0

Awesome-LLM-Inference

📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.

License: GPL-3.0 · Stargazers: 0 · Issues: 0

Camp

Training camp for the PaddlePaddle (飞桨) Escort Program.

Language: Mermaid · Stargazers: 0 · Issues: 0

DeepCache

[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free

License: Apache-2.0 · Stargazers: 0 · Issues: 0

EAGLE

EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty

License: Apache-2.0 · Stargazers: 0 · Issues: 0

flash-attention

Fast and memory-efficient exact attention

License: BSD-3-Clause · Stargazers: 0 · Issues: 0

gemma.cpp

Lightweight, standalone C++ inference engine for Google's Gemma models.

License: Apache-2.0 · Stargazers: 0 · Issues: 0

gemma_pytorch

The official PyTorch implementation of Google's Gemma models

License: Apache-2.0 · Stargazers: 0 · Issues: 0

gligen-gui

An intuitive GUI for GLIGEN that uses ComfyUI in the backend

License: NOASSERTION · Stargazers: 0 · Issues: 0

grok-1

Grok open release

License: Apache-2.0 · Stargazers: 0 · Issues: 0

KIVI

KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

License: MIT · Stargazers: 0 · Issues: 0

KVQuant

KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

Stargazers: 0 · Issues: 0
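KIVI and KVQuant both rest on the same primitive: asymmetric uniform quantization of KV-cache tensors at very low bit widths. A library-free sketch of that primitive at 2 bits — real implementations add per-channel/per-token grouping, outlier handling, and packed integer storage, none of which is shown here:

```python
# Asymmetric uniform quantization: map floats onto 2**bits integer levels
# spanning [min, max], storing only the codes plus (scale, zero_point).

def quantize(xs, bits=2):
    """Return integer codes in [0, 2**bits - 1] plus (scale, zero_point)."""
    qmax = (1 << bits) - 1
    lo, hi = min(xs), max(xs)
    scale = (hi - lo) / qmax or 1.0   # guard against constant input
    codes = [round((x - lo) / scale) for x in xs]
    return codes, scale, lo

def dequantize(codes, scale, zero_point):
    return [c * scale + zero_point for c in codes]

xs = [0.1, -0.4, 0.9, 0.3]
codes, scale, zero = quantize(xs, bits=2)
xhat = dequantize(codes, scale, zero)
print(codes)   # → [1, 0, 3, 2]; each code fits in 2 bits
```

Rounding to the nearest level bounds the reconstruction error by scale/2, which is why the papers above fight to keep the dynamic range (and hence the scale) small per quantization group.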

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

llm-awq

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Language: Python · License: MIT · Stargazers: 0 · Issues: 0

Medusa

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 0 · Issues: 0

MegEngine

MegEngine is a fast, scalable, easy-to-use deep learning framework with automatic differentiation.

Language: C++ · License: Apache-2.0 · Stargazers: 0 · Issues: 0

ml_dtypes

A stand-alone implementation of several NumPy dtype extensions used in machine learning.

Language: C++ · License: Apache-2.0 · Stargazers: 0 · Issues: 0

mlx

MLX: An array framework for Apple silicon

Language: C++ · License: MIT · Stargazers: 0 · Issues: 0

OmniQuant

[ICLR 2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.

License: MIT · Stargazers: 0 · Issues: 0

Open-Sora-Plan

This project aims to reproduce Sora (OpenAI's text-to-video model); we hope the open-source community will contribute to it.

License: MIT · Stargazers: 0 · Issues: 0

Paddle

PArallel Distributed Deep LEarning: a machine learning framework from industrial practice (the PaddlePaddle 『飞桨』 core framework: high-performance single-machine and distributed training for deep learning and machine learning, plus cross-platform deployment).

Language: C++ · License: Apache-2.0 · Stargazers: 0 · Issues: 0

PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with a 🤗 awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂 Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis, etc.

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

Language: C++ · License: Apache-2.0 · Stargazers: 0 · Issues: 0

TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

triton

Development repository for the Triton language and compiler

Language: C++ · License: MIT · Stargazers: 0 · Issues: 0

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0