Jinyu Bai (buaabai)

Company: BUAA

Location: Beijing, China

Jinyu Bai's starred repositories

dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.

Language: TypeScript | License: NOASSERTION | Stargazers: 38579 | Issues: 298 | Issues: 2869

llama3

The official Meta Llama 3 GitHub site

Language: Python | License: NOASSERTION | Stargazers: 23524 | Issues: 196 | Issues: 197

yolov10

YOLOv10: Real-Time End-to-End Object Detection

Language: Python | License: AGPL-3.0 | Stargazers: 8481 | Issues: 42 | Issues: 314

Digital

A digital logic designer and circuit simulator.

Language: Java | License: GPL-3.0 | Stargazers: 4166 | Issues: 91 | Issues: 888

matmulfreellm

Implementation for MatMul-free LM.

Language: Python | License: Apache-2.0 | Stargazers: 2724 | Issues: 43 | Issues: 23

DoRA

[ICML2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation

Language: Python | License: NOASSERTION | Stargazers: 472 | Issues: 9 | Issues: 13

ao

Custom data types and layouts for training and inference

Language: Python | License: BSD-3-Clause | Stargazers: 428 | Issues: 25 | Issues: 89

HolisticTraceAnalysis

A library to analyze PyTorch traces.

Language: Python | License: MIT | Stargazers: 254 | Issues: 17 | Issues: 52

BitBLAS

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

Language: Python | License: MIT | Stargazers: 228 | Issues: 11 | Issues: 18

fp6_llm

Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).

Language: Cuda | License: Apache-2.0 | Stargazers: 161 | Issues: 4 | Issues: 8

BiLLM

(ICML 2024) BiLLM: Pushing the Limit of Post-Training Quantization for LLMs

Language: Python | License: MIT | Stargazers: 158 | Issues: 6 | Issues: 13

BitMat

An efficient implementation of the method proposed in "The Era of 1-bit LLMs"

Language: Python | License: Apache-2.0 | Stargazers: 148 | Issues: 6 | Issues: 10

LLaMA3-Quantization

A repository dedicated to evaluating the performance of quantized LLaMA3 using various quantization methods.

Quest

[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference

AutoSmoothQuant

An easy-to-use package for implementing SmoothQuant for LLMs

Language: Python | License: MIT | Stargazers: 67 | Issues: 3 | Issues: 16

Awesome-LLM-Quantization

Awesome list for LLM quantization

Language: Python | Stargazers: 53 | Issues: 4 | Issues: 0

SLAB

[ICML 2024] Official PyTorch implementation of "SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization"

ShiftAddLLM

ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization

Language: Python | License: Apache-2.0 | Stargazers: 43 | Issues: 2 | Issues: 1

FPGA-QOI

FPGA-based QOI image compressor and decompressor in Verilog.

Language: Verilog | License: GPL-3.0 | Stargazers: 16 | Issues: 1 | Issues: 0

APT

[ICML'24 Oral] APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference

Language: Python | License: MIT | Stargazers: 14 | Issues: 0 | Issues: 0

Floating-Point-Adder

32-bit pipelined binary floating-point adder using the IEEE 754 single-precision format, in Verilog

Language: Verilog | License: MIT | Stargazers: 13 | Issues: 2 | Issues: 0

evol-q

Quantization in the Jagged Loss Landscape of Vision Transformers

Language: Python | License: Apache-2.0 | Stargazers: 11 | Issues: 5 | Issues: 0

QST

Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models

Language: Python | License: Apache-2.0 | Stargazers: 10 | Issues: 2 | Issues: 0

BGEMM-CUDA

A repository implementing Binary General Matrix Multiply (BGEMM) with a customized CUDA kernel. Thanks to FP6-LLM for the wheels!

Language: Cuda | License: Apache-2.0 | Stargazers: 8 | Issues: 0 | Issues: 0

retraining-free-quantization

RFQuant: Retraining-free Model Quantization via One-Shot Weight-Coupling Learning, CVPR (2024)

Language: Python | License: MIT | Stargazers: 4 | Issues: 2 | Issues: 0

qattn

Efficient GPU kernels for mixed-precision Vision Transformers in Triton

Language: Python | License: MIT | Stargazers: 4 | Issues: 0 | Issues: 0

Language: C++ | Stargazers: 2 | Issues: 0 | Issues: 0

Ansor-AF-DS

This repository contains the figures, table data, and source code for the ICS'24 paper "Accelerated Auto-Tuning of GPU Kernels for Tensor Computations".

Language: Python | Stargazers: 2 | Issues: 0 | Issues: 0

Language: Python | Stargazers: 2 | Issues: 0 | Issues: 0