buaabai

Jinyu Bai's starred repositories

dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.

Language: TypeScript | License: NOASSERTION | 38579 298 2869

llama3

The official Meta Llama 3 GitHub site

Language: Python | License: NOASSERTION | 23524 196 197

yolov10

YOLOv10: Real-Time End-to-End Object Detection

Language: Python | License: AGPL-3.0 | 8481 42 314

Digital

A digital logic designer and circuit simulator.

Language: Java | License: GPL-3.0 | 4166 91 888

matmulfreellm

Implementation for MatMul-free LM.

Language: Python | License: Apache-2.0 | 2724 43 23

DoRA

[ICML2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation

Language: Python | License: NOASSERTION | 472 9 13

ao

Custom data types and layouts for training and inference

Language: Python | License: BSD-3-Clause | 428 25 89

HolisticTraceAnalysis

A library to analyze PyTorch traces.

Language: Python | License: MIT | 254 17 52

BitBLAS

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

Language: Python | License: MIT | 228 11 18

fp6_llm

Efficient GPU support for LLM inference with x-bit quantization (e.g. FP6, FP5).

Language: Cuda | License: Apache-2.0 | 161 4 8

BiLLM

(ICML 2024) BiLLM: Pushing the Limit of Post-Training Quantization for LLMs

Language: Python | License: MIT | 158 6 13

BitMat

An efficient implementation of the method proposed in "The Era of 1-bit LLMs"

Language: Python | License: Apache-2.0 | 148 6 10

LLaMA3-Quantization

A repository dedicated to evaluating the performance of quantized LLaMA3 using various quantization methods.

Language: Python | 138 5 10

Quest

[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference

Language: Cuda | 95 3 3

AutoSmoothQuant

An easy-to-use package for implementing SmoothQuant for LLMs

Language: Python | License: MIT | 67 3 16

Awesome-LLM-Quantization

Awesome list for LLM quantization

Language: Python | 53 40

SLAB

[ICML 2024] Official PyTorch implementation of "SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization"

Language: Python | 47 3 4

ShiftAddLLM

ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization

Language: Python | License: Apache-2.0 | 43 2 1

FPGA-QOI

FPGA-based QOI image compressor and decompressor, written in Verilog.

Language: Verilog | License: GPL-3.0 | 16 10

APT

[ICML'24 Oral] APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference

Language: Python | License: MIT | 1400

Floating-Point-Adder

32-bit pipelined binary floating-point adder using the IEEE-754 single-precision format, in Verilog

Language: Verilog | License: MIT | 13 20

evol-q

Quantization in the Jagged Loss Landscape of Vision Transformers

Language: Python | License: Apache-2.0 | 11 50

QST

Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models

Language: Python | License: Apache-2.0 | 10 20

BGEMM-CUDA

A repository of Binary General Matrix Multiply (BGEMM) implemented as a customized CUDA kernel. Thanks to FP6-LLM for the wheels!

Language: Cuda | License: Apache-2.0 | 800

retraining-free-quantization

RFQuant: Retraining-free Model Quantization via One-Shot Weight-Coupling Learning, CVPR (2024)

Language: Python | License: MIT | 4 20

qattn

Efficient GPU kernels for mixed-precision Vision Transformers in Triton

Language: Python | License: MIT | 400

Uint-Packing

Language: C++ | 3 10

dac_sdc_2023_champion

Language: C++ | 200

Ansor-AF-DS

This repository contains the figures, table data, and source code for the ICS'24 paper "Accelerated Auto-Tuning of GPU Kernels for Tensor Computations".

Language: Python | 200

SelectiveFocus

Language: Python | 200