ilur98 / DGQ

Official Code For Dual Grained Quantization: Efficient Fine-Grained Quantization for LLM


Dual Grained Quantization: Efficient Fine-Grained Quantization for LLM [Paper]

Features and milestones:

  • DGQ algorithm for A8W4 models.
  • Memory-efficient linear layers for fake quantization in PyTorch.
  • Efficient CUTLASS kernel implementation for fast inference.
  • Edge device support. [Work in progress.]
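The "dual grained" idea pairs fine-grained INT4 weight groups with a second, coarser quantization of the group scales themselves, so a kernel can stay in integer arithmetic within each output channel. The following NumPy sketch illustrates that two-level scheme conceptually; it is not the repo's implementation, and all function names and bit-width choices here are illustrative assumptions.

```python
import numpy as np

def dual_grained_quantize(w, group_size=128, wbits=4, sbits=8):
    """Conceptual sketch of two-level (dual-grained) quantization:
    symmetric INT4 weights per group, with the per-group FP scales
    re-quantized to INT8 against one FP scale per output channel."""
    out_ch, in_ch = w.shape
    g = w.reshape(out_ch, in_ch // group_size, group_size)

    # Level 1: symmetric INT4 quantization per group of `group_size` weights.
    qmax = 2 ** (wbits - 1) - 1                       # 7 for 4-bit
    group_scale = np.abs(g).max(axis=-1, keepdims=True) / qmax
    group_scale = np.maximum(group_scale, 1e-8)       # avoid divide-by-zero
    q = np.clip(np.round(g / group_scale), -qmax - 1, qmax)

    # Level 2: quantize the positive FP group scales to unsigned INT8,
    # sharing a single coarse FP scale per output channel.
    smax = 2 ** sbits - 1                             # 255 for 8-bit
    chan_scale = group_scale.max(axis=1, keepdims=True) / smax
    qs = np.clip(np.round(group_scale / chan_scale), 1, smax)

    return q, qs, chan_scale

def dequantize(q, qs, chan_scale):
    """Reconstruct weights: INT4 value * (INT8 scale * channel FP scale)."""
    g = q * (qs * chan_scale)
    return g.reshape(g.shape[0], -1)

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 256)).astype(np.float32)
q, qs, cs = dual_grained_quantize(w)
w_hat = dequantize(q, qs, cs)
print("max abs reconstruction error:", np.abs(w - w_hat).max())
```

Because the group scales are stored as INT8 relative to one per-channel factor, only one floating-point multiply per output channel remains at dequantization time.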

Install

conda create -n dgq python=3.10 -y
conda activate dgq
pip install --upgrade pip  # enable PEP 660 support
pip install -e .

Kernel install

CUDA 12.1 needs to be installed first. We recommend using the bitsandbytes install script (https://raw.githubusercontent.com/TimDettmers/bitsandbytes/main/cuda_install.sh).

source environment.sh
bash build_cutlass.sh
cd dgq/kernels/
python setup.py install

Usage

We provide a sample script to run DGQ ('./llama7b.sh'):

  1. Perform DGQ quantization and save the real quantized model:
	python -m dgq.entry [your-model-path] [dataset] --wt_fun search --groupsize 128 --wbits 4 --smoothquant --w4w8 --kvquant --save_safetensors [path-to-save]
  2. Load and evaluate the real quantized model:
	python -m dgq.entry [your-model-path] [dataset] --wt_fun search --groupsize 128 --wbits 4 --smoothquant --w4w8 --kvquant --load [path-to-save] --eval
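The `--smoothquant` flag above applies SmoothQuant-style activation smoothing before quantization: activation outliers are divided out per input channel and the same factor is folded into the weights, leaving the matmul result unchanged. The NumPy sketch below shows the idea only; the function name and `alpha` choice are illustrative assumptions, not this repo's API.

```python
import numpy as np

def smooth(x, w, alpha=0.5):
    """SmoothQuant-style smoothing sketch: scale activations down by a
    per-channel factor s and fold s into the weights, so X @ W.T is
    mathematically unchanged while activation outliers shrink."""
    # Per-input-channel magnitude statistics (eps avoids divide-by-zero).
    act_max = np.abs(x).max(axis=0) + 1e-8
    w_max = np.abs(w).max(axis=0) + 1e-8
    # alpha balances how much difficulty moves from activations to weights.
    s = act_max ** alpha / w_max ** (1 - alpha)
    return x / s, w * s, s

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 64))
x[:, 3] *= 50.0                       # simulate an outlier channel
w = rng.standard_normal((32, 64))     # (out_features, in_features)

x_s, w_s, s = smooth(x, w)
print("output preserved:", np.allclose(x @ w.T, x_s @ w_s.T))
```

After smoothing, the activation tensor has a much flatter per-channel range, which is what makes the 8-bit activation (A8) side of A8W4 quantization tolerable.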

Reference

If you find our work useful or relevant to your research, please kindly cite our paper:

@article{zhang2023dual,
  title={Dual Grained Quantization: Efficient Fine-Grained Quantization for LLM},
  author={Zhang, Luoming and Fei, Wen and Wu, Weijia and He, Yefei and Lou, Zhenyu and Zhou, Hong},
  journal={arXiv preprint arXiv:2310.04836},
  year={2023}
}

Acknowledgements

Our code refers to the following projects: GPTQ, GPTQ-for-LLaMA, AWQ, SmoothQuant, torch-int, and FasterTransformer.


License: MIT

