AutoGPTQ / AutoGPTQ

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.


The Path to v1.0.0

PanQiWei opened this issue · comments

Hi everyone, long time no see! Starting this week, I will spend about 4 weeks gradually pushing AutoGPTQ to v1.0.0. In the meantime, there will be 2~3 minor versions released as optimization or feature previews, so that you can experience those updates as soon as they are finished and I can hear more community voices and gather more feedback.

My vision is that by the time v1.0.0 is released, AutoGPTQ can serve as an automatic, extensible, and flexible quantization backend for all language models written in PyTorch.

I opened this issue to list everything that will be done (optimizations, new features, bug fixes, etc.) and to record the development progress, so the contents below will be updated frequently.

Feel free to comment in the thread to give your opinions and suggestions!

Optimizations

  • refactor the code framework for future extensions while maintaining the important interfaces.
    • separate the quantization logic into a standalone module that serves as a mixin.
    • design an automatic structure-recognition strategy to better support different models (hopefully even multi-modal and diffusion models).
  • speed up model packing after quantization.
  • extend kernel fusion to more models to further speed up inference.
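To illustrate the mixin idea from the refactor bullet above, here is a minimal sketch of how quantization logic could be factored into a standalone mixin that any model wrapper inherits. All class and method names here are hypothetical, for illustration only, not AutoGPTQ's actual API.

```python
import torch.nn as nn


class QuantizationMixin:
    """Hypothetical mixin isolating quantization logic from model-loading code.

    A wrapper class inherits this alongside its own base class, so the
    quantization machinery stays decoupled from any specific architecture.
    """

    def find_quantizable_layers(self) -> dict:
        # Collect all nn.Linear submodules by qualified name; a real
        # implementation would also apply per-architecture layer ordering.
        return {
            name: module
            for name, module in self.model.named_modules()
            if isinstance(module, nn.Linear)
        }


class QuantizedModel(QuantizationMixin):
    """Thin wrapper holding the underlying PyTorch model."""

    def __init__(self, model: nn.Module):
        self.model = model


# Usage: the mixin works with any torch.nn model, here a toy Sequential.
wrapped = QuantizedModel(nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2)))
layers = wrapped.find_quantizable_layers()
print(sorted(layers))  # ['0', '2']
```

Because the mixin only assumes a `self.model` attribute, the same quantization code can be reused across wrappers for different model families.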

New Features

  • model sharding: split model checkpoints into multiple files and load from multiple files. #364
  • tensor parallelism for all kinds of QuantLinear supported by AutoGPTQ.
  • CLI: run common commands such as quantization and benchmarking directly.
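The model-sharding feature above can be sketched with plain PyTorch: split a state dict into size-bounded shard files plus an index mapping each tensor name to its shard, then reassemble from the index at load time. The file naming and index format below are illustrative assumptions, not AutoGPTQ's actual layout.

```python
import json
import os

import torch
import torch.nn as nn


def save_sharded(model: nn.Module, save_dir: str, max_shard_bytes: int = 1 << 20) -> None:
    """Split a model's state dict across multiple shard files plus an index."""
    os.makedirs(save_dir, exist_ok=True)
    shards, current, size = [], {}, 0
    for name, tensor in model.state_dict().items():
        nbytes = tensor.numel() * tensor.element_size()
        # Start a new shard when adding this tensor would exceed the budget.
        if current and size + nbytes > max_shard_bytes:
            shards.append(current)
            current, size = {}, 0
        current[name] = tensor
        size += nbytes
    if current:
        shards.append(current)

    index = {}
    for i, shard in enumerate(shards):
        fname = f"model-{i + 1:05d}-of-{len(shards):05d}.bin"
        torch.save(shard, os.path.join(save_dir, fname))
        # The index records which shard file holds each tensor.
        index.update({name: fname for name in shard})
    with open(os.path.join(save_dir, "index.json"), "w") as f:
        json.dump(index, f)


def load_sharded(save_dir: str) -> dict:
    """Reassemble the full state dict from the index and shard files."""
    with open(os.path.join(save_dir, "index.json")) as f:
        index = json.load(f)
    state_dict = {}
    for fname in set(index.values()):
        state_dict.update(torch.load(os.path.join(save_dir, fname)))
    return state_dict
```

Loading only reads the shard files listed in the index, so a loader could also fetch tensors lazily per shard instead of materializing everything at once.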

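The proposed CLI could follow the standard sub-command pattern, one sub-command per task such as quantization or benchmarking. The command and flag names below are hypothetical placeholders, not a committed interface.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical sub-commands mirroring the proposed CLI bullet.
    parser = argparse.ArgumentParser(prog="autogptq")
    sub = parser.add_subparsers(dest="command", required=True)

    quantize = sub.add_parser("quantize", help="quantize a pretrained model")
    quantize.add_argument("--model", required=True, help="model name or path")
    quantize.add_argument("--bits", type=int, default=4, help="quantization bit width")

    bench = sub.add_parser("benchmark", help="benchmark inference speed")
    bench.add_argument("--model", required=True, help="quantized model path")
    return parser


# Usage: parse an example command line (model name here is just an example).
args = build_parser().parse_args(["quantize", "--model", "facebook/opt-125m", "--bits", "4"])
print(args.command, args.bits)  # quantize 4
```

Keeping each task behind its own sub-parser lets new commands be added later without touching the existing ones.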
Bug Fixes

Hi @PanQiWei, any updates regarding version 1.0.0?