AutoGPTQ / AutoGPTQ

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.


The Path to v1.0.0

PanQiWei opened this issue · comments

Hi everyone, long time no see! Starting this week, I will spend about 4 weeks gradually pushing AutoGPTQ to v1.0.0. In the meantime, there will be 2~3 minor versions released as optimization or feature previews, so that you can experience those updates as soon as they are finished and I can hear more community voices and gather more feedback.

My vision is that by the time v1.0.0 is released, AutoGPTQ can serve as an automatic, extensible, and flexible quantization backend for all language models written in PyTorch.

I opened this issue to list everything that will be done (optimizations, new features, bug fixes, etc.) and to record the development progress, so the contents below will be updated frequently.

Feel free to comment in the thread to give your opinions and suggestions!

Optimizations

  • refactor the code framework for future extensions while maintaining the important interfaces.
    • separate the quantization logic into a standalone module that serves as a mixin.
    • design an automatic structure-recognition strategy to better support different models (hopefully even multi-modal and diffusion models).
  • speed up model packing after quantization.
  • extend kernel fusion to more models to further speed up inference.
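To illustrate the mixin idea from the refactor bullet above, here is a minimal sketch of how quantization logic could be factored into a standalone mixin that any model wrapper inherits. All class and method names here are hypothetical, for illustration only, not AutoGPTQ's actual API.

```python
import torch.nn as nn


class QuantizationMixin:
    """Hypothetical mixin isolating quantization logic from model-loading code.

    A wrapper class inherits this alongside its own base class, so the
    quantization machinery stays decoupled from any specific architecture.
    """

    def find_quantizable_layers(self) -> dict:
        # Collect all nn.Linear submodules by qualified name; a real
        # implementation would also apply per-architecture layer ordering.
        return {
            name: module
            for name, module in self.model.named_modules()
            if isinstance(module, nn.Linear)
        }


class QuantizedModel(QuantizationMixin):
    """Thin wrapper holding the underlying PyTorch model."""

    def __init__(self, model: nn.Module):
        self.model = model


# Usage: the mixin works with any torch.nn model, here a toy Sequential.
wrapped = QuantizedModel(nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2)))
layers = wrapped.find_quantizable_layers()
print(sorted(layers))  # ['0', '2']
```

Because the mixin only assumes a `self.model` attribute, the same quantization code can be reused across wrappers for different model families.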

New Features

  • model sharding: split model checkpoints into multiple files and load from multiple files. #364
  • tensor parallelism for all kinds of QuantLinear supported by AutoGPTQ.
  • CLI: run common commands such as quantization and benchmarking directly.
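The model-sharding feature above can be sketched with plain PyTorch: split a state dict into size-bounded shard files plus an index mapping each tensor name to its shard, then reassemble from the index at load time. The file naming and index format below are illustrative assumptions, not AutoGPTQ's actual layout.

```python
import json
import os

import torch
import torch.nn as nn


def save_sharded(model: nn.Module, save_dir: str, max_shard_bytes: int = 1 << 20) -> None:
    """Split a model's state dict across multiple shard files plus an index."""
    os.makedirs(save_dir, exist_ok=True)
    shards, current, size = [], {}, 0
    for name, tensor in model.state_dict().items():
        nbytes = tensor.numel() * tensor.element_size()
        # Start a new shard when adding this tensor would exceed the budget.
        if current and size + nbytes > max_shard_bytes:
            shards.append(current)
            current, size = {}, 0
        current[name] = tensor
        size += nbytes
    if current:
        shards.append(current)

    index = {}
    for i, shard in enumerate(shards):
        fname = f"model-{i + 1:05d}-of-{len(shards):05d}.bin"
        torch.save(shard, os.path.join(save_dir, fname))
        # The index records which shard file holds each tensor.
        index.update({name: fname for name in shard})
    with open(os.path.join(save_dir, "index.json"), "w") as f:
        json.dump(index, f)


def load_sharded(save_dir: str) -> dict:
    """Reassemble the full state dict from the index and shard files."""
    with open(os.path.join(save_dir, "index.json")) as f:
        index = json.load(f)
    state_dict = {}
    for fname in set(index.values()):
        state_dict.update(torch.load(os.path.join(save_dir, fname)))
    return state_dict
```

Loading only reads the shard files listed in the index, so a loader could also fetch tensors lazily per shard instead of materializing everything at once.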

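The proposed CLI could follow the standard sub-command pattern, one sub-command per task such as quantization or benchmarking. The command and flag names below are hypothetical placeholders, not a committed interface.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical sub-commands mirroring the proposed CLI bullet.
    parser = argparse.ArgumentParser(prog="autogptq")
    sub = parser.add_subparsers(dest="command", required=True)

    quantize = sub.add_parser("quantize", help="quantize a pretrained model")
    quantize.add_argument("--model", required=True, help="model name or path")
    quantize.add_argument("--bits", type=int, default=4, help="quantization bit width")

    bench = sub.add_parser("benchmark", help="benchmark inference speed")
    bench.add_argument("--model", required=True, help="quantized model path")
    return parser


# Usage: parse an example command line (model name here is just an example).
args = build_parser().parse_args(["quantize", "--model", "facebook/opt-125m", "--bits", "4"])
print(args.command, args.bits)  # quantize 4
```

Keeping each task behind its own sub-parser lets new commands be added later without touching the existing ones.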
Bug Fixes

Hi @PanQiWei, any updates regarding version 1.0.0?