casper-hansen / AutoAWQ

AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.

Documentation: https://casper-hansen.github.io/AutoAWQ/

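For orientation, here is a minimal sketch of the typical AutoAWQ quantization flow; the model path, output path, and config values are illustrative and may differ between AutoAWQ versions.

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Illustrative paths; substitute the model you want to quantize.
model_path = "lmsys/vicuna-7b-v1.5"
quant_path = "vicuna-7b-v1.5-awq"

# Typical 4-bit AWQ settings (group size and kernel version may vary).
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the FP16 model and its tokenizer.
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Run AWQ calibration and quantize the weights to 4-bit.
model.quantize(tokenizer, quant_config=quant_config)

# Save the quantized model and tokenizer for later inference.
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```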

📌 AutoAWQ Roadmap

casper-hansen opened this issue

Optimization

  • Fused layers of LLaMa models
  • Implement GEMV kernel #40
  • Implement ExLlama kernels #313
  • More fused layers for implemented models #40
  • INT8 quantization #45
  • Optimize split_k_iters #39

More models

Ease of access

  • Distribute PyPi package
  • Re-add LLaVa model compatibility #22
  • Custom datasets to quantize models with #27
  • Metal GPUs #44
  • ROCm GPUs #315
  • CPU implementation
  • Push to hub functionality #42

Software integration and quality

  • Unit & integration testing #31
  • Integrate into Huggingface optimum/transformers
  • Quantization config #8
  • Model weight sharding and shard index #36

Hey Casper, first of all, amazing work!

I'm actually really curious: what's the reasoning behind supporting legacy models such as GPT-2 or GPT-J/OPT that are already in the library?

In my perception, the latest developments, mostly MPT and Llama 2, are orders of magnitude better than the legacy models.


Supporting older models is on the roadmap because people still use those models and ask for them. However, I do try to focus my efforts on optimizing the newer models.

Can yi-34b be supported? Looking at the numbers, this model is really impressive.


Yi is now supported on the main branch.

Can you please implement Phi 1.5 support? Thank you for all the amazing work!

Hi Casper, thank you for your wonderful work! I wonder if there is a tutorial for adding support for a new model? I have noticed that Baichuan is on the roadmap. I would like to try to add support for this model; could you please give me some pointers on how to support a new model?

@xTayEx I do not have a written guide, but here are the steps:

  1. Create a model class BaichuanAWQForCausalLM
  2. Add the model to the model map https://github.com/casper-hansen/AutoAWQ/blob/main/awq/models/auto.py#L6
  3. Import the model here https://github.com/casper-hansen/AutoAWQ/blob/main/awq/models/__init__.py

For creating the model class, look into the llama class or other classes to see how they are defined.
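As a rough illustration of those steps, here is a hedged skeleton modeled on the existing Llama class. The hook names mirror awq/models/llama.py, while the Baichuan-specific details (the `BaichuanLayer` layer type, the `model_max_length` config key, and the fused `W_pack` QKV projection) are assumptions that need to be checked against the actual model definition.

```python
# awq/models/baichuan.py (sketch, modeled on awq/models/llama.py)
from .base import BaseAWQForCausalLM

class BaichuanAWQForCausalLM(BaseAWQForCausalLM):
    # Class name of one decoder block in the HF model definition (assumption).
    layer_type = "BaichuanLayer"
    max_new_tokens_key = "model_max_length"  # assumption; check the model's config

    @staticmethod
    def get_model_layers(model):
        # Return the list of decoder blocks to quantize.
        return model.model.layers

    @staticmethod
    def get_act_for_scaling(module):
        # No separately scalable activation in this sketch.
        return dict(is_scalable=False)

    @staticmethod
    def move_embed(model, device):
        # Keep the embedding layer on the same device during calibration.
        model.model.embed_tokens = model.model.embed_tokens.to(device)

    @staticmethod
    def get_layers_for_scaling(module, input_feat, module_kwargs):
        # Describe which linear layers get AWQ scales and where their inputs come from.
        layers = []

        # Attention input -> fused QKV projection (W_pack is an assumption for Baichuan).
        layers.append(dict(
            prev_op=module.input_layernorm,
            layers=[module.self_attn.W_pack],
            inp=input_feat["self_attn.W_pack"],
            module2inspect=module.self_attn,
            kwargs=module_kwargs,
        ))

        # MLP input -> gate/up projections.
        layers.append(dict(
            prev_op=module.post_attention_layernorm,
            layers=[module.mlp.gate_proj, module.mlp.up_proj],
            inp=input_feat["mlp.gate_proj"],
            module2inspect=module.mlp,
            kwargs=module_kwargs,
        ))

        return layers
```

The class then still has to be registered as described in steps 2 and 3: add an entry for "baichuan" to the model map in awq/models/auto.py and import the class in awq/models/__init__.py.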


Phi 1.5 support has been attempted, but they have a very unusual model definition. Until it's been standardized, I am not sure I will support it.


Oh :( Do you mean until a new Phi model comes out?
Phi 1.5 is such an amazing model for so many applications.

What would roughly be the steps to implement it on our own?

Hi @casper-hansen, first of all, thank you for the amazing work. From my understanding, there is a TheBloke AWQ version of Mixtral 8x7B Instruct. I tried to run inference on it and ran into issues. Would this model be supported? Also, is there a way to contribute with a donation?

We achieved most items on the roadmap, so closing this for now to focus on other things.