casper-hansen / AutoAWQ

AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.

Documentation: https://casper-hansen.github.io/AutoAWQ/

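For orientation, here is a minimal sketch of the typical AutoAWQ quantization flow; the model path, output path, and config values are illustrative and may differ between AutoAWQ versions.

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Illustrative paths; substitute the model you want to quantize.
model_path = "lmsys/vicuna-7b-v1.5"
quant_path = "vicuna-7b-v1.5-awq"

# Typical 4-bit AWQ settings (group size and kernel version may vary).
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the FP16 model and its tokenizer.
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Run AWQ calibration and quantize the weights to 4-bit.
model.quantize(tokenizer, quant_config=quant_config)

# Save the quantized model and tokenizer for later inference.
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```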

📌 AutoAWQ Roadmap

casper-hansen opened this issue

Optimization

  • Fused layers of LLaMa models
  • Implement GEMV kernel #40
  • Implement ExLlama kernels #313
  • More fused layers for implemented models #40
  • INT8 quantization #45
  • Optimize split_k_iters #39

More models

Ease of access

  • Distribute PyPi package
  • Re-add LLaVa model compatibility #22
  • Custom datasets to quantize models with #27
  • Metal GPUs #44
  • ROCm GPUs #315
  • CPU implementation
  • Push to hub functionality #42

Software integration and quality

  • Unit & integration testing #31
  • Integrate into Huggingface optimum/transformers
  • Quantization config #8
  • Model weight sharding and shard index #36

Hey Casper, first of all, amazing work!

I'm actually really curious: what's the reasoning behind supporting legacy models such as GPT-2 or GPT-J/OPT that are already in the library?

In my perception, the latest developments, mostly MPT and Llama 2, are orders of magnitude better than the legacy models.


Supporting older models is on the roadmap because people still use those models and ask for them. However, I do try to focus my efforts on optimizing the newer models.

Can yi-34b be supported? Looking at the numbers, this model is really impressive.


Yi is now supported on the main branch.

Can you please implement Phi 1.5 support? Thank you for all the amazing work!

Hi Casper, thank you for your wonderful work! I wonder if there is a tutorial for adding support for a new model? I have noticed that Baichuan is on the roadmap. I would like to try to add support for this model; could you please give me some pointers on how to support a new model?

@xTayEx I do not have a written guide, but here are the steps:

  1. Create a model class BaichuanAWQForCausalLM
  2. Add the model to the model map https://github.com/casper-hansen/AutoAWQ/blob/main/awq/models/auto.py#L6
  3. Import the model here https://github.com/casper-hansen/AutoAWQ/blob/main/awq/models/__init__.py

For creating the model class, look into the llama class or other classes to see how they are defined.
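As a rough illustration of those steps, here is a hedged skeleton modeled on the existing Llama class. The hook names mirror awq/models/llama.py, while the Baichuan-specific details (the `BaichuanLayer` layer type, the `model_max_length` config key, and the fused `W_pack` QKV projection) are assumptions that need to be checked against the actual model definition.

```python
# awq/models/baichuan.py (sketch, modeled on awq/models/llama.py)
from .base import BaseAWQForCausalLM

class BaichuanAWQForCausalLM(BaseAWQForCausalLM):
    # Class name of one decoder block in the HF model definition (assumption).
    layer_type = "BaichuanLayer"
    max_new_tokens_key = "model_max_length"  # assumption; check the model's config

    @staticmethod
    def get_model_layers(model):
        # Return the list of decoder blocks to quantize.
        return model.model.layers

    @staticmethod
    def get_act_for_scaling(module):
        # No separately scalable activation in this sketch.
        return dict(is_scalable=False)

    @staticmethod
    def move_embed(model, device):
        # Keep the embedding layer on the same device during calibration.
        model.model.embed_tokens = model.model.embed_tokens.to(device)

    @staticmethod
    def get_layers_for_scaling(module, input_feat, module_kwargs):
        # Describe which linear layers get AWQ scales and where their inputs come from.
        layers = []

        # Attention input -> fused QKV projection (W_pack is an assumption for Baichuan).
        layers.append(dict(
            prev_op=module.input_layernorm,
            layers=[module.self_attn.W_pack],
            inp=input_feat["self_attn.W_pack"],
            module2inspect=module.self_attn,
            kwargs=module_kwargs,
        ))

        # MLP input -> gate/up projections.
        layers.append(dict(
            prev_op=module.post_attention_layernorm,
            layers=[module.mlp.gate_proj, module.mlp.up_proj],
            inp=input_feat["mlp.gate_proj"],
            module2inspect=module.mlp,
            kwargs=module_kwargs,
        ))

        return layers
```

The class then still has to be registered as described in steps 2 and 3: add an entry for "baichuan" to the model map in awq/models/auto.py and import the class in awq/models/__init__.py.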


Phi 1.5 support has been attempted, but they have a very unusual model definition. Until it's been standardized, I am not sure I will support it.


Oh :( Do you mean until a new Phi model comes out?
Phi 1.5 is such an amazing model for so many applications.

What would roughly be the steps to implement it on our own?

Hi @casper-hansen, first of all, thank you for the amazing work. From my understanding, there is a TheBloke AWQ version of Mixtral 8x7B Instruct. I tried to run inference on it and ran into issues. Would this model be supported? Also, is there a way to contribute with a donation?

We achieved most items on the roadmap, so closing this for now to focus on other things.