WangRongsheng / Aurora

🐳 Aurora is a [Chinese Version] MoE model. Aurora is further work based on Mixtral-8x7B that activates the model's chat capability in the Chinese open domain.

Home Page: https://arxiv.org/abs/2312.14557


Is it possible to fine-tune on two 4090 Ti GPUs?

rookiebird opened this issue

Hello, author. As a newcomer just getting into LLMs, I'd like to ask two questions:

  1. On GitHub you mention that training needs only about 43 GB of memory, but in the paper you used an 80 GB H100. I only have two 24 GB 4090 Ti cards; can fine-tuning work with this setup?
  2. Besides fine-tuning on the data you cleaned, were any other changes made to the model, such as vocabulary expansion? From a quick read of the paper, I didn't see any changes of that kind mentioned. Thanks.

Thank you for your interest.

  1. You can try two-GPU fine-tuning with CUDA_VISIBLE_DEVICES=0,1 (see the sketch after this list).
  2. Some related work is in progress. Modifying the vocabulary requires additional pre-training, so for now we have only released a fine-tuned model. Please stay tuned for our follow-up work.
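
Not part of the original reply, just a minimal sketch: a Python check that `CUDA_VISIBLE_DEVICES=0,1` really exposes both cards to PyTorch before you launch the project's fine-tuning command (the actual command is on the project home page and is not reproduced here).

```python
# Minimal sketch: restrict visible GPUs before CUDA is initialized,
# then confirm PyTorch sees both 24 GB cards.
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"  # expose both 4090 Ti cards

import torch

print(torch.cuda.device_count())  # expected: 2
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(i, props.name, round(props.total_memory / 1024**3, 1), "GiB")
```

The same effect is achieved by prefixing the fine-tuning command in the shell with `CUDA_VISIBLE_DEVICES=0,1`, as the reply above suggests.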

@rookiebird @WangRongsheng Are there any scripts available for fine-tuning? Thanks in advance!

@mikeleatila You can fine-tune the model using the fine-tuning command on the project home page. I don't think there is any way to fine-tune this model on a 4090 Ti, because 24 GB is not enough to fit the model even when it is 4-bit quantized. You can, however, run inference with 4-bit quantization and CPU offload on a 4090 Ti (a sketch follows).
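
Not from the project itself, just a minimal sketch of what 4-bit inference with CPU offload could look like using transformers + bitsandbytes; the model id, memory caps, and prompt below are placeholders to adjust for the released checkpoint and your machine.

```python
# Minimal sketch: load the model in 4-bit and let accelerate offload
# whatever does not fit on the GPU to CPU RAM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "wangrongsheng/Aurora"  # hypothetical Hugging Face repo id, replace as needed

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    llm_int8_enable_fp32_cpu_offload=True,  # allow layers that spill to CPU
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",                        # let accelerate place layers
    max_memory={0: "22GiB", "cpu": "64GiB"},  # cap GPU use, offload the rest
)

prompt = "你好,请介绍一下你自己。"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Expect offloaded inference to be slow, since a large share of the expert weights will live in CPU RAM rather than on the 24 GB card.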

@rookiebird Ok thanks a lot