Questions about training GPU memory
MingfangDeng opened this issue · comments
I kept getting an out-of-memory error during my run. I checked the paper you posted, and it doesn't mention the GPU memory size or the GPU type. Roughly how much GPU memory is needed for the stage-2 fine-tuning? Also, what is the configuration of the machine you mentioned for training a 70B-parameter model in 3 days? Thank you very much!
I still face this problem. I am using an 80G A100 to fine-tune the model, but I still run out of memory. Could you tell me how much GPU memory is needed? I couldn't find it in your paper. Thank you very much, and I look forward to your reply!
Sorry, I couldn't respond to you in a timely manner. We haven't attempted to train a model larger than 13 billion parameters; that's the maximum size we've trained. I suggest you try using LoRA to train a 70-billion-parameter model, as we've found that the performance difference between LoRA and full fine-tuning is not significant.
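To give a rough sense of why LoRA saves so much memory, here is a minimal sketch of a low-rank update on a single weight matrix. The dimensions, rank, and scaling are illustrative assumptions, not Chat-UniVi's actual configuration:

```python
import numpy as np

# LoRA sketch: instead of training W (d_out x d_in), freeze W and train two
# low-rank factors B (d_out x r) and A (r x d_in). Only B and A get gradients
# and optimizer states, which is where the memory saving comes from.
d_out, d_in, r, alpha = 4096, 4096, 16, 32  # illustrative sizes only

W = np.random.randn(d_out, d_in).astype(np.float32)        # frozen pretrained weight
A = np.random.randn(r, d_in).astype(np.float32) * 0.01     # trainable
B = np.zeros((d_out, r), dtype=np.float32)                 # zero init: no change at step 0

def effective_weight(W, A, B):
    # The weight actually used in the forward pass.
    return W + (alpha / r) * B @ A

full_params = d_out * d_in
lora_params = r * (d_out + d_in)
print(f"trainable: {lora_params:,} vs full: {full_params:,} "
      f"({100 * lora_params / full_params:.2f}%)")
```

With these sizes, the trainable-parameter count drops to well under 1% of the full matrix, which is why the optimizer-state memory shrinks accordingly.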
At the same time, we have observed that once the language model grows beyond a certain size, the performance bottleneck is mainly the visual encoder. You may therefore also consider trying larger visual encoders, such as clip-vit-large-patch14-336 or DINOv2. The improvement from a more powerful visual encoder may be more significant than scaling the language model from 13 billion to 70 billion parameters.
OK, then I'd like to know what GPU memory you used for the 13B model, because we hit out-of-memory when fine-tuning Chat-UniVi in stage 2. I am looking forward to your reply!
We use 8×A800 GPUs (each with 80G of memory) and DeepSpeed ZeRO-2 for fine-tuning.
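For anyone else hitting OOM: a back-of-envelope estimate of per-GPU model-state memory under ZeRO-2 suggests why 8×80G works for 13B but a single 80G A100 does not. The byte counts below are standard mixed-precision Adam assumptions, not the authors' exact measurements, and activations come on top:

```python
# Approximate per-GPU model-state memory for full fine-tuning with Adam in
# mixed precision under DeepSpeed ZeRO-2: fp16 weights are replicated on every
# GPU, while fp16 gradients and fp32 optimizer states (master weights,
# momentum, variance) are sharded across the data-parallel group.
def zero2_per_gpu_gb(n_params: float, n_gpus: int) -> float:
    fp16_weights = 2 * n_params           # replicated
    fp16_grads = 2 * n_params / n_gpus    # sharded by ZeRO-2
    optimizer = 12 * n_params / n_gpus    # fp32 master + Adam m and v, sharded
    return (fp16_weights + fp16_grads + optimizer) / 1e9

print(f"{zero2_per_gpu_gb(13e9, 8):.1f} GB per GPU")  # 48.8 GB, before activations
print(f"{zero2_per_gpu_gb(13e9, 1):.1f} GB per GPU")  # 208.0 GB: far over one 80G card
```

This is why a single 80G GPU cannot hold the full-fine-tuning state for 13B, while 8 GPUs with ZeRO-2 leave headroom for activations.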
Thank you very much!