Questions about training GPU memory
MingfangDeng opened this issue · comments
I kept getting an out-of-memory error during my run. I checked the paper you posted, and it doesn't mention the GPU memory size or the GPU type. Roughly how much GPU memory is needed for the stage-2 fine-tuning? Also, what is the configuration of the machine you mentioned for training a 70B-parameter model in 3 days? Thank you very much!
I still face this problem. I am using an 80G A100 to fine-tune the model, but I still run out of memory. Could you tell me how much GPU memory is needed? I couldn't find it in your paper. Thank you very much, and I look forward to your reply!
Sorry, I couldn't respond to you in a timely manner. We haven't attempted to train a model larger than 13 billion parameters; that's the maximum size we've trained. I suggest you try using LoRA to train a 70-billion-parameter model, as we've found that the performance difference between LoRA and full fine-tuning is not significant.
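To give a rough sense of why LoRA saves so much memory, here is a minimal sketch of a low-rank update on a single weight matrix. The dimensions, rank, and scaling are illustrative assumptions, not Chat-UniVi's actual configuration:

```python
import numpy as np

# LoRA sketch: instead of training W (d_out x d_in), freeze W and train two
# low-rank factors B (d_out x r) and A (r x d_in). Only B and A get gradients
# and optimizer states, which is where the memory saving comes from.
d_out, d_in, r, alpha = 4096, 4096, 16, 32  # illustrative sizes only

W = np.random.randn(d_out, d_in).astype(np.float32)        # frozen pretrained weight
A = np.random.randn(r, d_in).astype(np.float32) * 0.01     # trainable
B = np.zeros((d_out, r), dtype=np.float32)                 # zero init: no change at step 0

def effective_weight(W, A, B):
    # The weight actually used in the forward pass.
    return W + (alpha / r) * B @ A

full_params = d_out * d_in
lora_params = r * (d_out + d_in)
print(f"trainable: {lora_params:,} vs full: {full_params:,} "
      f"({100 * lora_params / full_params:.2f}%)")
```

With these sizes, the trainable-parameter count drops to well under 1% of the full matrix, which is why the optimizer-state memory shrinks accordingly.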
At the same time, we have observed that once the language model grows beyond a certain size, the performance bottleneck is mainly the visual encoder. You may therefore also consider trying larger visual encoders, such as clip-vit-large-patch14-336 or DINOv2. The improvement from a more powerful visual encoder may be more significant than scaling the language model from 13 billion to 70 billion parameters.
OK, then I'd like to know what GPU memory you used for the 13B model, because we hit out-of-memory when fine-tuning Chat-UniVi in stage 2. I am looking forward to your reply!
We use 8×A800 GPUs (each with 80G of memory) and DeepSpeed ZeRO-2 for fine-tuning.
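For anyone else hitting OOM: a back-of-envelope estimate of per-GPU model-state memory under ZeRO-2 suggests why 8×80G works for 13B but a single 80G A100 does not. The byte counts below are standard mixed-precision Adam assumptions, not the authors' exact measurements, and activations come on top:

```python
# Approximate per-GPU model-state memory for full fine-tuning with Adam in
# mixed precision under DeepSpeed ZeRO-2: fp16 weights are replicated on every
# GPU, while fp16 gradients and fp32 optimizer states (master weights,
# momentum, variance) are sharded across the data-parallel group.
def zero2_per_gpu_gb(n_params: float, n_gpus: int) -> float:
    fp16_weights = 2 * n_params           # replicated
    fp16_grads = 2 * n_params / n_gpus    # sharded by ZeRO-2
    optimizer = 12 * n_params / n_gpus    # fp32 master + Adam m and v, sharded
    return (fp16_weights + fp16_grads + optimizer) / 1e9

print(f"{zero2_per_gpu_gb(13e9, 8):.1f} GB per GPU")  # 48.8 GB, before activations
print(f"{zero2_per_gpu_gb(13e9, 1):.1f} GB per GPU")  # 208.0 GB: far over one 80G card
```

This is why a single 80G GPU cannot hold the full-fine-tuning state for 13B, while 8 GPUs with ZeRO-2 leave headroom for activations.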
Thank you very much!