arielnlee / Platypus

Code for fine-tuning Platypus fam LLMs using LoRA


OOM when training LLaMA-2-70B

0three opened this issue

After 4 steps, the OOM occurs.

Reducing batch_size to 8, the OOM still occurs.

Hardware: 8× A100 80GB.

Which library are you using: torchrun, accelerate, naive Python, etc.?

Thanks for your reply. I use torchrun, which is the default setting in fine-tuning.sh.

Got it! For the 70B model you'll need to use accelerate or some other library that takes advantage of model parallelism (torchrun does data parallelism). See the finetune.py section of our README for additional details: https://github.com/arielnlee/Platypus#fine-tuning-finetunepy
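
To make the difference concrete, here is a rough sketch of the two launch modes, assuming the alpaca-lora-style finetune.py this repo builds on; the exact flags live in fine-tuning.sh and the README, and the model name below is just an example.

```bash
# Data parallelism: torchrun spawns one process per GPU, and each rank tries
# to hold a full copy of the 70B weights on its own 80 GB A100 -- this is
# what runs out of memory.
torchrun --nproc_per_node 8 finetune.py --base_model meta-llama/Llama-2-70b-hf

# Model parallelism: a single process (WORLD_SIZE left at 1) falls back to
# device_map="auto", so the 70B layers are sharded across all visible GPUs
# instead of being replicated per rank.
python finetune.py --base_model meta-llama/Llama-2-70b-hf
```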

Thanks for the heads up. I will fix the fine-tuning.sh settings. Please let me know if you have any additional questions.

Haha, I can only fine-tune it with lora_rank 8 and cutoff length 512 for a simple fine-tuning run.

I'll try accelerate later. (It might cause a performance reduction, from my perspective.)

Thanks for your suggestions!

Sorry to hear that! I just used the python finetune.py command in the fine-tuning section (the alternative to torchrun) and it worked with lora_r 16 / micro batch size 1 / batch_size 32 on 4 A100 80GB GPUs, with cutoff length 4096. It took about 20 hours to run. Maybe try setting world_size=1 so you have model parallelism?
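
For reference, that configuration translates to roughly the command below. The flag names are assumed from the alpaca-lora-style script and the data path is a placeholder, so double-check them against fine-tuning.sh and the README before running.

```bash
# Single-process launch: WORLD_SIZE stays at 1, so the 70B model is sharded
# across the 4 visible A100 80GB GPUs (model parallelism) instead of being
# replicated on every rank.
python finetune.py \
    --base_model meta-llama/Llama-2-70b-hf \
    --data_path ./final_data.json \
    --output_dir ./llama2-platypus-70b-lora \
    --batch_size 32 \
    --micro_batch_size 1 \
    --cutoff_len 4096 \
    --lora_r 16
```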