prajdabre / yanmtt

Yet Another Neural Machine Translation Toolkit

GPU Consumption keeps on increasing

nikhilbyte opened this issue · comments

Hi,
I started training the model with the following parameters:
python pretrain_nmt.py -n 1 -nr 0 -g 1 --use_official_pretrained --langs hi_IN --batch_size_indicates_lines --pretrained_model "facebook/mbart-large-50" --model_path "facebook/mbart-large-50" --tokenizer_name_or_path "facebook/mbart-large-50" --mono_src "sans_seq2seq/cleaned_Sanskrit_text_for_LM.txt" --shard_files --batch_size 2

Training starts fine; however, after a few hours it crashes with an OOM error.
While monitoring the GPU, I found that its memory consumption keeps increasing.

GPU Memory is 48GB.

Can you please tell me what could cause this?
Thanks
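
For anyone debugging the same symptom, one simple way to confirm a steady upward trend (rather than a one-off spike) is to log nvidia-smi's used-memory reading over time while pretrain_nmt.py runs. The sketch below is not part of yanmtt; the log file name and polling interval are arbitrary choices.

```python
# Standalone helper (hypothetical, not part of yanmtt): poll nvidia-smi at a
# fixed interval and append the used GPU memory (MiB, one line per GPU) to a
# log file, so a steady growth in consumption can be confirmed over hours.
import subprocess
import time

def log_gpu_memory(logfile="gpu_mem.log", interval_s=60):
    with open(logfile, "a") as f:
        while True:
            out = subprocess.run(
                ["nvidia-smi", "--query-gpu=memory.used",
                 "--format=csv,noheader,nounits"],
                capture_output=True, text=True, check=True,
            )
            f.write(f"{time.strftime('%H:%M:%S')} {out.stdout.strip()}\n")
            f.flush()
            time.sleep(interval_s)

if __name__ == "__main__":
    log_gpu_memory()
```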

This is most likely caused by a very long sequence. Try setting the --hard_truncate_length flag to a smaller value; it is currently 1024, which may be too much, so try 256. Also try to find the example on which you get the OOM, or paste the error logs. I've never actually tested the pretraining functionality on mBART-50, so it will be helpful to know what's causing the issue.
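
If it does turn out to be a stray long example, a quick way to locate candidates is to tokenize the corpus offline and flag lines that exceed the truncation length. Below is a minimal sketch, assuming the Hugging Face transformers tokenizer for facebook/mbart-large-50 and the 256-token threshold suggested above; the file path is taken from the command in the issue.

```python
# Illustrative sketch (not part of yanmtt): report lines of the monolingual
# file whose subword length exceeds a threshold, to find OOM-prone examples.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/mbart-large-50")
threshold = 256  # assumed to match --hard_truncate_length
path = "sans_seq2seq/cleaned_Sanskrit_text_for_LM.txt"

with open(path, encoding="utf-8") as f:
    for lineno, line in enumerate(f, start=1):
        n_tokens = len(tokenizer(line.strip())["input_ids"])
        if n_tokens > threshold:
            print(f"line {lineno}: {n_tokens} subword tokens")
```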

For reference, I've done fine-tuning of mBART-50 on a 32 GB GPU, and whenever I get OOMs it's usually because of a stray example.