prajdabre / yanmtt

Yet Another Neural Machine Translation Toolkit

GPU Consumption keeps on increasing

nikhilbyte opened this issue · comments

Hi,
I started training the model with the following parameters:
python pretrain_nmt.py -n 1 -nr 0 -g 1 --use_official_pretrained --langs hi_IN --batch_size_indicates_lines --pretrained_model "facebook/mbart-large-50" --model_path "facebook/mbart-large-50" --tokenizer_name_or_path "facebook/mbart-large-50" --mono_src "sans_seq2seq/cleaned_Sanskrit_text_for_LM.txt" --shard_files --batch_size 2

Training starts fine; however, after a few hours it crashes with an OOM error.
While monitoring the GPU, I found that its memory consumption keeps increasing.

GPU Memory is 48GB.

Can you please tell me what could cause this?
Thanks
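
For anyone debugging the same symptom, one simple way to confirm a steady upward trend (rather than a one-off spike) is to log nvidia-smi's used-memory reading over time while pretrain_nmt.py runs. The sketch below is not part of yanmtt; the log file name and polling interval are arbitrary choices.

```python
# Standalone helper (hypothetical, not part of yanmtt): poll nvidia-smi at a
# fixed interval and append the used GPU memory (MiB, one line per GPU) to a
# log file, so a steady growth in consumption can be confirmed over hours.
import subprocess
import time

def log_gpu_memory(logfile="gpu_mem.log", interval_s=60):
    with open(logfile, "a") as f:
        while True:
            out = subprocess.run(
                ["nvidia-smi", "--query-gpu=memory.used",
                 "--format=csv,noheader,nounits"],
                capture_output=True, text=True, check=True,
            )
            f.write(f"{time.strftime('%H:%M:%S')} {out.stdout.strip()}\n")
            f.flush()
            time.sleep(interval_s)

if __name__ == "__main__":
    log_gpu_memory()
```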

This is most likely caused by a very long sequence. Try setting the --hard_truncate_length flag to a smaller value; it is currently 1024, which may be too much, so try 256. Also try to find the example on which you get the OOM, or paste the error logs. I've never actually tested the pretraining functionality on mBART-50, so it will be helpful to know what's causing the issue.
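
If it does turn out to be a stray long example, a quick way to locate candidates is to tokenize the corpus offline and flag lines that exceed the truncation length. Below is a minimal sketch, assuming the Hugging Face transformers tokenizer for facebook/mbart-large-50 and the 256-token threshold suggested above; the file path is taken from the command in the issue.

```python
# Illustrative sketch (not part of yanmtt): report lines of the monolingual
# file whose subword length exceeds a threshold, to find OOM-prone examples.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/mbart-large-50")
threshold = 256  # assumed to match --hard_truncate_length
path = "sans_seq2seq/cleaned_Sanskrit_text_for_LM.txt"

with open(path, encoding="utf-8") as f:
    for lineno, line in enumerate(f, start=1):
        n_tokens = len(tokenizer(line.strip())["input_ids"])
        if n_tokens > threshold:
            print(f"line {lineno}: {n_tokens} subword tokens")
```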

For reference, I've done fine-tuning of mBART-50 on a 32 GB GPU, and whenever I get OOMs it's usually because of a stray example.