Out of Memory, Even with Batch Size 1 and 11 GB GPU
b7leung opened this issue
I'm trying to train an inverse paraphraser on my own custom dataset (I already followed these data preprocessing steps). My command is below; distributed training has been turned off. Even with a batch size of only 1, I still run out of memory on a GTX 1080 Ti (~11 GB). Is this expected, and are 2+ GPUs simply required? Or did I get something wrong? Is there anything else I can do to make training work on a single GPU?
python $BASE_DIR/run_lm_finetuning.py \
--output_dir=$BASE_DIR/saved_models/298954459172700181_muffins \
--model_type=gpt2 \
--model_name_or_path=gpt2-large \
--data_dir=$DATA_DIR \
--do_train \
--save_steps 500 \
--logging_steps 20 \
--save_total_limit -1 \
--evaluate_during_training \
--num_train_epochs 3 \
--gradient_accumulation_steps 1 \
--per_gpu_train_batch_size 1 \
--job_id 298954459172700181_muffins \
--learning_rate 5e-5 \
--prefix_input_type paraphrase_250 \
--global_dense_feature_list none \
--specific_style_train -1 \
--optimizer adam
Hi, it is possible to add some parameters to use fp16 instead of fp32, which saved me enough memory to train the inverse paraphraser model on a 16 GB P100 on Colab Pro. Try adding --fp16 and --fp16_opt_level "O3" to the command above.
You will need to install Apex Amp, which I found was best retrieved using !git clone https://github.com/NVIDIA/apex (check the readme and the docs at https://nvidia.github.io/apex/amp.html). Everything is already implemented in Kalpesh's code.
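For reference, the Amp pattern those flags switch on looks roughly like the sketch below. This is not the repository's exact code (run_lm_finetuning.py already wires it up when you pass --fp16); the model, optimizer, and dataloader names are placeholders for illustration only.

```python
# Minimal sketch of Apex Amp mixed-precision training (illustrative only;
# the training script already does this internally when --fp16 is set).
import torch
from apex import amp
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2-large").cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)

# "O3" is pure fp16; "O1"/"O2" are more conservative mixed-precision levels.
model, optimizer = amp.initialize(model, optimizer, opt_level="O3")

for batch in train_dataloader:  # placeholder dataloader
    input_ids = batch["input_ids"].cuda()
    loss = model(input_ids, labels=input_ids)[0]
    # Scale the loss so fp16 gradients don't underflow, then backprop.
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```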
Good luck! It would be cool to compare notes, as I am also currently training my own inverse paraphraser models.
When I run paraphrase_many.py I get a CUDA out of memory error. I'm not sure which parameter I should adjust?
@JonOnEarth did you try reducing the batch size using the --batch_size parameter? I haven't tried this, but it seems like a good starting point since the default is 64.
If batch size 1 doesn't fit, you should try a smaller model like gpt2-medium (it's not too much worse). Gradient checkpointing is also an option, but it will need more work. We trained all our models on a 24 GB GPU.
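If you do want to experiment with gradient checkpointing, the rough idea with a HuggingFace GPT-2 model is sketched below. This is not wired into run_lm_finetuning.py; you would have to adapt the training script yourself, and the exact API depends on your transformers version (older releases take a gradient_checkpointing flag on the model config instead of the helper method shown here).

```python
# Hypothetical sketch: enabling gradient checkpointing on GPT-2 to trade
# extra compute for lower peak memory. Not part of run_lm_finetuning.py as-is.
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2-large")

# Recent transformers versions expose this helper; older ones instead accept
# gradient_checkpointing=True in the model config.
model.gradient_checkpointing_enable()

# Activations are now recomputed during the backward pass instead of stored,
# reducing memory usage at the cost of slower training.
```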