Out of Memory, Even with Batch Size 1 and 11 GB GPU
b7leung opened this issue
I'm trying to train an inverse paraphraser on my own custom dataset (I already followed these data preprocessing steps). My command is below; distributed training has been turned off. Even with a batch size of only 1, I still run out of memory on a GTX 1080 Ti (~11 GB). Is this expected, and are 2+ GPUs simply required? Or did I get something wrong? Is there anything else I can do to make training work on a single GPU?
python $BASE_DIR/run_lm_finetuning.py \
--output_dir=$BASE_DIR/saved_models/298954459172700181_muffins \
--model_type=gpt2 \
--model_name_or_path=gpt2-large \
--data_dir=$DATA_DIR \
--do_train \
--save_steps 500 \
--logging_steps 20 \
--save_total_limit -1 \
--evaluate_during_training \
--num_train_epochs 3 \
--gradient_accumulation_steps 1 \
--per_gpu_train_batch_size 1 \
--job_id 298954459172700181_muffins \
--learning_rate 5e-5 \
--prefix_input_type paraphrase_250 \
--global_dense_feature_list none \
--specific_style_train -1 \
--optimizer adam
Hi, it is possible to add some parameters to use fp16 instead of fp32, which saved me enough memory to train the inverse paraphraser model on a 16 GB P100 on Colab Pro. Try adding --fp16 and --fp16_opt_level "O3" to the command above.
You will need to install Apex Amp, which I found was best retrieved using !git clone https://github.com/NVIDIA/apex (check the readme and the docs at https://nvidia.github.io/apex/amp.html). Everything is already implemented in Kalpesh's code.
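For reference, the Amp pattern those flags switch on looks roughly like the sketch below. This is not the repository's exact code (run_lm_finetuning.py already wires it up when you pass --fp16); the model, optimizer, and dataloader names are placeholders for illustration only.

```python
# Minimal sketch of Apex Amp mixed-precision training (illustrative only;
# the training script already does this internally when --fp16 is set).
import torch
from apex import amp
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2-large").cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)

# "O3" is pure fp16; "O1"/"O2" are more conservative mixed-precision levels.
model, optimizer = amp.initialize(model, optimizer, opt_level="O3")

for batch in train_dataloader:  # placeholder dataloader
    input_ids = batch["input_ids"].cuda()
    loss = model(input_ids, labels=input_ids)[0]
    # Scale the loss so fp16 gradients don't underflow, then backprop.
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```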
Good luck! It would be cool to compare notes, as I am also currently training my own inverse paraphraser models.
When I run paraphrase_many.py I get a CUDA out of memory error. I'm not sure which parameter I should adjust?
@JonOnEarth did you try reducing the batch size using the --batch_size parameter? I haven't tried this, but it seems like a good starting point since the default is 64.
If batch size 1 doesn't fit, you should try a smaller model like gpt2-medium (it's not too much worse). Gradient checkpointing is also an option, but it will need more work. We trained all our models on a 24 GB GPU.
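If you do want to experiment with gradient checkpointing, the rough idea with a HuggingFace GPT-2 model is sketched below. This is not wired into run_lm_finetuning.py; you would have to adapt the training script yourself, and the exact API depends on your transformers version (older releases take a gradient_checkpointing flag on the model config instead of the helper method shown here).

```python
# Hypothetical sketch: enabling gradient checkpointing on GPT-2 to trade
# extra compute for lower peak memory. Not part of run_lm_finetuning.py as-is.
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2-large")

# Recent transformers versions expose this helper; older ones instead accept
# gradient_checkpointing=True in the model config.
model.gradient_checkpointing_enable()

# Activations are now recomputed during the backward pass instead of stored,
# reducing memory usage at the cost of slower training.
```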