Chia-Hsuan-Lee / DST-as-Prompting

Source code for "Dialogue State Tracking with a Language Model using Schema-Driven Prompting"

Training script for t5-base

Soistesimmer opened this issue

Hi, thank you for the nice code. It works fine with t5-small.
I also followed the settings for training t5-base from your paper, but the model does not seem to be properly trained: the evaluation loss is much higher than with t5-small, and the prediction results are also very poor. I suspect the hyperparameters I set are still not correct. Could you also provide your script for training T5-base? Thank you!

This is the script I am using:
CUDA_VISIBLE_DEVICES=0,1 python examples/pytorch/summarization/run_summarization.py \
  --model_name_or_path t5-base \
  --do_train \
  --do_predict \
  --train_file "$DATA_DIR/train.json" \
  --validation_file "$DATA_DIR/dev.json" \
  --test_file "$DATA_DIR/test.json" \
  --source_prefix "" \
  --output_dir "$OUTPUT_DIR/t5-base-mwoz2.2" \
  --per_device_train_batch_size=4 \
  --per_device_eval_batch_size=4 \
  --gradient_accumulation_steps 8 \
  --predict_with_generate \
  --learning_rate 5e-4 \
  --num_train_epochs 2 \
  --text_column="dialogue" \
  --summary_column="state" \
  --save_steps=25000
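
For reference, and assuming both visible GPUs are actually picked up by the data-parallel trainer, the effective batch size implied by this launch works out to:

# effective batch size per optimizer step for the 2-GPU launch above:
# 4 (per_device_train_batch_size) x 2 (GPUs) x 8 (gradient_accumulation_steps) = 64 examples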

Hi, I think the T5-base experiments in the paper were run on 4 GPUs (a sketch of such a launch is included after the single-GPU command below).

Can you try the following on a single GPU? It works for me on the first checkpoint saved.

CUDA_VISIBLE_DEVICES=0 python examples/pytorch/summarization/run_summarization.py \
  --model_name_or_path t5-base \
  --do_train \
  --do_predict \
  --train_file "$DATA_DIR/train.json" \
  --validation_file "$DATA_DIR/dev.json" \
  --test_file "$DATA_DIR/test.json" \
  --source_prefix "" \
  --output_dir "$OUTPUT_DIR/t5-base-mwoz2.2" \
  --per_device_train_batch_size=2 \
  --per_device_eval_batch_size=2 \
  --predict_with_generate \
  --text_column="dialogue" \
  --summary_column="state" \
  --save_steps=50000
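
Note that this command leaves --learning_rate and --num_train_epochs at the run_summarization.py defaults (5e-5 and 3 epochs in the transformers versions from around this time; worth double-checking against your installed version). If you later want to approximate the 4-GPU setup mentioned above, one possible launch is the sketch below: it assumes 4 visible GPUs and the torchrun launcher from a recent PyTorch, and it simply reuses the same training flags, so treat it as a starting point rather than the exact configuration used for the paper.

# sketch of a 4-GPU data-parallel launch (assumes torchrun and 4 visible GPUs)
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 examples/pytorch/summarization/run_summarization.py \
  --model_name_or_path t5-base \
  --do_train \
  --do_predict \
  --train_file "$DATA_DIR/train.json" \
  --validation_file "$DATA_DIR/dev.json" \
  --test_file "$DATA_DIR/test.json" \
  --source_prefix "" \
  --output_dir "$OUTPUT_DIR/t5-base-mwoz2.2" \
  --per_device_train_batch_size=2 \
  --per_device_eval_batch_size=2 \
  --predict_with_generate \
  --text_column="dialogue" \
  --summary_column="state" \
  --save_steps=50000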

Thank you for your suggestion! I will give it a try :)

Let me know if you have other questions! Closing this issue for now.