Training script for t5-base
Soistesimmer opened this issue
Hi, thank you for the nice code. It works fine with t5-small.
I also followed the settings for training t5-base in your paper, but the model does not seem to be properly trained. The evaluation loss is much higher than with t5-small, and the prediction results are also terrible. I think the hyperparameters I set are still not correct. Could you also provide your script for training T5-base? Thank you!
This is the script I am using:
CUDA_VISIBLE_DEVICES=0,1 python examples/pytorch/summarization/run_summarization.py \
  --model_name_or_path google/t5-base \
  --do_train \
  --do_predict \
  --train_file "$DATA_DIR/train.json" \
  --validation_file "$DATA_DIR/dev.json" \
  --test_file "$DATA_DIR/test.json" \
  --source_prefix "" \
  --output_dir "$OUTPUT_DIR/t5-base-mwoz2.2" \
  --per_device_train_batch_size=4 \
  --per_device_eval_batch_size=4 \
  --gradient_accumulation_steps 8 \
  --predict_with_generate \
  --learning_rate 5e-4 \
  --num_train_epochs 2 \
  --text_column="dialogue" \
  --summary_column="state" \
  --save_steps=25000
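One thing worth checking is the effective batch size, since it changes with the number of GPUs and gradient accumulation, and the learning rate usually needs to be tuned to match. As a rough sketch (assuming the standard Trainer behavior of per-device batch size × number of GPUs × accumulation steps; the numbers below come from the script above, not from the paper):

```python
def effective_batch_size(per_device: int, num_gpus: int, grad_accum: int = 1) -> int:
    """Total examples contributing to one optimizer step
    (standard HF Trainer behavior)."""
    return per_device * num_gpus * grad_accum

# Script above: 2 GPUs, per-device batch 4, accumulation 8
print(effective_batch_size(4, 2, 8))  # -> 64

# Hypothetical paper setup on 4 GPUs with the same per-device batch
# and no accumulation would instead give:
print(effective_batch_size(4, 4))  # -> 16
```

If the effective batch size differs from the paper's setup, a learning rate like 5e-4 may be too aggressive or too conservative for the run.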
Hi, I think the experiments for T5-base in the paper were run on 4 GPUs.
Can you try the following on a single GPU? It works for me as of the first saved checkpoint.
CUDA_VISIBLE_DEVICES=0 python examples/pytorch/summarization/run_summarization.py \
  --model_name_or_path t5-base \
  --do_train \
  --do_predict \
  --train_file "$DATA_DIR/train.json" \
  --validation_file "$DATA_DIR/dev.json" \
  --test_file "$DATA_DIR/test.json" \
  --source_prefix "" \
  --output_dir "$OUTPUT_DIR/t5-base-mwoz2.2" \
  --per_device_train_batch_size=2 \
  --per_device_eval_batch_size=2 \
  --predict_with_generate \
  --text_column="dialogue" \
  --summary_column="state" \
  --save_steps=50000
Thank you for your suggestion! I will give it a try :)
Let me know if you have any other questions! Closing this issue for now.