Chia-Hsuan-Lee / DST-as-Prompting

Source code for "Dialogue State Tracking with a Language Model using Schema-Driven Prompting"

Training script for t5-base

Soistesimmer opened this issue

Hi, thank you for the nice code. It works fine with t5-small.
I also followed the settings for training t5-base from your paper, but the model does not seem to be properly trained: the evaluation loss is much higher than with t5-small, and the prediction results are also very poor. I suspect the hyperparameters I set are still not correct. Could you also provide your script for training T5-base? Thank you!

This is the script I am using:
CUDA_VISIBLE_DEVICES=0,1 python examples/pytorch/summarization/run_summarization.py \
  --model_name_or_path t5-base \
  --do_train \
  --do_predict \
  --train_file "$DATA_DIR/train.json" \
  --validation_file "$DATA_DIR/dev.json" \
  --test_file "$DATA_DIR/test.json" \
  --source_prefix "" \
  --output_dir "$OUTPUT_DIR/t5-base-mwoz2.2" \
  --per_device_train_batch_size=4 \
  --per_device_eval_batch_size=4 \
  --gradient_accumulation_steps 8 \
  --predict_with_generate \
  --learning_rate 5e-4 \
  --num_train_epochs 2 \
  --text_column="dialogue" \
  --summary_column="state" \
  --save_steps=25000
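
For reference, and assuming both visible GPUs are actually picked up by the data-parallel trainer, the effective batch size implied by this launch works out to:

# effective batch size per optimizer step for the 2-GPU launch above:
# 4 (per_device_train_batch_size) x 2 (GPUs) x 8 (gradient_accumulation_steps) = 64 examples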

Hi, I think the T5-base experiments in the paper were run on 4 GPUs (a sketch of such a launch is included after the single-GPU command below).

Can you try the following on a single GPU? It works for me on the first checkpoint saved.

CUDA_VISIBLE_DEVICES=0 python examples/pytorch/summarization/run_summarization.py \
  --model_name_or_path t5-base \
  --do_train \
  --do_predict \
  --train_file "$DATA_DIR/train.json" \
  --validation_file "$DATA_DIR/dev.json" \
  --test_file "$DATA_DIR/test.json" \
  --source_prefix "" \
  --output_dir "$OUTPUT_DIR/t5-base-mwoz2.2" \
  --per_device_train_batch_size=2 \
  --per_device_eval_batch_size=2 \
  --predict_with_generate \
  --text_column="dialogue" \
  --summary_column="state" \
  --save_steps=50000
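
Note that this command leaves --learning_rate and --num_train_epochs at the run_summarization.py defaults (5e-5 and 3 epochs in the transformers versions from around this time; worth double-checking against your installed version). If you later want to approximate the 4-GPU setup mentioned above, one possible launch is the sketch below: it assumes 4 visible GPUs and the torchrun launcher from a recent PyTorch, and it simply reuses the same training flags, so treat it as a starting point rather than the exact configuration used for the paper.

# sketch of a 4-GPU data-parallel launch (assumes torchrun and 4 visible GPUs)
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 examples/pytorch/summarization/run_summarization.py \
  --model_name_or_path t5-base \
  --do_train \
  --do_predict \
  --train_file "$DATA_DIR/train.json" \
  --validation_file "$DATA_DIR/dev.json" \
  --test_file "$DATA_DIR/test.json" \
  --source_prefix "" \
  --output_dir "$OUTPUT_DIR/t5-base-mwoz2.2" \
  --per_device_train_batch_size=2 \
  --per_device_eval_batch_size=2 \
  --predict_with_generate \
  --text_column="dialogue" \
  --summary_column="state" \
  --save_steps=50000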

Thank you for your suggestion! I will give it a try :)

Let me know if you have other questions! Closing this issue for now.