Question about the training process
thunderboom opened this issue · comments
Hi
I ran the code on the FewRel dataset. I used MAML + CNN as the model and BERT as the embedding, following Table 2 in the paper. During training, the model overfit quickly: the best epochs for 5-way 1-shot and 5-way 5-shot were 9 and 14, respectively.
I wonder if this is normal.
I use the "bert-base-uncased" BERT model and do not fine-tune it during meta-training.
Dev accuracy: 5-way 1-shot: 0.5114 ± 0.0932; 5-way 5-shot: 0.6348 ± 0.0563.
I use the following parameters:
python src/main.py \
    --cuda 0 \
    --way 5 \
    --shot 1 \
    --query 25 \
    --mode 'train' \
    --embedding 'cnn' \
    --classifier 'mlp' \
    --dataset=$dataset \
    --data_path=$data_path \
    --n_train_class=$n_train_class \
    --n_val_class=$n_val_class \
    --n_test_class=$n_test_class \
    --pretrained_bert=$pretrained_bert \
    --bert_cache_dir=$pretrained_bert \
    --maml \
    --maml_firstorder \
    --maml_innersteps 10 \
    --bert \
    --notqdm \
    --maml_stepsize 0.01
Thank you.
Yes, it is normal. BERT + CNN is a very powerful model built on top of the input words; it can overfit the training tasks very easily. However, the learned representation may not be useful for the unseen validation tasks, which is why the dev accuracy is low.
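For reference, the `--maml_firstorder` setting above means the meta-update drops the second-order terms: the query-set gradient is evaluated at the adapted (inner-loop) parameters and applied directly to the meta-parameters. A minimal NumPy sketch on toy regression tasks, purely illustrative (the task generator, linear model, and learning rates are my assumptions, not the repo's code):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task():
    # toy regression task y = a*x + b; stands in for a few-shot episode
    a, b = rng.uniform(-2, 2, size=2)
    x = rng.uniform(-5, 5, size=10)
    y = a * x + b
    return (x[:5], y[:5]), (x[5:], y[5:])  # (support set, query set)

def loss_and_grad(w, x, y):
    # mean-squared error of the linear model pred = w[0]*x + w[1]
    err = w[0] * x + w[1] - y
    return np.mean(err**2), np.array([np.mean(2 * err * x), np.mean(2 * err)])

w = np.zeros(2)                     # meta-parameters
inner_lr, outer_lr = 0.01, 0.001    # cf. --maml_stepsize

for step in range(1000):
    (sx, sy), (qx, qy) = make_task()
    fast = w.copy()
    for _ in range(10):             # cf. --maml_innersteps 10
        _, g = loss_and_grad(fast, sx, sy)
        fast -= inner_lr * g        # inner-loop adaptation on the support set
    # first-order MAML: treat the query gradient at the adapted
    # parameters as the meta-gradient (no backprop through the inner loop)
    _, g = loss_and_grad(fast, qx, qy)
    w -= outer_lr * g
```

The overfitting in the issue shows up at the outer level: the meta-parameters keep improving on training tasks while adaptation on held-out tasks stops helping, so early stopping on dev accuracy (around epoch 9 or 14 here) is the usual remedy.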
Thank you.