Question about the training process
thunderboom opened this issue · comments
Hi
I ran the code on the FewRel dataset. I used MAML + CNN as the model and BERT as the embedding, following Table 2 in the paper. During training, the model overfit quickly: the best epochs for 5-way 1-shot and 5-way 5-shot were 9 and 14, respectively.
I wonder if this is normal.
I use the "bert-base-uncased" BERT model and do not fine-tune it during meta-training.
Dev accuracy: 5-way 1-shot: 0.5114 ± 0.0932; 5-way 5-shot: 0.6348 ± 0.0563.
I use the following parameters:
python src/main.py \
    --cuda 0 \
    --way 5 \
    --shot 1 \
    --query 25 \
    --mode 'train' \
    --embedding 'cnn' \
    --classifier 'mlp' \
    --dataset=$dataset \
    --data_path=$data_path \
    --n_train_class=$n_train_class \
    --n_val_class=$n_val_class \
    --n_test_class=$n_test_class \
    --pretrained_bert=$pretrained_bert \
    --bert_cache_dir=$pretrained_bert \
    --maml \
    --maml_firstorder \
    --maml_innersteps 10 \
    --bert \
    --notqdm \
    --maml_stepsize 0.01
Thank you.
Yes, it is normal. BERT + CNN is a very powerful model built on top of the input words; it can overfit the training tasks very easily. However, the learned representation may not be useful for the unseen validation tasks, which is why the dev accuracy is low.
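For reference, the `--maml_firstorder` setting above means the meta-update drops the second-order terms: the query-set gradient is evaluated at the adapted (inner-loop) parameters and applied directly to the meta-parameters. A minimal NumPy sketch on toy regression tasks, purely illustrative (the task generator, linear model, and learning rates are my assumptions, not the repo's code):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task():
    # toy regression task y = a*x + b; stands in for a few-shot episode
    a, b = rng.uniform(-2, 2, size=2)
    x = rng.uniform(-5, 5, size=10)
    y = a * x + b
    return (x[:5], y[:5]), (x[5:], y[5:])  # (support set, query set)

def loss_and_grad(w, x, y):
    # mean-squared error of the linear model pred = w[0]*x + w[1]
    err = w[0] * x + w[1] - y
    return np.mean(err**2), np.array([np.mean(2 * err * x), np.mean(2 * err)])

w = np.zeros(2)                     # meta-parameters
inner_lr, outer_lr = 0.01, 0.001    # cf. --maml_stepsize

for step in range(1000):
    (sx, sy), (qx, qy) = make_task()
    fast = w.copy()
    for _ in range(10):             # cf. --maml_innersteps 10
        _, g = loss_and_grad(fast, sx, sy)
        fast -= inner_lr * g        # inner-loop adaptation on the support set
    # first-order MAML: treat the query gradient at the adapted
    # parameters as the meta-gradient (no backprop through the inner loop)
    _, g = loss_and_grad(fast, qx, qy)
    w -= outer_lr * g
```

The overfitting in the issue shows up at the outer level: the meta-parameters keep improving on training tasks while adaptation on held-out tasks stops helping, so early stopping on dev accuracy (around epoch 9 or 14 here) is the usual remedy.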
Thank you.