YujiaBao / Distributional-Signatures

"Few-shot Text Classification with Distributional Signatures" ICLR 2020

Home Page: https://arxiv.org/abs/1908.06039


Question about the training process

thunderboom opened this issue · comments

Hi,
I ran the code on the FewRel dataset, using MAML + CNN as the model and BERT as the embedding, following Table 2 in the paper. During training the model overfits very quickly: the best epochs for 5-way 1-shot and 5-way 5-shot are 9 and 14, respectively.
I wonder if this is normal.
I use the "bert-base-uncased" BERT model and do not fine-tune it during meta-training.
The dev accuracies are: 5-way 1-shot: 0.5114 ± 0.0932; 5-way 5-shot: 0.6348 ± 0.0563.

I use the following parameters:

python src/main.py \
    --cuda 0 \
    --way 5 \
    --shot 1 \
    --query 25 \
    --mode 'train' \
    --embedding 'cnn' \
    --classifier 'mlp' \
    --dataset=$dataset \
    --data_path=$data_path \
    --n_train_class=$n_train_class \
    --n_val_class=$n_val_class \
    --n_test_class=$n_test_class \
    --pretrained_bert=$pretrained_bert \
    --bert_cache_dir=$pretrained_bert \
    --maml \
    --maml_firstorder \
    --maml_innersteps 10 \
    --bert \
    --notqdm \
    --maml_stepsize 0.01
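
For context, this is roughly what the three MAML flags above control. It is a minimal first-order MAML sketch in PyTorch, not the repository's actual code; functional_forward is a hypothetical helper that runs the model with an explicit parameter dictionary, and the inner_steps / inner_lr arguments correspond to --maml_innersteps and --maml_stepsize.

    import torch
    import torch.nn.functional as F

    def maml_episode(model, support_x, support_y, query_x, query_y,
                     inner_steps=10, inner_lr=0.01):
        # Adapt a copy of the meta-parameters on the support set, then score the
        # query set with the adapted parameters. First-order MAML: the inner-loop
        # gradients themselves are not differentiated through.
        fast_weights = {name: p.clone() for name, p in model.named_parameters()}
        for _ in range(inner_steps):                                  # --maml_innersteps
            # `functional_forward` is a hypothetical helper, not part of the repo.
            loss = F.cross_entropy(
                model.functional_forward(support_x, fast_weights), support_y)
            grads = torch.autograd.grad(loss, list(fast_weights.values()))
            fast_weights = {name: w - inner_lr * g.detach()           # --maml_stepsize
                            for (name, w), g in zip(fast_weights.items(), grads)}
        # The query loss drives the outer (meta) update.
        return F.cross_entropy(
            model.functional_forward(query_x, fast_weights), query_y)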

Thank you.

Yes, this is normal. BERT + CNN is a very powerful model built on top of the input words, and it can overfit the training tasks very easily. However, the learned representation may not transfer to the unseen validation tasks, which is why the dev accuracy is low.
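
On the practical side, since the model overfits so quickly, the usual thing to do is simply keep the checkpoint with the best validation accuracy, which is how a "best epoch" of 9 or 14 arises. A minimal sketch of that selection logic (not the repository's code; train_one_epoch and evaluate are assumed to be caller-supplied callables):

    def train_with_early_stopping(train_one_epoch, evaluate, num_epochs=100, patience=20):
        # Track the epoch with the highest validation accuracy and stop once it has
        # not improved for `patience` consecutive epochs.
        best_val_acc, best_epoch, bad_epochs = 0.0, -1, 0
        for epoch in range(num_epochs):
            train_one_epoch()
            val_acc = evaluate()
            if val_acc > best_val_acc:
                best_val_acc, best_epoch, bad_epochs = val_acc, epoch, 0
            else:
                bad_epochs += 1
                if bad_epochs >= patience:
                    break
        return best_epoch, best_val_acc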

Thank you.