princeton-nlp / LM-BFF

[ACL 2021] LM-BFF: Better Few-shot Fine-tuning of Language Models https://arxiv.org/abs/2012.15723

Clarification re: --num_sample value

nelson-liu opened this issue · comments

Hi!

I enjoyed reading your paper, and thanks for releasing this nice codebase. I had a quick question: Appendix B says "When finetuning with demonstrations, we sample 16 different sets of demonstrations for each input and average the predicted log probability for each class during inference." However, I noticed that QQP, MNLI, and SNLI seem to use --num_sample 4 by default in the run_experiment.sh script (e.g., https://github.com/princeton-nlp/LM-BFF/blob/main/run_experiment.sh#L41). If I want to faithfully reproduce the results of the paper, should I set --num_sample to 16 for these tasks?

Thanks!

Hi,

Indeed, for those large datasets we use --num_sample 4 in our experiments for efficiency reasons, since we found that using 16 does not bring a significant improvement. To faithfully reproduce the results, you should keep the script unchanged (i.e., use --num_sample 4). Thanks for noticing this; we will add more details in the appendix in our next revision.
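
For context, the averaging described in Appendix B works roughly like the sketch below. This is a simplified illustration with hypothetical names (`predict_with_demonstrations`, `demo_sets`), not the code in this repo: the model is run once per sampled demonstration set, the log probabilities of the label words at the mask position are collected, and the per-class averages are compared. --num_sample simply controls how many demonstration sets are drawn per input.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

def predict_with_demonstrations(model, tokenizer, input_text, demo_sets, label_words):
    """Average per-class log probabilities over several sampled demonstration
    sets (the role of --num_sample), then pick the highest-scoring class.
    The prompt template and label-word lookup are simplified; real code must
    also handle the tokenizer's space-prefix conventions."""
    label_ids = [tokenizer.convert_tokens_to_ids(w) for w in label_words]
    per_set_log_probs = []
    for demos in demo_sets:  # one forward pass per sampled demonstration set
        prompt = f"{input_text} It was {tokenizer.mask_token} . {demos}"
        inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
        mask_pos = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero()[0].item()
        with torch.no_grad():
            logits = model(**inputs).logits[0, mask_pos]           # vocab logits at the mask
        log_probs = torch.log_softmax(logits, dim=-1)[label_ids]   # log prob of each label word
        per_set_log_probs.append(log_probs)
    # Average across demonstration sets before taking the argmax
    return torch.stack(per_set_log_probs).mean(dim=0).argmax().item()

# Hypothetical usage: demo_sets would hold 16 (or 4) sampled demonstration strings
tokenizer = AutoTokenizer.from_pretrained("roberta-large")
model = AutoModelForMaskedLM.from_pretrained("roberta-large")
demo_sets = ["A great movie . It was great .", "Terrible plot . It was terrible ."]
pred = predict_with_demonstrations(model, tokenizer, "A moving story .", demo_sets,
                                   label_words=["terrible", "great"])
```

With --num_sample 4, `demo_sets` would hold 4 sampled sets instead of 16, so inference needs 4 forward passes per example rather than 16.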

Thanks for the prompt response!