Clarification re: --num_sample value
nelson-liu opened this issue
Hi!
I enjoyed reading your paper, and thanks for releasing this nice codebase. I had a quick question: in Appendix B, it's mentioned that "When finetuning with demonstrations, we sample 16 different sets of demonstrations for each input and average the predicted log probability for each class during inference." However, I noticed that QQP, MNLI, and SNLI seem to use `--num_sample 4` by default in the `run_experiment.sh` script (e.g., https://github.com/princeton-nlp/LM-BFF/blob/main/run_experiment.sh#L41). If I wanted to faithfully reproduce the results of the paper, should I set `--num_sample` to 16 for these tasks?
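For reference, here is my understanding of the averaging step from Appendix B as a minimal sketch; the function and argument names below are mine for illustration, not the repo's actual API:

```python
import random

import torch


def ensemble_demo_log_probs(model, query, train_pool, num_sample=16, demos_per_set=1):
    """Average per-class log probabilities over sampled demonstration sets.

    Illustrative sketch only (not LM-BFF's actual code). `model(query, demos)`
    is assumed to return a tensor of log probabilities over the label words,
    one entry per class.
    """
    log_probs = []
    for _ in range(num_sample):
        # Sample a fresh set of demonstrations for this forward pass.
        demos = random.sample(train_pool, demos_per_set)
        log_probs.append(model(query, demos))
    # Average the predicted log probability for each class across the
    # `num_sample` demonstration sets.
    return torch.stack(log_probs).mean(dim=0)
```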
Thanks!
Hi,
Indeed, for those large datasets we only use `--num_sample 4` in the experiments, for efficiency reasons, since we found that using 16 does not bring a significant improvement. To faithfully reproduce the results, you should keep the script unchanged (i.e., use `--num_sample 4`). Thanks for noticing this; we will add more details to the appendix in our next revision.
Thanks for the prompt response!