princeton-nlp / LM-BFF

[ACL 2021] LM-BFF: Better Few-shot Fine-tuning of Language Models https://arxiv.org/abs/2012.15723

Some questions regarding details of your paper results

shijieli94 opened this issue · comments

Hi, thanks for your outstanding work. I have a few questions regarding the details of your paper. Your insight would be highly valuable.

  1. In your paper, you use the pre-trained RoBERTa model as-is for label generation. I noticed that in the code, an lm_head is initialized to project hidden states into the vocabulary space, but this lm_head does not seem to be included in the pre-trained checkpoint. Could a randomly initialized lm_head lead to meaningless or wrong generated labels?

  2. In my experiments on the SST-2 dataset, I achieved results similar to those reported in your paper for prompt-based fine-tuning. However, for the zero-shot experiments, my results hovered around chance level (50%~55%), significantly lower than the results reported in your paper (80%~85%). Are there specific details or considerations for the zero-shot setting that I should be aware of to improve these outcomes?

Looking forward to your response.

Hi, thanks for your interest in our work! To your questions:

  1. Can you point to the specific line of code? I believe RoBERTa does have an lm_head; its weights were tied to the input embeddings during pre-training, which is probably why you don't see separate lm_head weights in the checkpoint. (A quick way to verify this is sketched after this list.)

  2. We didn't do anything special for zero-shot. Was this random guess result acquired via our code? If so, maybe it is an environment issue. Please try to use the exact same transformers version and see if the problem still persists.
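
If you want to double-check the tying, here is a minimal sketch (plain transformers RobertaForMaskedLM, not our models.py wrapper) that compares the MLM head's decoder weights against the input embeddings:

```python
# Minimal sketch: confirm that RoBERTa's MLM head decoder weights are tied to
# the input word embeddings, so the head is not randomly initialized even
# though no separate lm_head weights appear in the checkpoint.
import torch
from transformers import RobertaForMaskedLM

model = RobertaForMaskedLM.from_pretrained("roberta-large")

embeddings = model.get_input_embeddings().weight  # (vocab_size, hidden_size)
decoder = model.lm_head.decoder.weight            # MLM head output projection

print(decoder.data_ptr() == embeddings.data_ptr())  # True when the weights are tied (the default)
print(torch.equal(decoder, embeddings))              # True: identical values either way
```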

Hi Dr. Gao,

Thanks for getting back to me. It turns out the issue was with the Transformers version I was using. I have Transformers 4.36, and this version doesn't initialize lm_head properly when I use the models.py file from your project directly. That also appears to be the cause of the second issue, since lm_head wasn't set up correctly. After setting up the model correctly, I can now replicate the results reported in your paper!
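
In case anyone else hits this with a newer Transformers release, a rough sanity check along these lines can tell whether lm_head was actually loaded from the checkpoint rather than left randomly initialized (the helper below is just an illustration, not part of the repo):

```python
# Hypothetical sanity check, not part of LM-BFF: compare the MLM head of the
# model built from the project's models.py against a vanilla reference
# checkpoint. A randomly initialized lm_head will not match the pre-trained
# weights.
import torch
from transformers import RobertaForMaskedLM

reference = RobertaForMaskedLM.from_pretrained("roberta-large")
ref_decoder = reference.lm_head.decoder.weight


def lm_head_loaded_correctly(model) -> bool:
    """`model` is assumed to expose the same lm_head.decoder module."""
    return torch.allclose(model.lm_head.decoder.weight, ref_decoder)
```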

Thanks again for your awesome work!