princeton-nlp / LM-BFF

[ACL 2021] LM-BFF: Better Few-shot Fine-tuning of Language Models https://arxiv.org/abs/2012.15723

Question about prompt-based finetuning and automatic selection of label words

pzzhang opened this issue · comments

In the paper, it says: "Let M: Y → V be a mapping from the task label space to individual words in the vocabulary V of L." Here, is V a set of individual words or of individual sub-words?

I noticed that many auto-generated label words, such as "unforgettable/extraordinary/good/better/terrible" for SST-5 (Table E.1), are quite long and should not be single sub-words (from the point of view of a RoBERTa tokenizer). It therefore seems that each label may correspond to multiple sub-words. In that case, the following sentence is confusing:
"Then for each x_in, let the manipulation x_prompt = T(x_in) be a masked language modeling (MLM) input which contains one [MASK] token."
I'm not sure how one [MASK] token can reconstruct multiple tokens (sub-words), like "unforgettable".
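To make my confusion concrete, here is a minimal sketch of the single-[MASK] setup as I understand it, using Hugging Face's roberta-base; the template and candidate label words below are only illustrative, not taken from the paper:

```python
# Minimal sketch (not the LM-BFF code): one [MASK] yields one distribution
# over the vocabulary, so each label has to map to a single vocabulary token.
import torch
from transformers import RobertaForMaskedLM, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaForMaskedLM.from_pretrained("roberta-base")
model.eval()

# Illustrative SST-style template: "<sentence> It was <mask>."
prompt = "A gripping, beautifully shot film. It was " + tokenizer.mask_token + "."
inputs = tokenizer(prompt, return_tensors="pt")
mask_pos = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero()[0]

with torch.no_grad():
    logits = model(**inputs).logits[0, mask_pos].squeeze(0)  # shape: (|V|,)

# Score candidate single-token label words (the leading space matters for BPE)
for word in [" great", " terrible"]:
    ids = tokenizer(word, add_special_tokens=False).input_ids
    assert len(ids) == 1, f"{word!r} is not a single RoBERTa token"
    print(word, logits[ids[0]].item())
```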

This issue is also related to the automatic selection of label words, to determine whether we are searching over all the sub-words or all the words.

Could the authors clarify this detail?

All of those mentioned label words are indeed words in the Roberta vocabulary: https://s3.amazonaws.com/models.huggingface.co/bert/roberta-base-vocab.json
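You can also verify this directly with the Hugging Face tokenizer; a quick sketch (not code from this repo):

```python
# Check that each label word maps to a single token in the RoBERTa BPE
# vocabulary when preceded by a space (as it would appear inside a template).
from transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
for word in ["unforgettable", "extraordinary", "good", "better", "terrible"]:
    ids = tokenizer(" " + word, add_special_tokens=False).input_ids
    status = "single token" if len(ids) == 1 else f"{len(ids)} tokens"
    print(f"{word}: {ids} -> {status}")
```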

Thanks @ajfisch for the quick reply! I'm surprised that such a long word is a single token.

Anyway, do you have a way to handle words that may be split into multiple tokens?

@pzzhang That could be an interesting idea, but we didn't investigate it in the paper. For the automatic label search, we explicitly enumerate only single tokens in the vocabulary for each label. I assume that representing a label with multiple tokens could lead to imprecise probability estimates.
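For reference, a rough sketch of what a single-token search can look like: score every vocabulary token by the MLM log-probability it receives at the [MASK] position, summed over a class's few-shot examples. This is only an illustration of the idea, not the exact pruned search used in the paper, and the template and example sentences are made up:

```python
# Rough illustration (not the exact LM-BFF search): rank vocabulary tokens by
# the total MLM log-probability they receive at the [MASK] position across a
# class's few-shot examples.
import torch
from transformers import RobertaForMaskedLM, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaForMaskedLM.from_pretrained("roberta-base")
model.eval()

def top_label_words(sentences, template=" It was {mask}.", k=10):
    """Return the k single tokens scoring highest at [MASK] over `sentences`."""
    total = torch.zeros(model.config.vocab_size)
    for sent in sentences:
        prompt = sent + template.format(mask=tokenizer.mask_token)
        inputs = tokenizer(prompt, return_tensors="pt")
        mask_pos = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero()[0]
        with torch.no_grad():
            logits = model(**inputs).logits[0, mask_pos].squeeze(0)
        total += logits.log_softmax(dim=-1)
    return [tokenizer.decode([int(i)]) for i in total.topk(k).indices]

# Hypothetical positive-class examples from a few-shot split
print(top_label_words(["A gripping, beautifully shot film.",
                       "One of the year's most memorable performances."]))
```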