princeton-nlp / LM-BFF

[ACL 2021] LM-BFF: Better Few-shot Fine-tuning of Language Models https://arxiv.org/abs/2012.15723


Question about the regression problem

Elenore1997 opened this issue

Hi, I have a question about the regression method. In Section 4.2 of your paper, why do you use a KL-divergence loss between p(y_u | x_in) and the scaled score (y − v_l)/(v_u − v_l) (i.e., `loss = loss_fct(logits.view(-1, 2), labels)`) rather than a cross-entropy loss? The logits and labels here are both probability distributions over the two polarities.
Thanks in advance!

Thanks for the question. It should be possible to use either one.
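
The two losses are indeed interchangeable for training here: since the target distribution is fixed, KL(target ‖ model) equals the cross-entropy minus the target's entropy, which is a constant with respect to the logits, so both losses give the same gradients. A minimal sketch verifying this identity numerically (the example target and logits are made up for illustration):

```python
import math

def softmax(logits):
    # numerically stable softmax over a list of logits
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(target, probs):
    # CE(p, q) = -sum_i p_i * log(q_i)
    return -sum(t * math.log(q) for t, q in zip(target, probs))

def kl_div(target, probs):
    # KL(p || q) = sum_i p_i * log(p_i / q_i)
    return sum(t * math.log(t / q) for t, q in zip(target, probs) if t > 0)

# Hypothetical scaled regression target: (y - v_l)/(v_u - v_l) mass on the
# "positive" polarity word, the rest on the "negative" one.
target = [0.3, 0.7]
probs = softmax([0.2, 1.1])  # model distribution over the two polarity words

entropy = -sum(t * math.log(t) for t in target)  # H(target), constant w.r.t. logits
# KL(target || probs) = CE(target, probs) - H(target)
assert abs(kl_div(target, probs) - (cross_entropy(target, probs) - entropy)) < 1e-9
```

So minimizing either loss moves the logits in the same direction; the reported loss values simply differ by H(target).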

Thanks for the quick reply!