Question about the calculation of self-information
ChileShum opened this issue
Thank you for such interesting work!
My question is: why did you choose a causal language model such as GPT rather than a masked language model such as BERT when computing each token's self-information? Would it be feasible to use BERT to compute token self-information?
I would be grateful if you could answer my question.
I hadn't considered BERT as a possible choice, but it could be an interesting thing to try.
You need a reasonably good language model for computing self-information, so if you go that route, choosing a more advanced masked LM would be better; RoBERTa sounds promising.
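For reference, with a causal LM the per-token self-information is just the negative log-probability the model assigns to each token given its left context. Below is a minimal sketch of that computation in PyTorch; the random logits here are a stand-in for the output of a real causal LM such as GPT-2 (e.g. via Hugging Face transformers), and the function name is my own.

```python
import torch

def token_self_information(logits: torch.Tensor, input_ids: torch.Tensor) -> torch.Tensor:
    """Per-token self-information I(x_t) = -log p(x_t | x_<t), in nats.

    `logits` has shape (seq_len, vocab_size), where logits[t] is the causal
    LM's prediction for the token at position t+1.
    """
    log_probs = torch.log_softmax(logits[:-1], dim=-1)  # predictions for tokens 1..T-1
    targets = input_ids[1:].unsqueeze(-1)               # the tokens that actually occur
    return -log_probs.gather(-1, targets).squeeze(-1)

# Random logits as a placeholder for a real model's output:
torch.manual_seed(0)
seq_len, vocab = 6, 50
logits = torch.randn(seq_len, vocab)
ids = torch.randint(0, vocab, (seq_len,))
si = token_self_information(logits, ids)
print(si)  # one value per token after the first; higher = more surprising
```

With BERT the analogue would presumably be masking each token in turn and scoring it from the masked-LM head, which costs one forward pass per token instead of one pass for the whole sequence.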
Thank you very much for your answer! I will try it.