Question about the calculation of self-information
ChileShum opened this issue
Thank you for such interesting work!
My question is: why did you choose a causal language model such as GPT rather than a masked language model such as BERT when computing each token's self-information? Would it be feasible to use BERT to compute token self-information?
I would be grateful if you could answer my question.
I hadn't considered BERT as a possible choice, but it could be an interesting thing to try.
You need a reasonably good language model for computing self-information, so if you go that route, choosing a more advanced masked LM would be better; RoBERTa sounds promising.
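For reference, with a causal LM the per-token self-information is just the negative log-probability the model assigns to each token given its left context. Below is a minimal sketch of that computation in PyTorch; the random logits here are a stand-in for the output of a real causal LM such as GPT-2 (e.g. via Hugging Face transformers), and the function name is my own.

```python
import torch

def token_self_information(logits: torch.Tensor, input_ids: torch.Tensor) -> torch.Tensor:
    """Per-token self-information I(x_t) = -log p(x_t | x_<t), in nats.

    `logits` has shape (seq_len, vocab_size), where logits[t] is the causal
    LM's prediction for the token at position t+1.
    """
    log_probs = torch.log_softmax(logits[:-1], dim=-1)  # predictions for tokens 1..T-1
    targets = input_ids[1:].unsqueeze(-1)               # the tokens that actually occur
    return -log_probs.gather(-1, targets).squeeze(-1)

# Random logits as a placeholder for a real model's output:
torch.manual_seed(0)
seq_len, vocab = 6, 50
logits = torch.randn(seq_len, vocab)
ids = torch.randint(0, vocab, (seq_len,))
si = token_self_information(logits, ids)
print(si)  # one value per token after the first; higher = more surprising
```

With BERT the analogue would presumably be masking each token in turn and scoring it from the masked-LM head, which costs one forward pass per token instead of one pass for the whole sequence.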
Thank you very much for your answer! I will try it.