liyucheng09 / Selective_Context

Compress your input to ChatGPT or other LLMs so they can process 2x more content and save 40% of memory and GPU time.


Question about the calculation of self-information

ChileShum opened this issue

Thank you for this interesting work!
My question is: why did you choose a causal language model such as GPT rather than a masked language model such as BERT when calculating each token's self-information? Would it be feasible to use BERT to calculate the self-information of tokens?
I would be grateful if you could answer my questions.
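
For reference, here is a minimal sketch of the causal-LM version being discussed, where each token is scored by its self-information -log p(x_t | x_<t). It assumes the Hugging Face `transformers` API with `gpt2` as the scoring model; the function name is illustrative, and this is not the repository's actual implementation.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def causal_self_information(text: str):
    """Score each token by -log p(token | preceding tokens)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits            # (1, seq_len, vocab)
    # Logits at position t predict token t+1, so shift by one;
    # the first token has no left context and is skipped.
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = ids[:, 1:]
    token_logp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    tokens = tokenizer.convert_ids_to_tokens(ids[0].tolist())[1:]
    return list(zip(tokens, (-token_logp[0]).tolist()))
```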

I never thought of BERT as a possible choice, but I think using BERT could be an interesting attempt.

You need a reasonably good language model to compute self-information, so choosing a more advanced masked LM would be better. RoBERTa sounds promising.
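
One caveat if you try this: a masked LM scores p(x_t | x_{≠t}) (context from both sides) rather than p(x_t | x_<t), so the result is a pseudo self-information, and the naive version costs one forward pass per token. Below is a hedged sketch assuming `roberta-base` and the Hugging Face `transformers` API; it is illustrative only, not part of this repo.

```python
import torch
from transformers import RobertaForMaskedLM, RobertaTokenizerFast

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
model = RobertaForMaskedLM.from_pretrained("roberta-base").eval()

def masked_self_information(text: str):
    """Score each token by -log p(token | all other tokens),
    masking one position at a time."""
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    scores = []
    for pos in range(1, len(ids) - 1):        # skip <s> and </s>
        masked = ids.clone()
        masked[pos] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits
        logp = torch.log_softmax(logits[0, pos], dim=-1)[ids[pos]]
        token = tokenizer.convert_ids_to_tokens([ids[pos].item()])[0]
        scores.append((token, -logp.item()))
    return scores
```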

Thank you very much for your answer! I will try it.