simonepri / lm-scorer

📃 Language Model based sentence scoring library

Couple of queries: 1) Fine-tuned GPT2 2) BPE Encoding

sb1992 opened this issue · comments

Hi
I had a couple of queries.

  1. I was wondering if you could direct me to the part of the code and recommend the changes I could make so that I can also calculate this score with my own fine-tuned GPT2 model (which is saved at its own path).

  2. I was also thinking about the fact that GPT2 uses BPE encoding, yet the probability score you return is always for the complete word (not the sub-units). As far as I understand BPE, it divides a token into sub-pieces and assigns the corresponding ids to those sub-pieces. Do you know how this works internally, i.e. how a probability gets assigned to the complete word?
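(For context, the splitting described in point 2 is easy to observe with the transformers tokenizer. This is a minimal illustration; the exact split of any given word depends on the GPT2 vocabulary.)

```python
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# A word missing from the vocabulary is split into several BPE
# sub-pieces, and encode() returns one id per sub-piece.
print(tokenizer.tokenize("unbelievable"))
print(tokenizer.encode("unbelievable"))
```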

Thanks

  1. If you pass the path to your model as model_name to the GPT2LMScorer class, it should work (see the sketch after this list).

  2. Right now we already return the probability of each sub-unit.
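Putting both answers together, here is a minimal sketch. GPT2LMScorer is the class named above; the import path, the device/batch_size keyword arguments, and the tokens_score/sentence_score methods are assumed from the library's README and package layout, and /path/to/finetuned-gpt2 is a placeholder for wherever the fine-tuned model was saved with save_pretrained:

```python
import torch
from lm_scorer.models.gpt2 import GPT2LMScorer

# Placeholder path: any directory produced by transformers'
# save_pretrained() (model weights plus tokenizer files) should load
# the same way the stock "gpt2" model name does.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
scorer = GPT2LMScorer("/path/to/finetuned-gpt2", device=device, batch_size=1)

# tokens_score returns one probability per BPE sub-unit, not per word.
scores, ids, tokens = scorer.tokens_score("I like this package.")
for token, score in zip(tokens, scores):
    print(f"{token!r}: {score:.6f}")

# A whole-sentence score reduces over the sub-units, e.g. by taking
# the product of their probabilities.
print(scorer.sentence_score("I like this package.", reduce="prod"))
```

If a per-word probability is needed, the sub-piece probabilities of that word can simply be multiplied together: by the chain rule, their product is the joint probability of the whole word.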

Thank you