How to decode output

Question

erima2020 opened this issue 7 months ago

erima2020 commented 7 months ago

Hello,
This project seems very interesting. I have a question / suggestion for potential improvement:
The tokenizer has no decode() method; would it make sense to add one? Could you explain how to turn the model output back into natural-language text?

output = model(**encoded_input)
text_output = ???

Thank you in advance for your reply!

Best wishes,
Eric

pbelcak · Answer 1 · Tue Dec 19 2023 18:36:19 GMT+0800 (China Standard Time)

Hello Eric,

The decoding is up to you: take the logit vectors at the positions of the tokens that were masked out and apply your own decoding strategy. For example, take the indices of the k largest logit values at each masked position and map them back to tokens with the tokenizer.
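For readers who want something concrete, here is a minimal sketch of the top-k strategy described above. It is not taken from this repository: the helper names (top_k_ids, decode_masked), the toy vocabulary, and the hand-written logits are all hypothetical, and it assumes the logits are available as one vector per input position. With a real model you would read the logits from the model output and do the id-to-token lookup with whatever mapping your tokenizer provides.

```python
from typing import Dict, List, Sequence


def top_k_ids(logits: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k largest logits, largest first."""
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return order[:k]


def decode_masked(
    input_ids: Sequence[int],
    logits: Sequence[Sequence[float]],
    mask_token_id: int,
    id_to_token: Dict[int, str],
    k: int = 3,
) -> Dict[int, List[str]]:
    """Map each masked position to its k highest-scoring candidate tokens."""
    candidates: Dict[int, List[str]] = {}
    for pos, token_id in enumerate(input_ids):
        if token_id == mask_token_id:
            candidates[pos] = [id_to_token[i] for i in top_k_ids(logits[pos], k)]
    return candidates


# Toy stand-ins (hypothetical): a 5-token vocabulary and one logit row per position.
id_to_token = {0: "[MASK]", 1: "the", 2: "cat", 3: "dog", 4: "sat"}
input_ids = [1, 0, 4]  # "the [MASK] sat"
logits = [
    [0.0, 9.0, 0.1, 0.2, 0.3],
    [0.0, 0.5, 3.2, 2.9, 0.1],  # scores at the masked position
    [0.0, 0.2, 0.1, 0.3, 8.0],
]

result = decode_masked(input_ids, logits, mask_token_id=0, id_to_token=id_to_token, k=2)
print(result)  # {1: ['cat', 'dog']}
```

Taking the top k rather than only the argmax keeps several candidates per masked position, so you can rerank or sample from them afterwards if plain greedy decoding is not what you want.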

erima2020 · Answer 2 · Wed Dec 20 2023 18:27:30 GMT+0800 (China Standard Time)

Hello Peter,
Thank you for your reply. If you don't mind, I might contact you about the tokenizer decoding.
Cheers,
Eric