pbelcak / UltraFastBERT

The repository for the code of the UltraFastBERT paper

How to decode output

erima2020 opened this issue

Hello,
This project seems very interesting. I have a question, and possibly a suggestion for improvement:
The tokenizer has no decode() method (would it make sense to add one?). Could you explain how to turn the model output back into natural-language text?

output = model(**encoded_input)
text_output = ???

Thank you in advance for your reply!

Best wishes,
Eric

Hello Eric,

The decoding is up to you. Take the logit vectors at the positions of the tokens that were masked out and apply your own decoding strategy. For example, take the indices of the k largest logit values and map them back to tokens with the tokenizer's decoding.
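A minimal sketch of the top-k strategy described above, written in plain Python so it runs without downloading a model. The helpers `top_k_indices` and `decode_masked` are hypothetical names, not part of the UltraFastBERT repository, and the `decode` argument stands in for the tokenizer's decode method (an assumption about how you would wire it up):

```python
from typing import Callable, List, Sequence


def top_k_indices(logits: Sequence[float], k: int) -> List[int]:
    """Return the vocabulary indices of the k largest logits, highest first."""
    # Sort every vocabulary index by its logit value, descending, keep the first k
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]


def decode_masked(
    logits: Sequence[Sequence[float]],  # one logit vector per input position
    input_ids: Sequence[int],           # token ids fed to the model
    mask_token_id: int,                 # id of the mask token in this vocabulary
    decode: Callable[[List[int]], str], # assumed: something like tokenizer.decode
    k: int = 5,
) -> List[List[str]]:
    """Hypothetical helper: for each masked position, return the top-k tokens."""
    candidates = []
    for pos, token_id in enumerate(input_ids):
        if token_id != mask_token_id:
            continue  # only masked positions carry the prediction we want
        best = top_k_indices(logits[pos], k)
        # Decode each candidate id separately so every guess stays its own string
        candidates.append([decode([idx]) for idx in best])
    return candidates


# Toy example with a made-up six-word vocabulary (illustration only)
vocab = ["[PAD]", "[MASK]", "the", "cat", "dog", "sat"]
toy_decode = lambda ids: " ".join(vocab[i] for i in ids)
input_ids = [2, 1, 5]  # "the [MASK] sat"
logits = [
    [0.0, 0.0, 5.0, 0.1, 0.2, 0.3],
    [0.0, 0.0, 0.5, 3.0, 2.5, 0.1],
    [0.0, 0.0, 0.1, 0.2, 0.3, 4.0],
]
print(decode_masked(logits, input_ids, mask_token_id=1, decode=toy_decode, k=2))
# [['cat', 'dog']]
```

Assuming the model is a Hugging Face masked-LM whose output exposes `.logits`, the same function could be fed `output.logits[0].tolist()`, `encoded_input["input_ids"][0].tolist()`, `tokenizer.mask_token_id` and `tokenizer.decode`; with PyTorch tensors, `torch.topk` does the same job as `top_k_indices`. Top-k is only one choice: greedy decoding is the special case `k=1`.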

Hello Peter,
Thank you for your reply. If you don't mind, I may contact you again about tokenizer decoding.
Cheers,
Eric