Lightning-AI / lit-llama

Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4-bit quantization, LoRA and LLaMA-Adapter fine-tuning, and pre-training. Apache 2.0-licensed.


After fine-tuning the model, an error occurred during decoding: IndexError: Out of range: piece id is out of range.

HypherX opened this issue

Thank you for your amazing work!
I'm using the generate_batch() function you provided in another issue. When I run my decoding code:

pred_sents = [tokenizer.decode(g) for g in pred_ids]

the error above is raised. It seems to be an issue with the tokenizer. How can it be solved?
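A likely cause: sentencepiece raises "piece id is out of range" when decode() receives a token id outside the tokenizer's vocabulary, which can happen when batched generation pads finished sequences with a sentinel id. A minimal sketch of a workaround, assuming pred_ids is a batch of torch tensors, that the lit-llama Tokenizer exposes its SentencePieceProcessor as tokenizer.processor, and that the out-of-range ids are padding that carries no text:

# drop ids sentencepiece does not know before decoding
# (assumption: any id outside [0, vocab_size) is a pad/sentinel value)
vocab_size = tokenizer.processor.vocab_size()
pred_sents = []
for g in pred_ids:
    valid = g[(g >= 0) & (g < vocab_size)]
    pred_sents.append(tokenizer.decode(valid))

Printing the offending ids first, e.g. g[(g < 0) | (g >= vocab_size)], should confirm which sentinel value generate_batch() is padding with, so you can mask that specific id instead.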