Lightning-AI / lit-llama

Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4-bit quantization, LoRA and LLaMA-Adapter fine-tuning, and pre-training. Apache 2.0-licensed.


After fine-tuning the model, an error occurred during decoding: IndexError: Out of range: piece id is out of range.

HypherX opened this issue

Thank you for your amazing work!
I'm using the generate_batch() function you provided in another issue. When I run my decoding code:

pred_sents = [tokenizer.decode(g) for g in pred_ids]

the error above is raised. It seems to be an issue with the tokenizer. How can it be solved?
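A likely cause: sentencepiece raises "piece id is out of range" when decode() receives a token id outside the tokenizer's vocabulary, which can happen when batched generation pads finished sequences with a sentinel id. A minimal sketch of a workaround, assuming pred_ids is a batch of torch tensors, that the lit-llama Tokenizer exposes its SentencePieceProcessor as tokenizer.processor, and that the out-of-range ids are padding that carries no text:

# drop ids sentencepiece does not know before decoding
# (assumption: any id outside [0, vocab_size) is a pad/sentinel value)
vocab_size = tokenizer.processor.vocab_size()
pred_sents = []
for g in pred_ids:
    valid = g[(g >= 0) & (g < vocab_size)]
    pred_sents.append(tokenizer.decode(valid))

Printing the offending ids first, e.g. g[(g < 0) | (g >= vocab_size)], should confirm which sentinel value generate_batch() is padding with, so you can mask that specific id instead.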