LostRuins / koboldcpp

A simple one-file way to run various GGML and GGUF models with KoboldAI's UI

Home Page: https://github.com/lostruins/koboldcpp


Reprocessing Issue with Llama 3

Nabokov86 opened this issue · comments

When using Llama 3, I've noticed that unnecessary reprocessing occurs on previously generated text.
To reproduce the issue, generate a short piece of text a couple of times and note that prompt processing sometimes runs again over text that was already generated.

Latest concedo_experimental.

It seems like the reprocessing occurs after a new line is generated.
Screenshot from 2024-04-23

Did you by any chance enable "Trim Sentences" or "Author Note"?

No, I'm using default settings without trimming. So you can't reproduce it?
saved_story.json

Yes, I can reproduce it. Looking closer, the tokenizer is behaving weirdly. I think there is an issue with token merges.
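To illustrate the kind of merge problem described above, here is a toy greedy-merge tokenizer (an assumption for illustration only, not llama.cpp's actual BPE implementation, and the merge rule is hypothetical): when a merge rule joins a newline with the character that follows it, appending new text can retokenize the tail of the already-cached context, so the token streams no longer share an exact prefix.

```python
# Toy greedy-merge tokenizer (NOT llama.cpp's real tokenizer).
# A hypothetical merge rule joining "\n" with the next character shows
# how appending text can change tokens that were already cached.
def tokenize(text, merges):
    tokens = list(text)
    changed = True
    while changed:
        changed = False
        for i in range(len(tokens) - 1):
            if (tokens[i], tokens[i + 1]) in merges:
                # Merge the adjacent pair into a single token.
                tokens[i:i + 2] = [tokens[i] + tokens[i + 1]]
                changed = True
                break
    return tokens

merges = {("\n", "A")}  # hypothetical merge rule

old = tokenize("hi\n", merges)   # ['h', 'i', '\n']
new = tokenize("hi\nA", merges)  # ['h', 'i', '\nA']

# Even though "hi\n" is an exact text prefix of "hi\nA", the token
# streams diverge at the newline, so a cached-prefix match fails there.
print(old, new)
```

If the cache comparison is done token-by-token, this divergence forces reprocessing from the newline onward even though the underlying text did not change.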

Relevant: ggerganov#6809

You should experience a small amount of reprocessing all the way back to the previous newline. This is a bug.
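The amount of reprocessing comes down to how many leading tokens the cached context shares with the new prompt; only tokens past that shared prefix need to be evaluated again. A minimal sketch of that idea (an illustration, not koboldcpp's actual implementation):

```python
def common_prefix_len(cached, new):
    """Count the leading tokens shared by the cached context and the new prompt."""
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n

# Illustrative token IDs: the first three tokens match, then they diverge.
cached = [1, 2, 3, 4, 5]
new = [1, 2, 3, 9, 9, 9]

keep = common_prefix_len(cached, new)
# Only tokens past the shared prefix are reprocessed.
to_process = new[keep:]
print(keep, to_process)  # 3 [9, 9, 9]
```

When the tokenizer merges tokens across a newline, the shared prefix ends at that newline, which matches the small burst of reprocessing described above.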

Hi, this should be fixed in the latest version. Remember to get freshly reconverted GGUFs.

@LostRuins Thanks! Yes, it looks like it’s working now. Thank you for continuing to maintain this project, you’re awesome!