LostRuins / koboldcpp

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.

Home Page: https://github.com/lostruins/koboldcpp

Wrong tokens / second

EugeoSynthesisThirtyTwo opened this issue

The console output says:

Processing Prompt [BLAS] (1676 / 1676 tokens)
Generating (78 / 387 tokens)
(EOS token triggered!)
(Special Stop Token Triggered! ID:128009)
CtxLimit: 1754/8192, Process:25.05s (14.9ms/T = 66.89T/s), Generate:59.05s (152.6ms/T = 6.55T/s), Total:84.11s (4.60T/s)

But 6.55 T/s is the speed that would have been achieved had the model generated all 387 requested tokens. It actually generated only 78 tokens before the stop token was triggered, so the real generation speed is 78 / 59.05 = 1.32 tokens/s.
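To make the arithmetic concrete, here is a minimal sketch in Python. It is not koboldcpp's actual code; the function and variable names are made up for illustration, and the numbers are taken from the log above.

```python
def tokens_per_second(token_count: int, seconds: float) -> float:
    """Throughput in tokens per second (0.0 if no time has elapsed)."""
    return token_count / seconds if seconds > 0 else 0.0

# Values copied from the log line in this issue.
generate_seconds = 59.05
requested_tokens = 387   # the generation limit
generated_tokens = 78    # tokens actually produced before the EOS token

# Dividing the requested limit by the elapsed time matches the reported figure.
reported = tokens_per_second(requested_tokens, generate_seconds)
# Dividing the tokens actually generated gives the real speed.
actual = tokens_per_second(generated_tokens, generate_seconds)

print(f"reported: {reported:.2f} T/s")  # reported: 6.55 T/s
print(f"actual:   {actual:.2f} T/s")    # actual:   1.32 T/s
```

The Total figure in the same log (4.60 T/s) also matches 387 / 84.11, so it appears to be affected in the same way.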

Will try to fix

Can you see if the latest version solves this issue?

It's good, thank you.
image

However, as you can see, if I abort the generation, a new log line appears, "Generating (301 / 300 tokens)", which is wrong because the generated count exceeds the limit. I don't know if it's related. Let me know if I should open a new issue for this.

Don't worry about that, probably just a minor thing.