Mozilla-Ocho / llamafile

Distribute and run LLMs with a single file.

Home Page: https://llamafile.ai

Errors are reported endlessly when the answer exceeds a certain number of words.

abpyu opened this issue

There is enough system memory and enough video memory.
However, when the answer to a question exceeds a certain number of words, errors are reported endlessly.
Output stops after about 200 words, and the cmd window then prints errors without end.
Simple questions are answered fine.
For example: "Hello, who are you? Please introduce yourself."
This question is answered correctly.

Command used: .\llamafile.exe -m Llama-3-8B-Q4-K-M.gguf -ngl 999