psugihara / FreeChat

llama.cpp based AI chat app for macOS

Home Page:https://www.freechat.run

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Responses are truncated

cleesmith opened this issue · comments

I've attached a screen capture of responses being truncated.

Also, an image of my Settings; just in case, and I am trying the:
prometheus-13b-v1.0.Q5_K_M.gguf which seems similar to GPT-4 (sort of).

Changing the System Prompt did not seem to have any affect.

While less often, the truncation did happen with the default model too.

Thoughts?

FreeChat_truncated_responses

FreeChat_Settings

Hm, my guess here is that it's an issue with Prompt format. It looks like this model is trained on a very unusual prompt format meant more for using programmatically than as a chat model: https://huggingface.co/kaist-ai/prometheus-13b-v1.0#prompt-format

I'd recommend trying MythoMist, Openhermes 2.5 or any of the top ranked ones here: https://www.otherbrain.world/?columnFilters=%5B%7B%22id%22%3A%22numParameters%22%2C%22value%22%3A%5B1%2C3%2C6%2C7%2C11%2C12%2C13%5D%7D%5D

There is an update in main that will make it into the app store soon that fixes an issue with longer chats and may fix the truncation you're hitting occasionally with the default model.

Thanks I will keep an eye out, and try one of the other models.

Sounds good. I appreciate all the candid feedback.

Thanks again, switching to: openhermes-2.5-neural-chat-7b-v3-1-7b.Q5_K_M.gguf on my old iMac 3.3GHz 6-core Intel i5 with 16GB, so now it does work, slowly, but it's time to upgrade to a M3 perhaps with more memory (if that helps). So far your app works well, and look forward to the updates.

Beautiful. Yes, apple silicon is incredible for this stuff and I definitely recommend as much RAM as you can reasonably afford. With multi-modal models and breakthroughs at the 70B level I'm wishing I had more too :)