ggozad / oterm

a text-based terminal client for Ollama

Feature request: auto-trim chat context to respect the model's context limit.

cognitivetech opened this issue · comments

Great app, btw. Really nice, simple chat app for the terminal.

One problem I have: even with a 32k context, when I have the bot set to generate multi-paragraph responses, I hit the context limit really quickly, within 10 minutes of chatting.

What I'd like is some way to cut out the middle of the chat history, so that only the earliest and the most recent messages are included in the context for inference. Then I could keep chatting with the same character without it forgetting who it is or losing track of its place.
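The trimming strategy described above could be sketched roughly like this (a hypothetical helper; the function name and default counts are just illustrative, not anything oterm implements):

```python
def trim_middle(messages, keep_head=2, keep_tail=8):
    """Drop messages from the middle of a chat history, keeping the
    earliest `keep_head` messages (e.g. the system prompt and character
    setup) and the most recent `keep_tail` messages.
    """
    if len(messages) <= keep_head + keep_tail:
        # History already fits; nothing to drop.
        return list(messages)
    return messages[:keep_head] + messages[-keep_tail:]


# Example: a 20-message history shrinks to the first 2 + last 8 messages.
history = [{"role": "user", "content": f"message {i}"} for i in range(20)]
trimmed = trim_middle(history)
```

A real implementation would likely count tokens rather than messages, since message lengths vary a lot.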

Thanks!~

Hey there,
Thanks for the praise :)
For the time being, oterm does not actually pass the previous messages to the chat completion. What we do instead is store the "context" Ollama returns, which is essentially an encoding of the entire conversation.
I am planning to change that now that the official client supports passing a list of previous messages (it did not when I started oterm). When that happens you will be able to select the number of messages you want to keep passing back and forth.

This is now (from 0.4.0) in place 🎉
oterm now uses the chat API and passes all chat messages. Ollama automatically trims the history to fit the model's context length.
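For illustration, here is a minimal sketch of the kind of request body the chat API takes: the full message history travels with every request, and the server trims it to the model's context window. The model name and messages here are placeholders, not taken from oterm's code.

```python
import json

# Hypothetical chat history in the role/content shape Ollama's
# /api/chat endpoint accepts.
history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi! How can I help?"},
    {"role": "user", "content": "Summarise our chat so far."},
]

# The JSON body a client would POST to /api/chat.
body = json.dumps({"model": "llama3", "messages": history, "stream": False})
```

With this scheme the client no longer needs to manage the opaque "context" blob; it just resends the message list each turn.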