ggozad / oterm

a text-based terminal client for Ollama

Feature request: auto-trim chat context to respect the model's context limit.

cognitivetech opened this issue · comments

Great app, btw. Really nice, simple chat app for the terminal.

One problem I have: even with a 32k context, when I have the bot set to generate multi-paragraph responses, I hit the context limit really quickly, within 10 minutes of chatting.

What I'd like is some way to cut out the middle of the chat history, so that only the earliest and the most recent messages are included in the context for inference. Then I could keep chatting with the same character without it forgetting who it is or losing track of its place.
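The trimming strategy described above could be sketched roughly like this (a hypothetical helper; the function name and default counts are just illustrative, not anything oterm implements):

```python
def trim_middle(messages, keep_head=2, keep_tail=8):
    """Drop messages from the middle of a chat history, keeping the
    earliest `keep_head` messages (e.g. the system prompt and character
    setup) and the most recent `keep_tail` messages.
    """
    if len(messages) <= keep_head + keep_tail:
        # History already fits; nothing to drop.
        return list(messages)
    return messages[:keep_head] + messages[-keep_tail:]


# Example: a 20-message history shrinks to the first 2 + last 8 messages.
history = [{"role": "user", "content": f"message {i}"} for i in range(20)]
trimmed = trim_middle(history)
```

A real implementation would likely count tokens rather than messages, since message lengths vary a lot.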

Thanks!~

Hey there,
Thanks for the praise :)
For the time being, oterm does not actually pass the previous messages to the chat completion. What we do instead is store the "context" Ollama returns, which is essentially an encoding of the entire conversation.
I am planning to change that now that the official client supports passing a list of previous messages (it did not when I started oterm). When that happens you will be able to select the number of messages you want to keep passing back and forth.

This is now (from 0.4.0) in place 🎉
oterm now uses the chat API and passes all chat messages. Ollama automatically trims the history to fit the model's context length.
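For illustration, here is a minimal sketch of the kind of request body the chat API takes: the full message history travels with every request, and the server trims it to the model's context window. The model name and messages here are placeholders, not taken from oterm's code.

```python
import json

# Hypothetical chat history in the role/content shape Ollama's
# /api/chat endpoint accepts.
history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi! How can I help?"},
    {"role": "user", "content": "Summarise our chat so far."},
]

# The JSON body a client would POST to /api/chat.
body = json.dumps({"model": "llama3", "messages": history, "stream": False})
```

With this scheme the client no longer needs to manage the opaque "context" blob; it just resends the message list each turn.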