li2109 / langtorch

🔥 Building composable LLM applications & workflow with Java.

Home Page: https://knowly-ai.gitbook.io/langtorch

Add token length limitation to Conversation Memory class

li2109 opened this issue · comments

Currently, the Conversation Memory class can store an arbitrary number of strings with no maximum length limitation.

However, language models have a fixed prompt length.

Feature Requirements:
Add a parameter for the maximum token length to the Memory class. If not specified, the Memory class should default to the current behavior with no maximum length.

If adding a string to the Memory class would exceed this maximum length, the Memory class should truncate the stored messages to fit within the limit (remove the oldest messages first).
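
A minimal sketch of how that could look, assuming a pluggable token-counting function (the class and method names below are hypothetical, not the existing langtorch API):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.ToIntFunction;

// Hypothetical sketch, not the actual langtorch ConversationMemory API.
public class BoundedConversationMemory {
  private final Deque<String> messages = new ArrayDeque<>();
  private final ToIntFunction<String> tokenCounter; // pluggable token counting (see issue 1 below)
  private final int maxTokens;                      // <= 0 means unlimited, i.e. current behavior
  private int usedTokens = 0;

  public BoundedConversationMemory(int maxTokens, ToIntFunction<String> tokenCounter) {
    this.maxTokens = maxTokens;
    this.tokenCounter = tokenCounter;
  }

  public void add(String message) {
    messages.addLast(message);
    usedTokens += tokenCounter.applyAsInt(message);
    // Evict the oldest messages until the stored history fits the budget again.
    // (Design choice: always keep at least the most recent message, even if it alone exceeds the budget.)
    while (maxTokens > 0 && usedTokens > maxTokens && messages.size() > 1) {
      usedTokens -= tokenCounter.applyAsInt(messages.removeFirst());
    }
  }

  public Deque<String> getMessages() {
    return messages;
  }
}
```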

There are some potential issues that need to be addressed:

  1. Prompt length for an LLM is measured in tokens, not string length. We do have ai.knowly.langtorch.llm.openai.tokenization.OpenAITokenizer, which can provide token counts for OpenAI models, but what should we do for other models? Make an upper-bound estimation?

  2. Is there a way to design a generic strategy for truncating messages whenever the limit is exceeded? (A sketch of one possible approach follows this list.)
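
On point 2, one generic option is a small token-counting strategy that each model configuration can supply; a rough sketch (the interface, the chars-per-token ratio, and the idea of delegating to OpenAITokenizer are assumptions, not existing langtorch code):

```java
// Rough sketch of a model-agnostic token counting strategy; not existing langtorch code.
public interface TokenCounter {
  int count(String text);

  // Conservative fallback for models without a known tokenizer.
  // Assumption: most tokenizers emit well under one token per 3 characters of
  // typical English text, so this over-estimates and keeps us under the real limit.
  static TokenCounter upperBoundEstimate() {
    return text -> (text.length() + 2) / 3;
  }

  // For OpenAI models, an implementation could delegate to
  // ai.knowly.langtorch.llm.openai.tokenization.OpenAITokenizer instead of estimating.
}
```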

@li2109 Prompt lengths don't seem to be consistent across LLMs

> @li2109 Prompt lengths don't seem to be consistent across LLMs

Yes, they are not the same. There will be a parameter you can set to adjust it.

> Yes, they are not the same. There will be a parameter you can set to adjust it.

Things to look into: 1) do the different LLMs provide APIs for token lengths, or 2) do they use magic values in each LLM model configuration?

> Yes, they are not the same. There will be a parameter you can set to adjust it.

> Things to look into: 1) do the different LLMs provide APIs for token lengths, or 2) do they use magic values in each LLM model configuration?

For 1), I don't think it's guaranteed that all LLM providers will have an API specifically for this.
For 2), there's no magic number, I guess, or are you referring to some default minimal limit?

I think neither would work; manual input of the prompt length is probably the best option here.
Say the prompt length is 100 tokens: some people would prefer leaving 50 tokens for history, while others might think 80 is better. The length of the current user input also plays a role here.
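
For example, with the hypothetical sketches above, the manual budget could just be a constructor argument that the caller splits however they like (the class names and numbers below are illustrative only):

```java
// Illustrative usage of the hypothetical sketches above; numbers are examples only.
int contextWindow = 4096;                   // the model's total prompt length
int reservedForHistory = contextWindow / 2; // the user decides how much of it goes to history

BoundedConversationMemory memory = new BoundedConversationMemory(
    reservedForHistory, TokenCounter.upperBoundEstimate()::count);
memory.add("user: What's the weather like today?");
```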