microsoft / kernel-memory

RAG architecture: index and query any data using LLM and natural language, track sources, show citations, asynchronous memory patterns.

Home Page: https://microsoft.github.io/kernel-memory


TextGenerationOptions is totally not used

AsakusaRinne opened this issue

TextGenerationOptions is a parameter of ITextGeneration.GenerateTextAsync. However, it currently does not appear to be used anywhere.
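For reference, the abstraction in question looks roughly like this (a sketch; the exact signature may differ between versions):

```csharp
// Sketch of the ITextGeneration abstraction referenced above; the exact
// signature may vary by version. The options parameter is accepted here,
// but per this issue it is never populated or honored anywhere.
public interface ITextGeneration
{
    IAsyncEnumerable<string> GenerateTextAsync(
        string prompt,
        TextGenerationOptions options,
        CancellationToken cancellationToken = default);
}
```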

For API services like OpenAI ChatGPT, stop sequences are not that important. For local model inference, however, the model will keep generating output endlessly without a stop sequence.
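For illustration, this is the kind of configuration a local backend needs to receive and honor; the property names (MaxTokens, StopSequences) assume the current shape of TextGenerationOptions and should be checked against your version:

```csharp
// Illustrative only: the settings a local inference backend must honor for
// generation to terminate. Property names assumed from TextGenerationOptions.
var options = new TextGenerationOptions
{
    MaxTokens = 512,                     // hard cap so generation always stops
    StopSequences = { "</s>", "User:" }  // end-of-turn markers for local models
};
```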

Could you please expose TextGenerationOptions in the AskAsync API, so that users can configure these settings themselves? It would help a lot with local LLM inference integration.
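Something like this hypothetical overload would cover it (not an existing API, just a sketch of the request):

```csharp
// Hypothetical: AskAsync does not currently accept TextGenerationOptions;
// this only illustrates the shape of the API being requested.
var answer = await memory.AskAsync(
    "What is Kernel Memory?",
    options: new TextGenerationOptions
    {
        MaxTokens = 512,
        StopSequences = { "</s>" }
    });
```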

If possible, I would also like the method used to count tokens to be configurable.
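One possible shape for that, sketched here as an assumption rather than an existing abstraction:

```csharp
// Hypothetical interface for pluggable token counting. Local models tokenize
// differently from OpenAI models, so a fixed GPT-style count can be wrong.
public interface ITextTokenCounter
{
    int CountTokens(string text);
}
```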

Any updates? @dluc I understand that in the early stages of a project there is always a shortage of hands. Please at least let us know whether this will be addressed in the future.

Sorry, we haven't had an opportunity to look into this yet, but we always keep an eye on the list of open issues, so we'll provide an update as soon as possible.

OK, I'm looking forward to it. Thank you for your work.

I noticed that LLaMA generates tokens almost ad infinitum (at some point it throws an exception). SearchClientConfig.AnswerTokens will be passed as TextGenerationOptions.MaxTokens.
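That means capping AnswerTokens should bound the output length; a minimal sketch, assuming the builder exposes a way to pass SearchClientConfig:

```csharp
// Workaround based on the note above: AnswerTokens is forwarded to the text
// generator as TextGenerationOptions.MaxTokens, bounding generation length.
// The builder method name is an assumption; check the version you are using.
var memory = new KernelMemoryBuilder()
    .WithSearchClientConfig(new SearchClientConfig { AnswerTokens = 300 })
    .Build<MemoryServerless>();
```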

I'll look into adding the options to the Ask API, so the behavior can be managed more easily.

Thanks a lot! I'm looking forward to it.