microsoft / kernel-memory

RAG architecture: index and query any data using LLM and natural language, track sources, show citations, asynchronous memory patterns.

Home Page: https://microsoft.github.io/kernel-memory


[Question] Using semantic kernel plug-in with parameters?

FreBon opened this issue · comments

Context / Scenario

Hi,

I'm using Semantic Kernel with Azure OpenAI services to build a chat solution. This works great, but now I want to add memory to handle cases where users paste URLs into their prompts: I extract the URLs and add them to Kernel Memory. I'm running in serverless mode, so I use the MemoryServerless instance I get from the KernelMemoryBuilder to import the URLs.

This is how the memory builder looks:

        var kernelMemory = new KernelMemoryBuilder()
            .WithAzureOpenAITextGeneration(chatConfig)
            .WithAzureOpenAITextEmbeddingGeneration(embeddingConfig)
            .WithAzureAISearchMemoryDb(memorySettings.AzureAiSearchUri, memorySettings.AzureAiSearchKey)
            .WithSearchClientConfig(new SearchClientConfig {MaxAskPromptSize = aiSettings.MaxTokenLength, AnswerTokens = 4096})
            .Build<MemoryServerless>();
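
The URL import step described above could look roughly like the sketch below. This is a hedged example: `ImportWebPageAsync`, its parameter names, the `"chat-docs"` index, and the `conversation` tag are assumptions about the serverless API, not code from this thread.

```csharp
// Hedged sketch: import a pasted URL and tag it with the conversation id,
// so results can later be filtered per conversation (see question 2 below).
var tags = new TagCollection();
tags.Add("conversation", conversationId); // conversationId: your own id

await kernelMemory.ImportWebPageAsync(
    url,
    tags: tags,
    index: "chat-docs"); // hypothetical index name
```

Tagging at import time is what makes per-conversation filtering possible at query time.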

I also add the memory plug-in:

    var memoryPlugin = new MemoryPlugin(kernelMemory);
    kernel.ImportPluginFromObject(memoryPlugin, "memory");

When the user sends a prompt I'm using the Memory Plugin with a prompt template that looks like this:

 var prompt = $@"
        User Question: {question}

        Kernel Memory Answer: {{memory.ask}}

        Respond in a well formatted Markdown syntax";

Question

  1. Without any filters or a specified index, it seems like the prompt is not querying the memory. How do I know whether the memory is used? Is there any logging?
  2. I want to add tags to the pages I import so they can be filtered by conversation. I got this working with the MemoryServerless instance, but how do I pass the filter to the plug-in?
  3. What is the recommended way to use Kernel Memory with Semantic Kernel? I got the solution to work using MemoryServerless.AskAsync, but there is no streaming version, so the user experience is not the best.

Grateful for any guidance :)

hi @FreBon, I'm not sure which engine is interpreting the prompt you included above, but from the prompt I noticed a couple of problems:

  1. Parameters are not being passed to the "ask" function. For example, if you want the prompt to include the answer generated by KM, it should be something like `Kernel Memory Answer: {{memory.ask $question}}` (the exact syntax depends on the template engine you're using). There may also be other parameters you want to pass, such as the index name and tags; again, how you pass those depends on your template engine, e.g. the information could be inline or in the context.
  2. It looks like you're passing a prompt to transform KM's answer into a different format, e.g. Markdown. This prompt should not be passed to KM, but directly to the LLM.
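
Point 1 could be sketched as follows, assuming Semantic Kernel's default prompt template syntax and its support for named function arguments; the `index` parameter name on the plug-in's ask function is an assumption, not confirmed in this thread.

```csharp
// Hedged sketch: pass the user question (and, if supported, an index name)
// to the plug-in via KernelArguments, so the template engine can forward
// them to {{memory.ask}} instead of invoking it with no input.
var prompt = """
    User Question: {{$question}}

    Kernel Memory Answer: {{memory.ask $question index=$index}}
    """;

var arguments = new KernelArguments
{
    ["question"] = question,
    ["index"] = "chat-docs" // hypothetical index name
};

var result = await kernel.InvokePromptAsync(prompt, arguments);
```

The Markdown-formatting instruction from the original prompt is deliberately left out here, since (per point 2) it belongs in the message sent to the LLM, not in the text passed to KM.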

Hi @dluc, thanks for the response :)

I don't know which template engine I'm using, but to test this I created a console app and updated my prompt template to send the user input as a parameter to ask. It then seems that the ask function is triggered through the plug-in, but I got a really strange result.

This is how the code looks in the console application:
[screenshot]

And this was the result I got in the logs:
[screenshot]

I don't know how to pass the template parameters when using the Chat Completion service from Semantic Kernel. I haven't found a way to pass them to GetStreamingChatMessageContentsAsync; it only takes a PromptExecutionSettings object, not the KernelArguments (which has PromptExecutionSettings as a property) that Kernel.InvokePromptAsync accepts. How do I pass the parameters to the chat completion service?
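
One way around this, sketched below, is to let the Kernel render the template and stream the result instead of calling the chat completion service directly. This assumes `Kernel.InvokePromptStreamingAsync` is available in the Semantic Kernel version in use; it accepts KernelArguments, so template variables are resolved before the LLM call, and it also addresses the missing-streaming concern from question 3.

```csharp
// Hedged sketch: stream a templated prompt with arguments via the Kernel,
// rather than GetStreamingChatMessageContentsAsync (which takes no arguments).
var arguments = new KernelArguments { ["question"] = question };

await foreach (var chunk in kernel.InvokePromptStreamingAsync(prompt, arguments))
{
    Console.Write(chunk);
}
```
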

And when it comes to the Markdown format, that should not be sent to Kernel Memory; that's for Semantic Kernel. I want the chat solution to produce nice-looking responses. But I guess I could remove it from the template and add it as a system message.

@FreBon Hey, were you able to figure this out? I am trying to solve this problem right now.

Hello, we need to be able to constrain the Kernel Memory to a specific index/document/tag. Please advise how this can be accomplished through SK, as demonstrated by the OP above. Thanks


It's recommended to use KM from a backend service, without direct calls coming from users, so that you have complete control over the requests sent to KM, including which indexes and tags to use.
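
A backend-controlled query along those lines might look like this sketch. The parameter names, the `"chat-docs"` index, and the `conversation` tag are assumptions about the serverless `AskAsync` API rather than code confirmed in this thread.

```csharp
// Hedged sketch: the backend decides the index and tag filter, so the user
// can never query outside their own conversation's documents.
var answer = await kernelMemory.AskAsync(
    question,
    index: "chat-docs",                                        // hypothetical index
    filter: MemoryFilters.ByTag("conversation", conversationId)); // tag set at import

Console.WriteLine(answer.Result);
```

Because the filter is constructed server-side from the authenticated conversation id, constraining KM to a specific index/document/tag does not depend on what the user types into the prompt.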