microsoft / kernel-memory

RAG architecture: index and query any data using LLM and natural language, track sources, show citations, asynchronous memory patterns.

Home Page: https://microsoft.github.io/kernel-memory

Persisting KernelMemory while using MemoryServerless across application restarts

SwagataChaudhuri opened this issue · comments

I have referred to the following example
https://github.com/microsoft/kernel-memory/tree/main/examples/003-dotnet-Serverless
and ingested a few documents in my application. Is there any way I can persist the memory so that I don't need to re-ingest the documents on every restart? Please point me to any relevant documentation.

I want my app to persist the documents ingested so that I don't re-ingest them

hi @SwagataChaudhuri, by default the memory doesn't persist data (that default exists only to keep demos self-contained), but enabling persistence is straightforward: you can tweak the configuration and/or choose dependencies that persist the data.

There are 3 main dependencies to consider:

  • Queues
  • Files
  • Vectors

By default the code uses:

  • SimpleQueues, with Volatile storage
  • SimpleStorage, with Volatile storage
  • SimpleVectorDb, with Volatile storage

You can change the "Simple*" configurations to use "Disk" storage, e.g. editing the configuration files. Or you can select different dependencies, e.g.

  • Queues: Azure Storage Queues or RabbitMQ
  • Files: Azure Blobs or Local file system
  • Vectors: Azure AI Search or Qdrant. ElasticSearch will also soon be available.
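As a sketch of the "Disk" switch mentioned above: assuming the service reads its settings from an appsettings.json with a KernelMemory:Services section (key names may differ slightly across versions, so check your configuration files), the Simple* dependencies can be pointed at disk storage like this:

```json
{
  "KernelMemory": {
    "Services": {
      "SimpleQueues":      { "StorageType": "Disk", "Directory": "_tmp_queues" },
      "SimpleFileStorage": { "StorageType": "Disk", "Directory": "_tmp_files" },
      "SimpleVectorDb":    { "StorageType": "Disk", "Directory": "_tmp_vectors" }
    }
  }
}
```

With "Disk" instead of "Volatile", the queues, files, and vectors are written under the given directories and survive application restarts.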

Here's an example for demos with persistent data:

var memory = new KernelMemoryBuilder()
    .WithSimpleFileStorage(new SimpleFileStorageConfig { StorageType = FileSystemTypes.Disk })
    .WithSimpleVectorDb(new SimpleVectorDbConfig { StorageType = FileSystemTypes.Disk })
    .WithAzureOpenAITextGeneration(azureOpenAITextConfig, new DefaultGPTTokenizer())
    .WithAzureOpenAITextEmbeddingGeneration(azureOpenAIEmbeddingConfig, new DefaultGPTTokenizer())
    .Build<MemoryServerless>();

and one with reliable dependencies in the cloud:

var memory = new KernelMemoryBuilder()
    .WithAzureBlobsStorage(new AzureBlobsConfig
    {
        Auth = ...,
        ...,
    })
    .WithAzureAISearch(new AzureAISearchConfig
    {
        Auth = ...,
        Endpoint = ...,
        ...,
    })
    .WithAzureQueuesPipeline(new AzureQueuesConfig
    {
        Auth = ...,
        ...,
    })
    .WithAzureOpenAITextGeneration(azureOpenAITextConfig, new DefaultGPTTokenizer())
    .WithAzureOpenAITextEmbeddingGeneration(azureOpenAIEmbeddingConfig, new DefaultGPTTokenizer())
    .Build<MemoryServerless>();

Then, for a faster and more scalable approach, we recommend deploying the service, and using MemoryWebClient, e.g. see example 001.
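For reference, the MemoryWebClient approach looks roughly like this (a sketch, not a complete program; the endpoint URL and file name are placeholder assumptions, and the Kernel Memory service must already be running and configured with persistent storage):

```csharp
using Microsoft.KernelMemory;

// Connect to a running Kernel Memory service (URL is an example)
var memory = new MemoryWebClient("http://127.0.0.1:9001/");

// Ingestion happens once, server side; the data survives client restarts
await memory.ImportDocumentAsync(new Document("doc1")
    .AddFile("file1.pdf")
    .AddTag("label", "docfiles"));

// Later, possibly from a different process, query the same memory
var answer = await memory.AskAsync("What do the ingested documents say?");
Console.WriteLine(answer.Result);
```

Because ingestion and querying go through the service, restarting the client application has no effect on the stored memories.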

Hi @dluc, I tried using the configuration below, as mentioned above:

.WithAzureBlobsStorage(new AzureBlobsConfig
    {
        Auth = ...,
        ...,
    })

and then, when I do document ingestion as below:

  await memory.ImportDocumentAsync(new Document("doc1")
  .AddFiles(
      [
          "File1",
          "File1"
      ])
  .AddTag("label", "docfiles"));

the content gets stored in the designated Azure Blob Storage container, as extractions and partitions plus the original file,

but when I restart the system, comment out the document ingestion, and try to query directly,

I get the below error:

warn: Microsoft.KernelMemory.Search.SearchClient[0]
      No memories available
INFO NOT FOUND

I need some pointers on how to retrieve the memory from the documents already ingested and pushed to Azure Blob Storage; any samples or snippets would be of great help!

Did you also use this part?

.WithAzureAISearch(new AzureAISearchConfig
    {
        Auth = ...,
        Endpoint = ...,
        ...,
    })

That part is required to persist the memory indexes (vector storage).

Thanks a ton @dluc, it works perfectly fine now!

Can I submit a sample with a .NET 8 minimal API to ingest documents and then query them?

Let me know if you feel it would benefit others, and I will submit a PR.

@SwagataChaudhuri yes, that would be great. There's an examples folder if you want to put the sample there. I would clone example 106, call it "107-something", and add the code there. Please also include a README.md explaining the setup and how it works. Thanks!!
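Such a minimal-API sample could be sketched roughly like this (the routes, variable names, and antiforgery handling are illustrative assumptions, not the final sample; the Azure OpenAI config objects are assumed to be loaded elsewhere, as in example 003):

```csharp
using Microsoft.KernelMemory;

var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

// Serverless memory with on-disk persistence, as discussed above;
// azureOpenAITextConfig and azureOpenAIEmbeddingConfig come from configuration
var memory = new KernelMemoryBuilder()
    .WithSimpleFileStorage(new SimpleFileStorageConfig { StorageType = FileSystemTypes.Disk })
    .WithSimpleVectorDb(new SimpleVectorDbConfig { StorageType = FileSystemTypes.Disk })
    .WithAzureOpenAITextGeneration(azureOpenAITextConfig)
    .WithAzureOpenAITextEmbeddingGeneration(azureOpenAIEmbeddingConfig)
    .Build<MemoryServerless>();

// POST /documents: upload and ingest a file
app.MapPost("/documents", async (IFormFile file) =>
{
    await using var stream = file.OpenReadStream();
    var docId = await memory.ImportDocumentAsync(stream, file.FileName);
    return Results.Ok(new { docId });
}).DisableAntiforgery(); // form-file binding in .NET 8 requires this or an antiforgery token

// GET /ask?q=...: query the ingested documents
app.MapGet("/ask", async (string q) =>
{
    var answer = await memory.AskAsync(q);
    return Results.Ok(new { answer.Result });
});

app.Run();
```

With the Disk-backed Simple* dependencies, documents ingested through /documents remain queryable via /ask after the app restarts.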

Sure, I will! Thanks a lot @dluc