microsoft / kernel-memory

RAG architecture: index and query any data using LLM and natural language, track sources, show citations, asynchronous memory patterns.

Home Page:https://microsoft.github.io/kernel-memory

get document list

xuzeyu91 opened this issue

Context / Scenario

While using Kernel Memory, I found no interface for retrieving the documents stored in a given index, so my application has no way to list the documents it has ingested.

The problem

There is no method for retrieving the collection of documents in an index; one should be added.

Proposed solution

I could store the document list separately myself, which would solve the problem, but that does not seem like a good approach. I would like a method for retrieving the document collection to be added.

Importance

would be great to have

I am not sure if this will help, but I have the following:

// Build a serverless memory instance backed by an on-disk SimpleVectorDb
var memory = new KernelMemoryBuilder()
    .WithOpenAI(new OpenAIConfig() { APIKey = "API_KEY", EmbeddingModel = "Model" })
    .WithSimpleVectorDb(new SimpleVectorDbConfig { Directory = "SimpleDbVectorDirectory", StorageType = FileSystemTypes.Disk })
    .Build<MemoryServerless>();

var memories = await memory.ListIndexesAsync();
var memoryDbs = memory.Orchestrator.GetMemoryDbs();

// Walk every index in every configured memory DB
foreach (var memoryIndex in memories)
{
    foreach (var memoryDb in memoryDbs)
    {
        // No filter, at most 100 records, embeddings included (last argument)
        var list = memoryDb.GetListAsync(memoryIndex.Name, null, 100, true);

        // Each item is a stored memory record, so one document can appear several times
        await foreach (var item in list)
        {
            System.Console.WriteLine($"Index: {memoryIndex.Name} - {item.Id}");
        }
    }
}

It would be nice to have a list of the documents that were added. I generally use tags to keep track of that information.

Thank you very much. With this method I was able to get detailed information about how each document was split into chunks. It would be even better if this were encapsulated into a function!
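One way such a function might look is sketched below. It is not part of Kernel Memory: `DocumentListing`, `ListDocumentIdsAsync`, and the `getDocumentId` selector are hypothetical names, and the helper is kept generic so that it makes no assumption about how a record exposes its document ID. It collapses a stream of per-chunk records, such as the one returned by `GetListAsync` above, into a list of distinct document IDs.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class DocumentListing
{
    // Hypothetical helper: reads a stream of records (one per chunk) and
    // returns each document ID once, in first-seen order.
    // getDocumentId is supplied by the caller and is an assumption here.
    public static async Task<List<string>> ListDocumentIdsAsync<TRecord>(
        IAsyncEnumerable<TRecord> records,
        Func<TRecord, string> getDocumentId)
    {
        var seen = new HashSet<string>();
        var result = new List<string>();

        await foreach (var record in records)
        {
            var id = getDocumentId(record);

            // HashSet.Add returns false for IDs already seen, so repeats are skipped
            if (!string.IsNullOrEmpty(id) && seen.Add(id))
            {
                result.Add(id);
            }
        }

        return result;
    }
}
```

With Kernel Memory, the selector would read the document ID from each record, for example from the tags mentioned earlier in the thread; the exact accessor depends on the library version and should be checked against it. The helper still enumerates every record, so it inherits the `limit` passed to `GetListAsync`.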

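var memories = await _memory.ListIndexesAsync();
var memoryDbs = _memory.Orchestrator.GetMemoryDbs();

// Only return records that belong to the document with ID `fileid`
var filters = new List<MemoryFilter>() { new MemoryFilter().ByDocument(fileid) };

foreach (var memoryIndex in memories)
{
    foreach (var memoryDb in memoryDbs)
    {
        // FirstOrDefaultAsync is a LINQ operator for IAsyncEnumerable (e.g. from System.Linq.Async)
        var item = await memoryDb.GetListAsync(memoryIndex.Name, filters, 100, true).FirstOrDefaultAsync();
    }
}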

Currently, a document can be fetched by its document ID this way. If the limit parameter is <= 0, it is treated as int.MaxValue:

if (limit <= 0) { limit = int.MaxValue; }

This does not seem to provide true pagination, and there may be performance issues when the data volume is large.
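As a stopgap, paging can be approximated on the client over the async stream. The sketch below is a workaround under stated assumptions, not a Kernel Memory feature: `AsyncPaging` and `PageAsync` are hypothetical names. It skips the records before the requested page and stops enumerating once the page is full, but the skipped records are still read from the store, so it does not remove the performance concern.

```csharp
using System;
using System.Collections.Generic;

public static class AsyncPaging
{
    // Hypothetical helper: yields the items of one zero-based page from an
    // async stream. Items before the page are read and discarded, so this is
    // client-side paging only.
    public static async IAsyncEnumerable<T> PageAsync<T>(
        IAsyncEnumerable<T> source, int pageIndex, int pageSize)
    {
        // Because this is an async iterator, these checks run when
        // enumeration starts, not when PageAsync is called.
        if (pageIndex < 0) throw new ArgumentOutOfRangeException(nameof(pageIndex));
        if (pageSize <= 0) throw new ArgumentOutOfRangeException(nameof(pageSize));

        long toSkip = (long)pageIndex * pageSize;
        int taken = 0;

        await foreach (var item in source)
        {
            if (toSkip > 0)
            {
                toSkip--;
                continue;
            }

            yield return item;
            taken++;

            // Stop pulling from the source as soon as the page is full
            if (taken == pageSize)
            {
                yield break;
            }
        }
    }
}
```

To use this with the `GetListAsync` calls above, the `limit` argument would need to be at least `(pageIndex + 1) * pageSize`, otherwise later pages come back empty. Whether the underlying store streams results lazily or loads them all up front depends on the memory DB implementation.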