get document list

Question

xuzeyu91 opened this issue 6 months ago

Context / Scenario

When using Kernel Memory, I found no interface for retrieving the collection of documents stored in a given index, so I cannot obtain the corresponding documents when operating the system.

The problem

Add a method to obtain the document collection

Proposed solution

Storing the document collection separately would solve the problem, but that does not seem like a good approach. I would prefer a built-in method for obtaining the document collection.

Importance

would be great to have

Maximum Code · Answer 1 · Fri Jan 19 2024 16:07:22 GMT+0800 (China Standard Time)

I am not sure if this will help, but I have the following:

// Build a serverless Kernel Memory instance backed by an on-disk simple vector store.
var KernelMemory = new KernelMemoryBuilder()
    .WithOpenAI(new OpenAIConfig() { APIKey = "API_KEY", EmbeddingModel = "Model" })
    .WithSimpleVectorDb(new SimpleVectorDbConfig { Directory = "SimpleDbVectorDirectory", StorageType = FileSystemTypes.Disk })
    .Build<MemoryServerless>();

// List all indexes and the memory DBs registered with the orchestrator.
var memories = await KernelMemory.ListIndexesAsync();
var memoryDbs = KernelMemory.Orchestrator.GetMemoryDbs();

foreach (var memoryIndex in memories)
{
    foreach (var memoryDb in memoryDbs)
    {
        // No filters, at most 100 records, embeddings included.
        var list = memoryDb.GetListAsync(memoryIndex.Name, null, 100, true);

        // Each item is a memory record (a stored chunk), so item.Id is a record ID.
        await foreach (var item in list)
        {
            System.Console.WriteLine($"Index: {memoryIndex.Name} - {item.Id}");
        }
    }
}

It would be nice to have a list of documents added. I generally use tags to have that info.
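
Since the loop above yields individual records rather than documents, one way to get a document list is to collect the distinct document IDs from the record tags. Below is a minimal, self-contained sketch of that idea. RecordStub is a hypothetical stand-in for the real record type, and the "__document_id" tag key is an assumption about how the parent document ID is stored, so please check both against your Kernel Memory version.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public static class DocumentListing
{
    // Assumption: the tag key under which each record stores its parent document ID.
    public const string DocumentIdTag = "__document_id";

    // Hypothetical stand-in for a memory record: a record ID plus a tag collection.
    public sealed record RecordStub(string Id, Dictionary<string, List<string>> Tags);

    // Enumerates the records and returns each distinct document ID once, in first-seen order.
    public static async Task<List<string>> ListDocumentIdsAsync(IAsyncEnumerable<RecordStub> records)
    {
        var seen = new HashSet<string>();
        var result = new List<string>();

        await foreach (var item in records)
        {
            // Skip records that carry no document ID tag.
            if (!item.Tags.TryGetValue(DocumentIdTag, out var values) || values.Count == 0)
            {
                continue;
            }

            // HashSet.Add returns false when the ID was already seen.
            if (seen.Add(values[0]))
            {
                result.Add(values[0]);
            }
        }

        return result;
    }
}
```

This still has to enumerate every record in the index, so it is only a workaround and not a replacement for a real "list documents" API.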

zyxucp · Answer 2 · Wed Jan 24 2024 16:42:03 GMT+0800 (China Standard Time)

Thank you very much. With this method I was able to obtain detailed information about the document chunks. It would be even better if this could be encapsulated in a function!

zyxucp · Answer 3 · Thu Feb 22 2024 22:46:56 GMT+0800 (China Standard Time)

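// fileid is the ID of the document to look up (defined elsewhere).
// FirstOrDefaultAsync on IAsyncEnumerable comes from the System.Linq.Async package.
var memories = await _memory.ListIndexesAsync();
var memoryDbs = _memory.Orchestrator.GetMemoryDbs();

foreach (var memoryIndex in memories)
{
    foreach (var memoryDb in memoryDbs)
    {
        // Restrict the listing to records that belong to the given document.
        var filters = new List<MemoryFilter>() { new MemoryFilter().ByDocument(fileid) };
        var item = await memoryDb.GetListAsync(memoryIndex.Name, filters, 100, true).FirstOrDefaultAsync();
    }
}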

Currently, a document can be retrieved by its document ID this way. If the limit parameter is <= 0, it is treated as int.MaxValue:

if (limit <= 0) { limit = int.MaxValue; }

With only a limit and no offset, this does not seem to allow true pagination, and there may be performance issues when the data volume is large.
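
To illustrate the pagination concern, here is a rough sketch of client-side paging over any IAsyncEnumerable<T>. It is not part of Kernel Memory: AsyncPaging and GetPageAsync are hypothetical names, and the helper only uses the .NET base class library. It stops reading once the page is full, but every item on the earlier pages still has to be produced and discarded, which is exactly the cost described above.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class AsyncPaging
{
    // Returns one page of items from an async stream by skipping the earlier pages.
    // Enumeration stops as soon as the page is full, so later items are never read,
    // but every skipped item is still produced by the source.
    public static async Task<List<T>> GetPageAsync<T>(IAsyncEnumerable<T> source, int pageIndex, int pageSize)
    {
        if (pageIndex < 0) throw new ArgumentOutOfRangeException(nameof(pageIndex));
        if (pageSize <= 0) throw new ArgumentOutOfRangeException(nameof(pageSize));

        long toSkip = (long)pageIndex * pageSize;
        var page = new List<T>();

        await foreach (var item in source)
        {
            // Discard items that belong to earlier pages.
            if (toSkip > 0)
            {
                toSkip--;
                continue;
            }

            page.Add(item);

            // Stop enumerating once the requested page is complete.
            if (page.Count == pageSize)
            {
                break;
            }
        }

        return page;
    }
}
```

With the code from this thread, the source could presumably be memoryDb.GetListAsync(memoryIndex.Name, null, (pageIndex + 1) * pageSize, false), so that the store is never asked for more records than the page needs. Whether that actually reduces the work done depends on the memory DB implementation, and the result order would need to be stable between calls for the pages to be consistent.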