microsoft / kernel-memory

RAG architecture: index and query any data using LLM and natural language, track sources, show citations, asynchronous memory patterns.

Home Page: https://microsoft.github.io/kernel-memory

ImportDocumentAsync fails to process large documents with Azure.RequestFailedException: You are sending too many requests. Please try again later.

johnnyreilly opened this issue

Context / Scenario

First of all, thanks for KernelMemory - it's brilliant! We've been building with it and it's been a game changer in terms of the things it makes possible.

In our scenario, we have users who want to chat with very large documents, like 600-page PDFs. We attempt to process such a document like so (C#):

await _memory.ImportDocumentAsync(content: documentContent, fileName: fileName, documentId: documentId, index: index, tags: tags);

What happened?

The ImportDocumentAsync call runs for around 10 minutes and then throws an Azure.RequestFailedException with the message "You are sending too many requests. Please try again later." See the stack trace below:

2024-02-01T20:06:55.767571991Z       Error processing document: DocumentToProcess { DocumentUrl = https://stourgptprod.blob.core.windows.net/ourgpt/Big%20Book.pdf, AssistantName = ourgpt, AssistantDisplayName = OurGPT, Description = Group Security assistant, MaintainerEmails = System.String[], AdGroupIds = System.String[], User = UserEmailAndId  }
2024-02-01T20:06:55.767584749Z       System.Exception: Problem chunking document: https://stourgptprod.blob.core.windows.net/ourgpt/Big%20Book.pdf
2024-02-01T20:06:55.767601865Z        ---> Azure.RequestFailedException: You are sending too many requests. Please try again later.
2024-02-01T20:06:55.767607622Z       Status: 503 (Service Unavailable)
2024-02-01T20:06:55.767611742Z       
2024-02-01T20:06:55.767616947Z       Content:
2024-02-01T20:06:55.767621400Z       {"error":{"code":"","message":"You are sending too many requests. Please try again later."}}
2024-02-01T20:06:55.767625940Z       
2024-02-01T20:06:55.767630851Z       Headers:
2024-02-01T20:06:55.767637232Z       Server: Microsoft-IIS/10.0
2024-02-01T20:06:55.767641552Z       Strict-Transport-Security: REDACTED
2024-02-01T20:06:55.767645998Z       Preference-Applied: REDACTED
2024-02-01T20:06:55.767650369Z       throttle-reason: rateLimitExceeded
2024-02-01T20:06:55.767654576Z       client-request-id: f668f002-39c1-42ef-aca3-e47b7feb4cc6
2024-02-01T20:06:55.767658825Z       x-ms-client-request-id: f668f002-39c1-42ef-aca3-e47b7feb4cc6
2024-02-01T20:06:55.767663107Z       request-id: f668f002-39c1-42ef-aca3-e47b7feb4cc6
2024-02-01T20:06:55.767667164Z       elapsed-time: 1744
2024-02-01T20:06:55.767670951Z       Date: Thu, 01 Feb 2024 20:06:55 GMT
2024-02-01T20:06:55.767674544Z       Content-Length: 92
2024-02-01T20:06:55.767678656Z       Content-Type: application/json; charset=utf-8
2024-02-01T20:06:55.767682768Z       Content-Language: REDACTED
2024-02-01T20:06:55.767686873Z       
2024-02-01T20:06:55.767691229Z          at Azure.Search.Documents.IndexesRestClient.ListAsync(String select, CancellationToken cancellationToken)
2024-02-01T20:06:55.767695569Z          at Azure.Search.Documents.Indexes.SearchIndexClient.<>c__DisplayClass43_0.<<GetIndexesAsync>b__0>d.MoveNext()
2024-02-01T20:06:55.767699777Z       --- End of stack trace from previous location ---
2024-02-01T20:06:55.767703927Z          at Azure.Core.PageResponseEnumerator.FuncAsyncPageable`1.AsPages(String continuationToken, Nullable`1 pageSizeHint)+MoveNext()
2024-02-01T20:06:55.767712074Z          at Azure.Core.PageResponseEnumerator.FuncAsyncPageable`1.AsPages(String continuationToken, Nullable`1 pageSizeHint)+System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult()
2024-02-01T20:06:55.767715941Z          at Azure.AsyncPageable`1.GetAsyncEnumerator(CancellationToken cancellationToken)+MoveNext()
2024-02-01T20:06:55.767719575Z          at Azure.AsyncPageable`1.GetAsyncEnumerator(CancellationToken cancellationToken)+MoveNext()
2024-02-01T20:06:55.767723526Z          at Azure.AsyncPageable`1.GetAsyncEnumerator(CancellationToken cancellationToken)+System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult()
2024-02-01T20:06:55.767727593Z          at Microsoft.KernelMemory.MemoryDb.AzureAISearch.AzureAISearchMemory.DoesIndexExistAsync(String index, CancellationToken cancellationToken)
2024-02-01T20:06:55.767731851Z          at Microsoft.KernelMemory.MemoryDb.AzureAISearch.AzureAISearchMemory.DoesIndexExistAsync(String index, CancellationToken cancellationToken)
2024-02-01T20:06:55.767735961Z          at Microsoft.KernelMemory.MemoryDb.AzureAISearch.AzureAISearchMemory.CreateIndexAsync(String index, MemoryDbSchema schema, CancellationToken cancellationToken)
2024-02-01T20:06:55.767739976Z          at Microsoft.KernelMemory.Handlers.SaveRecordsHandler.SaveEmbeddingsAsync(DataPipeline pipeline, CancellationToken cancellationToken)
2024-02-01T20:06:55.767748878Z          at Microsoft.KernelMemory.Handlers.SaveRecordsHandler.InvokeAsync(DataPipeline pipeline, CancellationToken cancellationToken)
2024-02-01T20:06:55.767771715Z          at Microsoft.KernelMemory.Pipeline.InProcessPipelineOrchestrator.RunPipelineAsync(DataPipeline pipeline, CancellationToken cancellationToken)
2024-02-01T20:06:55.767776169Z          at Microsoft.KernelMemory.Pipeline.BaseOrchestrator.ImportDocumentAsync(String index, DocumentUploadRequest uploadRequest, CancellationToken cancellationToken)
2024-02-01T20:06:55.767780546Z          at ZebraGptContainerApp.Services.Implementations.CognitiveSearchService.Store(String index, String documentUrl, String fileName, Stream documentContent, List`1 adGroupIds) in /server/Services/Implementations/CognitiveSearchService.cs:line 146
2024-02-01T20:06:55.767784879Z          at ZebraGptContainerApp.Services.RagGestionService.ChunkDocumentAndStoreInCognitiveSearchIndex(DocumentToProcess documentToProcess) in /server/Services/Implementations/RagGestionService.cs:line 57
2024-02-01T20:06:55.767789267Z          --- End of inner exception stack trace ---
2024-02-01T20:06:55.767796235Z          at ZebraGptContainerApp.Services.RagGestionService.ChunkDocumentAndStoreInCognitiveSearchIndex(DocumentToProcess documentToProcess) in /server/Services/Implementations/RagGestionService.cs:line 101
2024-02-01T20:06:55.767800684Z          at ZebraGptContainerApp.BackgroundServices.DocumentProcessorBackgroundService.PerformRagGestion(IRagGestionService ragGestionService, IDocumentProcessorQueue documentProcessorQueue, CancellationToken stoppingToken) in /server/BackgroundServices/DocumentProcessorBackgroundService.cs:line 50

Ideally there would be a way to process a very large document without failing like this. It would be useful if:

  1. the import could be sensitive to the load that the Azure OpenAI service can support, perhaps slowing down processing to a level that does not result in 503s (a rough caller-side retry sketch follows this list);
  2. the import mechanism supported reporting back what percentage of a document has been processed.
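
For what it's worth, a crude caller-side workaround is to wrap the serverless import in a retry-with-backoff loop. This is only a sketch under assumptions: it is not a built-in Kernel Memory feature, the helper name and retry parameters are made up, and because the exception aborts the whole pipeline the retry re-runs the import from scratch, so it only helps with transient throttling.

using Azure;
using Microsoft.KernelMemory;

// Hypothetical helper (not part of Kernel Memory): retry the serverless import
// with exponential backoff when Azure responds with 429/503 throttling errors.
// Note: the retry restarts the whole import, so this only mitigates transient limits.
static async Task ImportWithBackoffAsync(
    IKernelMemory memory, Stream documentContent, string fileName,
    string documentId, string index, TagCollection tags,
    int maxAttempts = 5, CancellationToken ct = default)
{
    for (var attempt = 1; ; attempt++)
    {
        try
        {
            documentContent.Position = 0; // assumes a seekable stream; rewind before each attempt
            await memory.ImportDocumentAsync(
                content: documentContent, fileName: fileName,
                documentId: documentId, index: index, tags: tags);
            return;
        }
        catch (RequestFailedException e) when
            ((e.Status == 503 || e.Status == 429) && attempt < maxAttempts)
        {
            // Back off 2s, 4s, 8s, ... before trying again.
            await Task.Delay(TimeSpan.FromSeconds(Math.Pow(2, attempt)), ct);
        }
    }
}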

This may be related to #276

Importance

I cannot use Kernel Memory

Platform, Language, Versions

C#

<PackageReference Include="Microsoft.KernelMemory.Core" Version="0.26.240116.2" />

Running in Azure Container Apps / .NET 8

If there's any more information you need, let us know!

Relevant log output

See above

@johnnyreilly thanks for the kind words, glad to hear it's helping. I think your scenario would benefit from using the retry mechanism built into the Memory Service. If I understand correctly, you're using the Serverless Memory approach, which does everything in the same process and synchronously. While you can run serverless tasks in the background, the internal orchestrator doesn't have a durable queue to "retry until done".

The KM Service helps with that without requiring too many changes. First you'd configure the service, similarly to how you configure the serverless memory, using the typical .NET JSON configuration files. Then you start the service, which exposes a web API and internally uses queues to retry. You can secure the web API by setting up custom API keys (see the config). The orchestration queues can run on Azure Queues or RabbitMQ (there are also simple queues, but they are not meant for production).

Then change your _memory object to be an instance of MemoryWebClient, which has the same API as the serverless memory. The main difference is that operations are asynchronous: after sending a document, you check whether ingestion is complete by calling _memory.IsDocumentReadyAsync. If you run the service locally and send a big PDF, you'll see it hitting the rate limit but retrying until done (which could take minutes, hours or more, depending on your quota).
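
To make that concrete, here is a minimal sketch, reusing the variables from the snippet above; the endpoint is a placeholder, and parameter names follow the package version referenced in this issue and may differ in newer releases.

using Microsoft.KernelMemory;

// Sketch: same import call as the serverless code, but via MemoryWebClient against
// the KM Service; ingestion then runs in the service's queue-backed pipeline.
var memory = new MemoryWebClient("http://127.0.0.1:9001/"); // placeholder endpoint

string docId = await memory.ImportDocumentAsync(
    content: documentContent, fileName: fileName,
    documentId: documentId, index: index, tags: tags);

// The call returns as soon as the document is queued; poll until ingestion completes.
while (!await memory.IsDocumentReadyAsync(documentId: docId, index: index))
{
    await Task.Delay(TimeSpan.FromSeconds(10));
}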

Is the KM Service able to set the throttle rate when creating embeddings? For example, defining how many concurrent API calls are allowed per minute.

Hello, I'm using the Kernel Memory service deployed on AKS and I'm also hitting the Azure Search limit with a large document.

[screenshot]

@blinchi @doggy8088 If you configure the service in async mode, with the orchestration based on queues, it should automatically retry and eventually get the job done.
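
As a rough sketch of what that looks like from the client side (reusing memory, docId and index from the earlier snippet; the DataPipelineStatus property names are assumptions based on the upload-status payload and should be checked against the package version in use), the pipeline status also gives a coarse progress indicator while the service retries in the background:

// Sketch: observe the queue-backed pipeline while the service retries behind the scenes.
DataPipelineStatus? status;
do
{
    await Task.Delay(TimeSpan.FromSeconds(30));
    status = await memory.GetDocumentStatusAsync(documentId: docId, index: index);
    if (status is { Steps.Count: > 0 })
    {
        // Coarse progress: pipeline steps completed so far (not pages or chunks).
        Console.WriteLine($"Steps completed: {status.CompletedSteps.Count}/{status.Steps.Count}");
    }
} while (status is null || !status.Completed);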

Hello @dluc, I think the issue is that it tries to create the index every time it attempts to add a chunk to the search index. I'll try building the service from your repository to see if that prevents the exception from occurring. Thank you!

[screenshot]

Nice find, @blinchi.
I'm having the same issue here, and I also think the problem is the index-existence check on every chunk.

@dluc do you think we could move that client.CreateIndexAsync line to just before entering the foreach? I'm happy to do the PR...

Thanks.
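
For illustration, a simplified sketch of the proposed change, written against a generic IMemoryDb-style handler rather than the actual SaveRecordsHandler code; the method name and the embedding size are placeholders:

using Microsoft.KernelMemory.MemoryStorage;

// Simplified illustration of the proposed fix (not the actual SaveRecordsHandler code):
// ensure the index exists once, before the loop, instead of once per record.
static async Task SaveRecordsAsync(
    IMemoryDb memoryDb, string index, IEnumerable<MemoryRecord> records, CancellationToken ct)
{
    // Previously an index-existence check (a ListIndexes call against Azure AI Search)
    // ran for every chunk, which quickly exceeded the service's rate limit.
    await memoryDb.CreateIndexAsync(index, 1536, ct); // 1536 = placeholder embedding size

    foreach (MemoryRecord record in records)
    {
        // Only the per-record upsert stays inside the loop.
        await memoryDb.UpsertAsync(index, record, ct);
    }
}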

Sorry about the delayed reply @dluc

If understand correctly, you're using the Serverless Memory approach, which does everything in the same process and synchronously.

Yes exactly

KM Service helps with that, without making too many changes.

I feel like this is sort of correct but also slightly not. I'll be honest: I really appreciate the simplicity of the serverless model. It's very simple to get it running as a background task in a .NET application using hosted services. We can have a simple app running which does processing in the background with very little complexity involved. We host it as a Docker container in Azure Container Apps and it just works ™️

If I understand the hosted model correctly, I'll need to spin up a separate container alongside, set up Azure Queues or similar, configure them, and work out a communication mechanism between the two containers. Totally doable I'm sure, but a significant change to the app architecture.

It may well be that my hopes of getting serverless mode to rate-limit its requests are unreasonable. Perhaps I should bite the bullet and do the migration - I'll have to see how often we need to process big documents before I can justify it to the team, I think.

I totally appreciate Kernel Memory - but controversially I think I greatly prefer the serverless model to the hosted. My apologies!

Added a PR for this here: #387

cc @luismanez @dluc

PR #387 merged - this should address the issue. Thanks @spenavajr!

Glad I could help @dluc !