ImportDocumentAsync fails to process large documents with Azure.RequestFailedException: You are sending too many requests. Please try again later.
johnnyreilly opened this issue
Context / Scenario
First of all, thanks for KernelMemory - it's brilliant! We've been building with it and it's been a game changer in terms of the things it makes possible.
In our scenario, we have users who want to chat with very large documents, such as 600-page PDFs. We attempt to process such a document like so (C#):
await _memory.ImportDocumentAsync(content: documentContent, fileName: fileName, documentId: documentId, index: index, tags: tags);
What happened?
The ImportDocumentAsync call runs for around 10 minutes and then throws an Azure.RequestFailedException with the message "You are sending too many requests. Please try again later." See the stack trace below:
2024-02-01T20:06:55.767571991Z Error processing document: DocumentToProcess { DocumentUrl = https://stourgptprod.blob.core.windows.net/ourgpt/Big%20Book.pdf, AssistantName = ourgpt, AssistantDisplayName = OurGPT, Description = Group Security assistant, MaintainerEmails = System.String[], AdGroupIds = System.String[], User = UserEmailAndId }
2024-02-01T20:06:55.767584749Z System.Exception: Problem chunking document: https://stourgptprod.blob.core.windows.net/ourgpt/Big%20Book.pdf
2024-02-01T20:06:55.767601865Z ---> Azure.RequestFailedException: You are sending too many requests. Please try again later.
2024-02-01T20:06:55.767607622Z Status: 503 (Service Unavailable)
2024-02-01T20:06:55.767611742Z
2024-02-01T20:06:55.767616947Z Content:
2024-02-01T20:06:55.767621400Z {"error":{"code":"","message":"You are sending too many requests. Please try again later."}}
2024-02-01T20:06:55.767625940Z
2024-02-01T20:06:55.767630851Z Headers:
2024-02-01T20:06:55.767637232Z Server: Microsoft-IIS/10.0
2024-02-01T20:06:55.767641552Z Strict-Transport-Security: REDACTED
2024-02-01T20:06:55.767645998Z Preference-Applied: REDACTED
2024-02-01T20:06:55.767650369Z throttle-reason: rateLimitExceeded
2024-02-01T20:06:55.767654576Z client-request-id: f668f002-39c1-42ef-aca3-e47b7feb4cc6
2024-02-01T20:06:55.767658825Z x-ms-client-request-id: f668f002-39c1-42ef-aca3-e47b7feb4cc6
2024-02-01T20:06:55.767663107Z request-id: f668f002-39c1-42ef-aca3-e47b7feb4cc6
2024-02-01T20:06:55.767667164Z elapsed-time: 1744
2024-02-01T20:06:55.767670951Z Date: Thu, 01 Feb 2024 20:06:55 GMT
2024-02-01T20:06:55.767674544Z Content-Length: 92
2024-02-01T20:06:55.767678656Z Content-Type: application/json; charset=utf-8
2024-02-01T20:06:55.767682768Z Content-Language: REDACTED
2024-02-01T20:06:55.767686873Z
2024-02-01T20:06:55.767691229Z at Azure.Search.Documents.IndexesRestClient.ListAsync(String select, CancellationToken cancellationToken)
2024-02-01T20:06:55.767695569Z at Azure.Search.Documents.Indexes.SearchIndexClient.<>c__DisplayClass43_0.<<GetIndexesAsync>b__0>d.MoveNext()
2024-02-01T20:06:55.767699777Z --- End of stack trace from previous location ---
2024-02-01T20:06:55.767703927Z at Azure.Core.PageResponseEnumerator.FuncAsyncPageable`1.AsPages(String continuationToken, Nullable`1 pageSizeHint)+MoveNext()
2024-02-01T20:06:55.767712074Z at Azure.Core.PageResponseEnumerator.FuncAsyncPageable`1.AsPages(String continuationToken, Nullable`1 pageSizeHint)+System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult()
2024-02-01T20:06:55.767715941Z at Azure.AsyncPageable`1.GetAsyncEnumerator(CancellationToken cancellationToken)+MoveNext()
2024-02-01T20:06:55.767719575Z at Azure.AsyncPageable`1.GetAsyncEnumerator(CancellationToken cancellationToken)+MoveNext()
2024-02-01T20:06:55.767723526Z at Azure.AsyncPageable`1.GetAsyncEnumerator(CancellationToken cancellationToken)+System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult()
2024-02-01T20:06:55.767727593Z at Microsoft.KernelMemory.MemoryDb.AzureAISearch.AzureAISearchMemory.DoesIndexExistAsync(String index, CancellationToken cancellationToken)
2024-02-01T20:06:55.767731851Z at Microsoft.KernelMemory.MemoryDb.AzureAISearch.AzureAISearchMemory.DoesIndexExistAsync(String index, CancellationToken cancellationToken)
2024-02-01T20:06:55.767735961Z at Microsoft.KernelMemory.MemoryDb.AzureAISearch.AzureAISearchMemory.CreateIndexAsync(String index, MemoryDbSchema schema, CancellationToken cancellationToken)
2024-02-01T20:06:55.767739976Z at Microsoft.KernelMemory.Handlers.SaveRecordsHandler.SaveEmbeddingsAsync(DataPipeline pipeline, CancellationToken cancellationToken)
2024-02-01T20:06:55.767748878Z at Microsoft.KernelMemory.Handlers.SaveRecordsHandler.InvokeAsync(DataPipeline pipeline, CancellationToken cancellationToken)
2024-02-01T20:06:55.767771715Z at Microsoft.KernelMemory.Pipeline.InProcessPipelineOrchestrator.RunPipelineAsync(DataPipeline pipeline, CancellationToken cancellationToken)
2024-02-01T20:06:55.767776169Z at Microsoft.KernelMemory.Pipeline.BaseOrchestrator.ImportDocumentAsync(String index, DocumentUploadRequest uploadRequest, CancellationToken cancellationToken)
2024-02-01T20:06:55.767780546Z at ZebraGptContainerApp.Services.Implementations.CognitiveSearchService.Store(String index, String documentUrl, String fileName, Stream documentContent, List`1 adGroupIds) in /server/Services/Implementations/CognitiveSearchService.cs:line 146
2024-02-01T20:06:55.767784879Z at ZebraGptContainerApp.Services.RagGestionService.ChunkDocumentAndStoreInCognitiveSearchIndex(DocumentToProcess documentToProcess) in /server/Services/Implementations/RagGestionService.cs:line 57
2024-02-01T20:06:55.767789267Z --- End of inner exception stack trace ---
2024-02-01T20:06:55.767796235Z at ZebraGptContainerApp.Services.RagGestionService.ChunkDocumentAndStoreInCognitiveSearchIndex(DocumentToProcess documentToProcess) in /server/Services/Implementations/RagGestionService.cs:line 101
2024-02-01T20:06:55.767800684Z at ZebraGptContainerApp.BackgroundServices.DocumentProcessorBackgroundService.PerformRagGestion(IRagGestionService ragGestionService, IDocumentProcessorQueue documentProcessorQueue, CancellationToken stoppingToken) in /server/BackgroundServices/DocumentProcessorBackgroundService.cs:line 50
Ideally there would be a way to process a very large document without failing like this. It would be useful if:
- the import were sensitive to the load that the Azure OpenAI service can support, perhaps slowing down processing to a level that does not result in 503s
- the import mechanism supported reporting what percentage of a document has been processed.
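Until something like the first point exists inside the library, the caller can approximate it by wrapping the import in a retry with exponential backoff. The following is a minimal sketch under stated assumptions: `RetryHelper` and `WithBackoffAsync` are hypothetical names invented for illustration, not Kernel Memory APIs, and the delays are arbitrary.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper for illustration only; not part of Kernel Memory.
public static class RetryHelper
{
    // Runs `operation`, retrying with a doubling delay while `isTransient`
    // says the failure is worth retrying and attempts remain.
    public static async Task<T> WithBackoffAsync<T>(
        Func<CancellationToken, Task<T>> operation,
        Func<Exception, bool> isTransient,
        int maxAttempts = 5,
        TimeSpan? initialDelay = null,
        CancellationToken cancellationToken = default)
    {
        if (maxAttempts < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        }

        TimeSpan delay = initialDelay ?? TimeSpan.FromSeconds(2);

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await operation(cancellationToken).ConfigureAwait(false);
            }
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                // Back off before the next attempt, then double the wait.
                await Task.Delay(delay, cancellationToken).ConfigureAwait(false);
                delay = TimeSpan.FromTicks(delay.Ticks * 2);
            }
        }
    }
}
```

A caller might pass `ct => _memory.ImportDocumentAsync(..., cancellationToken: ct)` as the operation and treat an Azure.RequestFailedException whose Status is 429 or 503 as transient. Two caveats: a retry at this level presumably restarts the whole import rather than resuming it, and a Stream passed as content would need to be rewound before each attempt.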
This may be related to #276
Importance
I cannot use Kernel Memory
Platform, Language, Versions
C#
<PackageReference Include="Microsoft.KernelMemory.Core" Version="0.26.240116.2" />
Running in Azure Container Apps / .NET 8
If there's any more information you need then let us know!
Relevant log output
See above
@johnnyreilly thanks for the kind words, glad to hear it's helping. I think your scenario would benefit from using the retry mechanism built into the Memory Service. If I understand correctly, you're using the Serverless Memory approach, which does everything in the same process and synchronously. While you can run serverless tasks in the background, the internal orchestrator doesn't have a durable queue to "retry until done".
KM Service helps with that, without making too many changes. First you'd configure the service, similarly to how you configure the serverless memory, using the typical .NET JSON files approach. Then start the service, which provides a web API and internally uses queues to retry. You can secure the web API by setting up custom API keys (see the config). The orchestration queues can run on Azure Queues or RabbitMQ (there are also simple queues, but they are not meant for production).
Then change your _memory object to be an instance of MemoryWebClient, which has the same API as the serverless memory. The main difference is that operations are asynchronous: after sending a document, you check whether the ingestion is complete by calling _memory.IsDocumentReadyAsync. If you run the service locally and send a big PDF, you'll see it hit the rate limit but keep retrying until done (which could take minutes, hours or more, depending on your quota).
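Because ingestion through the service is asynchronous, the calling code needs some kind of wait loop. Here is a minimal sketch of one, with the readiness check passed in as a delegate so the example stays self-contained. `IngestionPoller` and `WaitUntilReadyAsync` are hypothetical names, not Kernel Memory APIs. With a MemoryWebClient the delegate could be something like `ct => _memory.IsDocumentReadyAsync(documentId, index, ct)`, assuming that method is available in your version.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper for illustration only; not part of Kernel Memory.
public static class IngestionPoller
{
    // Calls `isReady` every `pollInterval` until it returns true or
    // `timeout` elapses. Returns true if ready, false on timeout.
    public static async Task<bool> WaitUntilReadyAsync(
        Func<CancellationToken, Task<bool>> isReady,
        TimeSpan pollInterval,
        TimeSpan timeout,
        CancellationToken cancellationToken = default)
    {
        Stopwatch elapsed = Stopwatch.StartNew();

        while (true)
        {
            if (await isReady(cancellationToken).ConfigureAwait(false))
            {
                return true;
            }

            if (elapsed.Elapsed + pollInterval > timeout)
            {
                return false;
            }

            await Task.Delay(pollInterval, cancellationToken).ConfigureAwait(false);
        }
    }
}
```

Since a large document can take minutes or hours depending on quota, a generous timeout and a poll interval of several seconds are probably more appropriate than a tight loop.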
Is the KM Service able to set the throttle rate when creating embeddings? For example, to define how many concurrent API calls are allowed per minute.
@blinchi @doggy8088 If you configure the service in async mode, with the orchestration based on queues, it should automatically retry and eventually get the job done.
Hello @dluc , I think the issue is that it tries to create the index every time it attempts to add a block to the search index. I'll try to generate the service from your repository to see if that prevents the exception from occurring. Thank you!
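The stack trace above is consistent with this reading: SaveEmbeddingsAsync calls CreateIndexAsync, which calls DoesIndexExistAsync, which lists the indexes on the search service, and that listing is the request that gets throttled. One general way to avoid repeating such a check is to remember which indexes have already been ensured. The sketch below illustrates that idea only; `EnsureOnceCache` is a hypothetical name and this is not taken from Kernel Memory's code or from any fix for this issue.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical helper for illustration only; not part of Kernel Memory.
public sealed class EnsureOnceCache
{
    private readonly ConcurrentDictionary<string, Lazy<Task>> _ensured = new();

    // Runs `createIfMissing` at most once per index name while it succeeds.
    // Concurrent callers for the same name await the same task.
    public Task EnsureAsync(string indexName, Func<string, Task> createIfMissing)
    {
        Lazy<Task> entry = _ensured.GetOrAdd(
            indexName,
            name => new Lazy<Task>(() => RunAsync(name, createIfMissing)));

        return entry.Value;
    }

    private async Task RunAsync(string name, Func<string, Task> createIfMissing)
    {
        try
        {
            await createIfMissing(name).ConfigureAwait(false);
        }
        catch
        {
            // Forget failures so that a later call can try again.
            _ensured.TryRemove(name, out _);
            throw;
        }
    }
}
```

With a cache like this, saving thousands of chunks to the same index would trigger one existence check instead of one per chunk. The trade-off is that the cache does not notice an index deleted by another process while the application is running.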
Sorry about the delayed reply @dluc
> If I understand correctly, you're using the Serverless Memory approach, which does everything in the same process and synchronously.
Yes exactly
> KM Service helps with that, without making too many changes.
I feel like this is sort of correct but also slightly not. I'll be honest, I really appreciate the simplicity of the serverless model. It's very simple to get it running as a background task in a .NET application using hosted services. We can have a simple app running which does processing in the background with very little complexity involved. We host as a docker container in Azure Container Apps and it just works ™️
If I understand the hosted model, I'll need to spin up a separate container alongside, set up Azure Queues or similar, configure them and work out a communication mechanism between the two containers. Totally doable I'm sure, but a significant change in the app architecture.
It may well be that my hopes are unreasonable around having a way to get serverless to rate limit its requests. Perhaps I should be biting the bullet and doing the migration - I'll have to see how much we need to process big documents before I can justify it to the team I think.
I totally appreciate Kernel Memory - but controversially I think I greatly prefer the serverless model to the hosted. My apologies!
Added a PR for this here: #387
cc @luismanez @dluc
PR #387 merged - this should address the issue. Thanks @spenavajr !
Glad I could help @dluc !