microsoft / kernel-memory

RAG architecture: index and query any data using LLM and natural language, track sources, show citations, asynchronous memory patterns.

Home Page: https://microsoft.github.io/kernel-memory

Too long prompt

rosieks opened this issue

Context / Scenario

I'm asking a question while connected to PostgreSQL.
The prompt is 25,692 characters long.
CompletionOptions.MaxTokens is 300 (if that matters).
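
For context, the setup is roughly like the following sketch. The connection string, endpoint, API key, and deployment names are placeholders, not values from this issue:

using Microsoft.KernelMemory;

// Serverless Kernel Memory with PostgreSQL as the memory store
// (Microsoft.KernelMemory.Postgres package).
var memory = new KernelMemoryBuilder()
    .WithAzureOpenAITextGeneration(new AzureOpenAIConfig
    {
        Auth = AzureOpenAIConfig.AuthTypes.APIKey,
        APIKey = "<api key>",
        Endpoint = "https://<resource>.openai.azure.com/",
        Deployment = "<chat deployment>",
    })
    .WithAzureOpenAITextEmbeddingGeneration(new AzureOpenAIConfig
    {
        Auth = AzureOpenAIConfig.AuthTypes.APIKey,
        APIKey = "<api key>",
        Endpoint = "https://<resource>.openai.azure.com/",
        Deployment = "<embedding deployment>",
    })
    .WithPostgresMemoryDb("Host=<host>;Port=5432;Database=<db>;Username=<user>;Password=<pwd>")
    .Build<MemoryServerless>();

var answer = await memory.AskAsync("the question");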

What happened?

I got the following error:

Content:
{
  "error": {
    "message": "This model's maximum context length is 4096 tokens. However, your messages resulted in 5818 tokens. Please reduce the length of the messages.",
    "type": "invalid_request_error",
    "param": "messages",
    "code": "context_length_exceeded"
  }
}
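
For scale: the 4,096-token window covers the entire request (the RAG facts plus the question and prompt template), not just the completion, and 25,692 characters at roughly 4-5 characters per token matches the 5,818 tokens the API reports. A prompt's size can be checked before sending; here is a sketch using the Microsoft.ML.Tokenizers package (the package choice is this example's assumption, not what Kernel Memory uses internally):

using Microsoft.ML.Tokenizers;

// Count tokens for a gpt-3.5-turbo-style model before sending the request.
string prompt = "..."; // the ~25,692-character RAG prompt
Tokenizer tokenizer = TiktokenTokenizer.CreateForModel("gpt-3.5-turbo");
int tokens = tokenizer.CountTokens(prompt); // the API reported 5818
if (tokens > 4096)
{
    // Over the model's window: trim retrieved facts, raise minRelevance,
    // or switch to a larger-context deployment.
}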

Importance

edge case

Platform, Language, Versions

.NET/C#/0.26.240116.2

Relevant log output

info: SimpleChat[0]
      Function SimpleChat invoking.
info: Microsoft.SemanticKernel.Connectors.OpenAI.AzureOpenAIChatCompletionService[0]
      Prompt tokens: 616. Completion tokens: 21. Total tokens: 637.
info: Ask[0]
      Function Ask invoking.
fail: Ask[0]
      Function failed. Error: This model's maximum context length is 4096 tokens. However, your messages resulted in 5818 tokens. Please reduce the length of the messages.
      Status: 400 (model_error)
      ErrorCode: context_length_exceeded

      Content:
      {
        "error": {
          "message": "This model's maximum context length is 4096 tokens. However, your messages resulted in 5818 tokens. Please reduce the length of the messages.",
          "type": "invalid_request_error",
          "param": "messages",
          "code": "context_length_exceeded"
        }
      }


      Headers:
      Access-Control-Allow-Origin: REDACTED
      X-Content-Type-Options: REDACTED
      x-ratelimit-remaining-requests: REDACTED
      apim-request-id: REDACTED
      x-ratelimit-remaining-tokens: REDACTED
      X-Request-ID: REDACTED
      ms-azureml-model-error-reason: REDACTED
      ms-azureml-model-error-statuscode: REDACTED
      x-ms-client-request-id: dd6bf4cb-0d37-444c-8925-b67e56e5d070
      x-ms-region: REDACTED
      azureml-model-session: REDACTED
      Strict-Transport-Security: REDACTED
      Date: Fri, 26 Jan 2024 10:15:45 GMT
      Content-Length: 281
      Content-Type: application/json

      Azure.RequestFailedException: This model's maximum context length is 4096 tokens. However, your messages resulted in 5818 tokens. Please reduce the length of the messages.
      Status: 400 (model_error)
      ErrorCode: context_length_exceeded

      Content:
      {
        "error": {
          "message": "This model's maximum context length is 4096 tokens. However, your messages resulted in 5818 tokens. Please reduce the length of the messages.",
          "type": "invalid_request_error",
          "param": "messages",
          "code": "context_length_exceeded"
        }
      }


      Headers:
      Access-Control-Allow-Origin: REDACTED
      X-Content-Type-Options: REDACTED
      x-ratelimit-remaining-requests: REDACTED
      apim-request-id: REDACTED
      x-ratelimit-remaining-tokens: REDACTED
      X-Request-ID: REDACTED
      ms-azureml-model-error-reason: REDACTED
      ms-azureml-model-error-statuscode: REDACTED
      x-ms-client-request-id: dd6bf4cb-0d37-444c-8925-b67e56e5d070
      x-ms-region: REDACTED
      azureml-model-session: REDACTED
      Strict-Transport-Security: REDACTED
      Date: Fri, 26 Jan 2024 10:15:45 GMT
      Content-Length: 281
      Content-Type: application/json

         at Azure.Core.HttpPipelineExtensions.ProcessMessageAsync(HttpPipeline pipeline, HttpMessage message, RequestContext requestContext, CancellationToken cancellationToken)
         at Azure.AI.OpenAI.OpenAIClient.GetChatCompletionsStreamingAsync(ChatCompletionsOptions chatCompletionsOptions, CancellationToken cancellationToken)
         at Microsoft.KernelMemory.AI.AzureOpenAI.AzureOpenAITextGenerator.GenerateTextAsync(String prompt, TextGenerationOptions options, CancellationToken cancellationToken)+MoveNext()
         at Microsoft.KernelMemory.AI.AzureOpenAI.AzureOpenAITextGenerator.GenerateTextAsync(String prompt, TextGenerationOptions options, CancellationToken cancellationToken)+System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult()
         at Microsoft.KernelMemory.Search.SearchClient.AskAsync(String index, String question, ICollection`1 filters, Double minRelevance, CancellationToken cancellationToken)
         at Microsoft.KernelMemory.Search.SearchClient.AskAsync(String index, String question, ICollection`1 filters, Double minRelevance, CancellationToken cancellationToken)
         at Microsoft.KernelMemory.MemoryPlugin.AskAsync(String question, String index, Double minRelevance, ILoggerFactory loggerFactory, CancellationToken cancellationToken)
         at Microsoft.SemanticKernel.KernelFunctionFromMethod.<>c.<<GetReturnValueMarshalerDelegate>b__12_4>d.MoveNext()
      --- End of stack trace from previous location ---
         at Microsoft.SemanticKernel.KernelFunction.InvokeAsync(Kernel kernel, KernelArguments arguments, CancellationToken cancellationToken)
info: Ask[0]
      Function completed. Duration: 103.442043s

Hi @rosieks, I'm not sure how you're using the code. Is it via the service or the embedded serverless memory? In the model configuration, did you set MaxTokenTotal to 4096?
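
In serverless mode the limit goes on the model config passed to the builder; a sketch, with the other properties omitted:

// Tell Kernel Memory the model's context window size, so the RAG prompt
// is sized against it instead of failing server-side with a 400 error.
var memory = new KernelMemoryBuilder()
    .WithAzureOpenAITextGeneration(new AzureOpenAIConfig
    {
        MaxTokenTotal = 4096, // context window of the deployed model
        // ...Auth, APIKey, Endpoint, Deployment as usual
    })
    .Build<MemoryServerless>();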

I'm using serverless mode. I've now set MaxTokenTotal to 4096, so instead of the error I get: "INFO NOT FOUND".

Looks like setting MaxTokenTotal fixed the problem. As for "INFO NOT FOUND": that is returned when the search finds no relevant text to answer the question with.
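
If it helps, that case can be detected programmatically and, for instance, retried with a lower relevance cutoff. A sketch (the minRelevance values are examples, not recommendations):

// MemoryAnswer.NoResult is true when no relevant memories were found
// (the default answer text is "INFO NOT FOUND").
var answer = await memory.AskAsync("the question", minRelevance: 0.7);
if (answer.NoResult)
{
    // Nothing passed the 0.7 relevance cutoff; retry with no cutoff.
    answer = await memory.AskAsync("the question", minRelevance: 0);
}
Console.WriteLine(answer.Result);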