microsoft / sample-app-aoai-chatGPT

Sample code for a simple web chat experience through Azure OpenAI, including Azure OpenAI On Your Data.


Response with Stream enabled stops too early

Opitzy opened this issue

Describe the bug
I have set up my Azure OpenAI endpoint and Azure Cognitive Search endpoint in this sample app.
As soon as I get a large response with streaming enabled, the response suddenly stops and is cut off.

As soon as I disable streaming, the complete response appears.

To Reproduce
Steps to reproduce the behavior:

  1. Set up the sample app with an Azure OpenAI and Azure Cognitive Search endpoint
  2. Ask a question that you expect to produce a large response; in my case the full response has 49 lines and 3,730 characters, and the question is about our documentation
  3. With streaming enabled, the response stops too early
  4. With streaming disabled, you get the full response

Expected behavior
Even with streaming enabled, I expect the complete answer; the stream should not stop too early, even for a long answer in combination with Azure Cognitive Search.

Configuration: Please provide the following

  • Azure OpenAI model name and version
    • gpt-4o
  • Is chat history enabled?
    • No
  • Are you using data? If so, what data source? (e.g. Azure AI Search, Azure CosmosDB Mongo vCore, etc)
    • Azure AI Search / Azure Cognitive Search
  • API Version
    • 2024-05-01-preview

Hi @Opitzy, thanks for reaching out about your issue.

I am seeing a very similar issue posted here in the OpenAI forums, which looks like it might apply here, since we are using the same method described in that issue. Unfortunately it has no answers at this time, but I'm wondering if you could try using the normal chat.completions.create method to see if it improves the streaming case.
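
For anyone following along, here's a minimal sketch of the plain streaming call using the v1 openai Python SDK; the endpoint, key, deployment name, and prompt below are placeholders, not the app's actual values:

```python
# Sketch of the plain streaming call (placeholder values, not the app's real config).
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<your-api-key>",
    api_version="2024-05-01-preview",
)

stream = client.chat.completions.create(
    model="<your-deployment-name>",  # Azure deployment name, not the model family
    messages=[{"role": "user", "content": "Summarize the documentation."}],
    stream=True,
)

# Accumulate the streamed deltas so the total length can be compared
# against the non-streaming response.
full_response = ""
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        full_response += chunk.choices[0].delta.content
print(len(full_response), "characters received")
```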

For some context, here is where we are currently making the call to chat completions. We use the raw response method to have access to headers that are returned from the service. In this case, we were interested in capturing apim-request-id, which is useful for debugging issues with the service when there are exceptions, but for the purposes of comparison you could provide a dummy string value for this.
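
If it helps with the comparison, this is roughly what the raw-response variant looks like in the v1 SDK; again a sketch with placeholder values, reusing the client from the snippet above:

```python
# Sketch of the raw-response variant, which exposes response headers
# (here, apim-request-id). Placeholder values as above.
raw = client.chat.completions.with_raw_response.create(
    model="<your-deployment-name>",
    messages=[{"role": "user", "content": "Summarize the documentation."}],
    stream=True,
)

# Headers are only exposed on the raw response wrapper; apim-request-id is
# what we capture for service-side debugging.
apim_request_id = raw.headers.get("apim-request-id")

# parse() yields the same Stream object that chat.completions.create returns.
stream = raw.parse()
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```

Swapping between the two should change nothing except header access, which is why it's a useful A/B test for isolating the SDK.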

My goal here is to determine whether the problem is in the service, the SDK, or the web app itself, so I want to rule out an SDK issue first, especially since it seems this issue may have been encountered before.

Hey @abhahn,
thanks for the quick response.

I just tried using chat.completions.create instead of chat.completions.with_raw_response.create, but sadly that didn't change the behavior.

So I guess I agree with you that this is potentially a problem with the SDK, judging by the OpenAI issue.

In my opinion we can close this for now; if I get any news that would affect this web app, I'll open a new issue :)

Thanks for the fast support!