timoklimmer / powerproxy-aoai

Monitors and processes traffic to and from Azure OpenAI endpoints.

Blocking request towards client prevents "real" streaming

Johann-Foerster opened this issue

https://github.com/timoklimmer/powerproxy-aoai/blob/ad51fb4d378562135eca180412fad17fff5dacc4/app/powerproxy.py#L229C10-L229C10

Unfortunately, this is a blocking implementation: the proxy waits until the upstream response has completed and only then sends everything out to the client at once.
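To illustrate the difference being reported, here is a minimal sketch that does not use PowerProxy's actual code. It assumes a hypothetical `fake_upstream` generator standing in for a streaming Azure OpenAI response, and compares a relay that buffers the whole body with one that forwards each chunk as it arrives. All names in it are made up for illustration.

```python
import asyncio
import time
from typing import AsyncIterator, Callable, List


async def fake_upstream(n_chunks: int = 3, delay: float = 0.05) -> AsyncIterator[bytes]:
    """Hypothetical stand-in for a streamed response: one chunk every `delay` seconds."""
    for i in range(n_chunks):
        await asyncio.sleep(delay)
        yield f"data: chunk {i}\n\n".encode()


async def buffered_relay(upstream: AsyncIterator[bytes]) -> AsyncIterator[bytes]:
    """Blocking variant: reads the complete upstream body before forwarding anything."""
    chunks = [chunk async for chunk in upstream]
    for chunk in chunks:
        yield chunk


async def streaming_relay(upstream: AsyncIterator[bytes]) -> AsyncIterator[bytes]:
    """Streaming variant: forwards every chunk as soon as it is received."""
    async for chunk in upstream:
        yield chunk


async def arrival_times(relay: AsyncIterator[bytes]) -> List[float]:
    """Returns the seconds since start at which each chunk reached the 'client'."""
    start = time.perf_counter()
    times: List[float] = []
    async for _ in relay:
        times.append(time.perf_counter() - start)
    return times


def measure(relay_factory: Callable[[AsyncIterator[bytes]], AsyncIterator[bytes]]) -> List[float]:
    """Runs one relay variant against a fresh fake upstream and returns arrival times."""
    return asyncio.run(arrival_times(relay_factory(fake_upstream())))


if __name__ == "__main__":
    print("buffered :", [f"{t:.2f}" for t in measure(buffered_relay)])
    print("streaming:", [f"{t:.2f}" for t in measure(streaming_relay)])
```

With the buffered relay, all chunks reach the client at nearly the same moment after the full upstream duration. With the streaming relay, they arrive spread out over that duration.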

@Johann-Foerster Can you please elaborate? PowerProxy has been successfully tested with streaming requests.

During testing with GPT-4 and longer response times (say ~10 s), the proxy loads for a while and then sends out everything within nanoseconds. I quickly added print statements before and after the marked code and in the part that sends out the events, and they confirmed this behaviour on the proxy side.
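A diagnosis along these lines can be sketched with a small timing wrapper. This is not the code that was actually used. `timed`, `demo_source`, and `run_demo` are hypothetical names, and the source is a fake generator rather than a real Azure OpenAI response. The wrapper logs when each chunk passes through, once on the receiving side and once on the sending side.

```python
import asyncio
import time
from typing import AsyncIterator, List


async def timed(source: AsyncIterator[bytes], label: str, log: List[str]) -> AsyncIterator[bytes]:
    """Passes chunks through unchanged and records when each one went by."""
    start = time.perf_counter()
    async for chunk in source:
        elapsed = time.perf_counter() - start
        log.append(f"{label} +{elapsed:.3f}s {len(chunk)} bytes")
        yield chunk


async def demo_source() -> AsyncIterator[bytes]:
    """Hypothetical upstream that emits three chunks with a short pause before each."""
    for i in range(3):
        await asyncio.sleep(0.02)
        yield f"data: {i}\n\n".encode()


async def run_demo() -> List[str]:
    """Wraps the receiving and the sending side and returns the combined log."""
    log: List[str] = []
    received = timed(demo_source(), "received", log)
    # Any proxy processing would sit between the two wrappers.
    sent = timed(received, "sent", log)
    async for _ in sent:
        pass
    return log


if __name__ == "__main__":
    for entry in asyncio.run(run_demo()):
        print(entry)
```

If the log alternates between `received` and `sent`, chunks are forwarded as they arrive. If all `received` entries come first and the `sent` entries follow in one burst, something between the two wrappers is buffering the response.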

@Johann-Foerster Thanks for submitting. I will take a look at it.

@Johann-Foerster, I have taken a look at it, was able to spot the problem, and now know what needs to be done to fix it. It seems we ran into a regression bug which our tests did not reveal. I will fix it and publish a new version asap.

I have just committed the fix and created a new release (v0.8.1), which also includes the increased read timeout. Closing the issue as it is fixed. @Johann-Foerster fyi