FastAPI backend in a Docker container is not streaming, but waiting to send the full payload.
nkma1989 opened this issue
Describe the bug
We have deployed a Docker container with a FastAPI backend that uses the Azure OpenAI service to generate streaming responses.
It is part of a larger setup with a frontend that lets users interact with OpenAI LLMs.
When running the FastAPI app locally it streams the response as intended, but when deployed to an Azure Web App it does not.
To Reproduce
- Set up an Azure Web App
- Set up a simple FastAPI app with a StreamingResponse endpoint
- Deploy the simple FastAPI app in a Docker container to the Azure Web App
- Send a request to the endpoint
Expected behavior
A streaming response is expected, but the client waits for the entire payload instead.
Screenshots
Examples of local behavior vs Azure web app:
Local streaming, indicating the number of chunks/elements received:
Now using the same request, but sending to the Azure Web App:
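The chunk counting shown in the screenshots can be sketched as below; the function accepts any iterable of chunks, so with `requests` one would pass `resp.iter_content(chunk_size=None)` from a `stream=True` request (the names here are illustrative):

```python
from typing import Iterable

def count_chunks(chunks: Iterable) -> int:
    """Consume a stream chunk by chunk, reporting progress as chunks arrive."""
    n = 0
    for _ in chunks:
        n += 1
        print(f"chunks/elements received: {n}")
    return n
```

When streaming works, the count ticks up incrementally; against the Azure Web App all chunks arrive at once at the end.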
Device info (if applicable):
Select Web App settings:
"kind": "app,linux,container",
"siteConfig": {
"numberOfWorkers": 1,
"linuxFxVersion": "DOCKER|<REPO>/<IMAGE>",
"acrUseManagedIdentityCreds": false,
"alwaysOn": true,
"http20Enabled": true,
"functionAppScaleLimit": 10,
"minimumElasticInstanceCount": 2
},
Docker image:
FROM python:3.10-alpine3.19
WORKDIR /app
RUN pip install -U pip setuptools wheel
RUN pip install poetry
# copy into the WORKDIR so `poetry install` can find pyproject.toml
COPY ./ /app/
RUN poetry install --no-dev
ENTRYPOINT ["poetry","run","gunicorn","-w","4","-k","uvicorn.workers.UvicornWorker","--timeout","600","app:app"]
Additional context
Hopefully this is something you have the time for and can prioritize, as we see this as a very common use case and we have multiple solutions that utilize this setup :)
Wrong repo, reposted elsewhere