Azure / static-web-apps

Azure Static Web Apps. For bugs and feature requests, please create an issue in this repo. For community discussions, latest updates, kindly refer to the Discussions Tab. To know what's new in Static Web Apps, visit https://aka.ms/swa/ThisMonth

Home Page: https://aka.ms/swa

Docker container FastAPI backend is not streaming, but waiting to send full payload.

nkma1989 opened this issue

Describe the bug
We have deployed a Docker container with a FastAPI backend that uses the Azure OpenAI service to generate streaming responses.
It is part of a larger setup with a frontend that lets users interact with OpenAI LLMs.
When running FastAPI locally, it works as intended and streams the response, but after deploying to Azure Web App it does not.

To Reproduce

  1. Set up an Azure Web App
  2. Set up a simple FastAPI app with a StreamingResponse endpoint
  3. Deploy the FastAPI app in a Docker container to the Azure Web App
  4. Send a request to the endpoint
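
For step 2, a minimal sketch of such an endpoint might look like the following. This is not the code from the affected deployment; the names `token_stream`, `create_app` and the `/stream` route are hypothetical, and the half-second delay only stands in for the time the Azure OpenAI service takes between tokens.

```python
import asyncio
from typing import AsyncIterator


async def token_stream(count: int = 5, delay: float = 0.5) -> AsyncIterator[str]:
    """Yield `count` small text chunks, pausing `delay` seconds after each one.

    Hypothetical stand-in for tokens arriving from an LLM.
    """
    for i in range(count):
        yield f"chunk {i}\n"
        await asyncio.sleep(delay)


def create_app():
    """Build a FastAPI app exposing the generator at the hypothetical /stream route."""
    # Imported here so the generator above can be exercised without FastAPI installed.
    from fastapi import FastAPI
    from fastapi.responses import StreamingResponse

    app = FastAPI()

    @app.get("/stream")
    async def stream() -> StreamingResponse:
        # StreamingResponse sends each yielded chunk as the generator produces it.
        return StreamingResponse(token_stream(), media_type="text/plain")

    return app


# To match the `app:app` target used in the Dockerfile of this issue,
# the module would also need to expose: app = create_app()
```

Served locally, each chunk should reach the client roughly half a second apart, which is the behavior that is lost once the app runs behind the Azure Web App.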

Expected behavior
The endpoint should stream the response chunk by chunk. Instead, the client receives nothing until the entire payload is ready.

Screenshots
Examples of local behavior vs Azure web app:
Local streaming, showing the number of chunks/elements received:
image

Now using the same request, but sending to the Azure Web App:
image
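
Since the screenshots do not survive as text, here is a rough sketch of how the chunk count can be measured with only the Python standard library. It is not the tool used for the screenshots; `read_chunks` and `probe` are hypothetical helper names, and the URL passed to `probe` is whatever endpoint is being tested.

```python
import io
import time
import urllib.request
from typing import List, Tuple


def read_chunks(stream: io.BufferedIOBase, size: int = 1024) -> List[Tuple[float, int]]:
    """Read until EOF, recording (seconds since start, bytes received) per read."""
    start = time.monotonic()
    arrivals = []
    while True:
        # read1 returns as soon as some data is available instead of
        # waiting to fill `size` bytes, so each read approximates one arrival.
        data = stream.read1(size)
        if not data:
            break
        arrivals.append((time.monotonic() - start, len(data)))
    return arrivals


def probe(url: str) -> List[Tuple[float, int]]:
    """Request `url` and report when each piece of the body arrived."""
    with urllib.request.urlopen(url) as resp:
        return read_chunks(resp)
```

A streamed response should produce several entries with increasing timestamps, while a buffered one tends to produce one or a few entries that all arrive at the end. The counts are only indicative, because proxies and socket buffering can merge or split chunks.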

Device info (if applicable):
Select Web App settings:

"kind": "app,linux,container",
"siteConfig": {
    "numberOfWorkers": 1,
    "linuxFxVersion": "DOCKER|<REPO>/<IMAGE>",
    "acrUseManagedIdentityCreds": false,
    "alwaysOn": true,
    "http20Enabled": true,
    "functionAppScaleLimit": 10,
    "minimumElasticInstanceCount": 2
},

Docker image:

FROM python:3.10-alpine3.19

WORKDIR /app

RUN pip install -U pip setuptools wheel
RUN pip install poetry

COPY ./ /
RUN poetry install --no-dev
ENTRYPOINT ["poetry","run","gunicorn","-w","4","-k","uvicorn.workers.UvicornWorker","--timeout","600","app:app"]

Additional context
Hopefully you have the time to prioritize this: we see it as a very common use case, and we have multiple solutions that use this setup :)

Wrong repo, reposted elsewhere