RaoHai / aws-lambda-response-streaming

An AWS Lambda Python streaming response example

Building streaming functions with Python on AWS Lambda

This example shows how to stream responses from OpenAI chat completions with FastAPI on AWS Lambda.

Credit to aws-lambda-web-adapter

Architecture

How does it work?

This example uses FastAPI to provide an inference API. The inference endpoint invokes OpenAI and streams the response back. Both Lambda Web Adapter and the Lambda function URL have response streaming mode enabled, so the response from OpenAI is streamed all the way back to the client.
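A minimal sketch of what the inference endpoint in main.py could look like, assuming the openai v1 Python SDK, uvicorn, and the /api/chat/stream route used in the curl example below; the actual handler in this repo may differ:

# main.py -- minimal sketch; the real handler in this repo may differ
import os

import uvicorn
from fastapi import FastAPI, Request
from fastapi.responses import StreamingResponse
from openai import OpenAI

app = FastAPI()
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])  # assumes the key is set in the environment

@app.post("/api/chat/stream")
async def chat_stream(request: Request):
    body = await request.json()

    def generate():
        # stream=True makes the SDK yield completion chunks as they arrive
        stream = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=body["messages"],
            stream=True,
        )
        for chunk in stream:
            delta = chunk.choices[0].delta.content
            if delta:
                yield delta

    # StreamingResponse flushes each yielded chunk immediately, so Lambda
    # Web Adapter can forward it through the function URL as it is produced.
    return StreamingResponse(generate(), media_type="text/plain")

if __name__ == "__main__":
    # Lambda Web Adapter forwards requests to port 8080 by default
    uvicorn.run(app, host="0.0.0.0", port=8080)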

This function is packaged as a Docker image. Here is the content of the Dockerfile.

FROM public.ecr.aws/docker/library/python:3.12.0-slim-bullseye

COPY --from=public.ecr.aws/awsguru/aws-lambda-adapter:0.8.1 /lambda-adapter /opt/extensions/lambda-adapter

WORKDIR /app

# Install dependencies first so Docker can cache this layer
COPY requirements.txt .
RUN pip3 install -r requirements.txt -U --no-cache-dir

# Copy function code from your project folder
COPY . .

CMD ["python", "main.py"]

Notice that we only need to add the second line to install Lambda Web Adapter.

COPY --from=public.ecr.aws/awsguru/aws-lambda-adapter:0.8.1 /lambda-adapter /opt/extensions/lambda-adapter

In the SAM template, we use the environment variable AWS_LWA_INVOKE_MODE: RESPONSE_STREAM to configure Lambda Web Adapter in response streaming mode, and we add a function URL with InvokeMode: RESPONSE_STREAM.

  FastAPIFunction:
    Type: AWS::Serverless::Function
    Properties:
      PackageType: Image
      MemorySize: 512
      Environment:
        Variables:
          AWS_LWA_INVOKE_MODE: RESPONSE_STREAM
      FunctionUrlConfig:
        AuthType: NONE
        InvokeMode: RESPONSE_STREAM
      Policies:
      - Statement:
        - Sid: BedrockInvokePolicy
          Effect: Allow
          Action:
          - bedrock:InvokeModelWithResponseStream
          Resource: '*'
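
The curl command in the test step reads the function URL from the stack's FastAPIFunctionUrl output. A sketch of how that output can be declared in the same template: SAM generates a FastAPIFunctionUrl resource for the FunctionUrlConfig above, though the exact output name used by this repo's template is an assumption.

Outputs:
  FastAPIFunctionUrl:
    Description: Function URL for the FastAPI function
    Value: !GetAtt FastAPIFunctionUrl.FunctionUrl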

Build and deploy

Run the following commands to build and deploy this example.

sam build --use-container
sam deploy --guided

Test the example

After the deployment completes, curl the FastAPIFunctionUrl shown in the stack outputs.

curl -v -N --location '${{FastAPIFunctionUrl}}/api/chat/stream' \
--header 'Content-Type: application/json' \
--header 'Transfer-Encoding: chunked' \
--data '{"messages":[{"role":"user","content":"Count to 100, with a comma between each number and no newlines. E.g., 1, 2, 3, ..."}],"prompt":""}'
