aws-inference-benchmark

Repository with the code for running deep learning inference benchmarks on different AWS instances and service types.

Copilot example

This example demonstrates how to deploy a deep learning model, in ONNX format, for image inference on Amazon ECS/Fargate with AWS Copilot. It provides an easy-to-follow, scalable pattern for serving deep learning models in the cloud.
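
Under the hood, a service like this is a small HTTP server that loads the ONNX model once at startup and exposes a /predict endpoint. A minimal sketch of that pattern follows; the model filename, input shape, and response field are illustrative assumptions, not the repository's exact code.

# Hedged sketch of an ONNX image-inference endpoint (assumed names:
# model.onnx, a 224x224 RGB input; not the repository's exact code).
import io

import numpy as np
import onnxruntime as ort
from flask import Flask, request, jsonify
from PIL import Image

app = Flask(__name__)
session = ort.InferenceSession("model.onnx")  # load the model once at startup
input_name = session.get_inputs()[0].name

@app.route("/predict", methods=["POST"])
def predict():
    # Decode the raw image bytes from the request body.
    image = Image.open(io.BytesIO(request.data)).convert("RGB")
    image = image.resize((224, 224))
    # NCHW float32 tensor, normalized to [0, 1].
    tensor = np.asarray(image, dtype=np.float32).transpose(2, 0, 1)[None] / 255.0
    logits = session.run(None, {input_name: tensor})[0]
    return jsonify({"class_id": int(np.argmax(logits))})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)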

Requirements

  • Python 3.6 or later
  • Docker
  • AWS CLI
  • AWS Copilot

Deploy

Clone the repository

git clone https://github.com/ryfeus/aws-inference-benchmark.git
cd aws-inference-benchmark/copilot/cpu/aws-copilot-inference-service

Initialize the environment and deploy the application.

copilot env init
copilot deploy

Make a single prediction

curl -X POST -H "Content-Type: image/jpeg" --data-binary "@flower.png" http://<prefix>.us-east-1.elb.amazonaws.com/predict
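
The same request can be made from Python. A hedged equivalent of the curl call above, mirroring its payload and Content-Type header (replace <prefix> with the load balancer hostname printed by copilot deploy):

# Python equivalent of the curl call above (sketch, not repository code).
import requests

with open("flower.png", "rb") as f:
    resp = requests.post(
        "http://<prefix>.us-east-1.elb.amazonaws.com/predict",
        data=f.read(),
        headers={"Content-Type": "image/jpeg"},  # mirrors the curl call
    )
print(resp.status_code, resp.json())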

Benchmark using ApacheBench (ab)

ab -n 10 -c 10 -p flower.png -T image/jpeg http://<prefix>.us-east-1.elb.amazonaws.com/predict
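
If ab is not available, a rough stand-in for the run above can be scripted with Python's thread pool; this is a sketch, not the repository's benchmark code, and the URL placeholder is the same as above.

# Rough Python stand-in for `ab -n 10 -c 10`: 10 requests, 10 at a time.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://<prefix>.us-east-1.elb.amazonaws.com/predict"

with open("flower.png", "rb") as f:
    payload = f.read()

def one_request(_):
    # Time a single POST, mirroring one ab request.
    start = time.perf_counter()
    requests.post(URL, data=payload, headers={"Content-Type": "image/jpeg"})
    return time.perf_counter() - start

with ThreadPoolExecutor(max_workers=10) as pool:
    latencies = list(pool.map(one_request, range(10)))

print(f"mean latency: {sum(latencies) / len(latencies):.3f}s")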

Run locally

Build the Docker image

docker build -t image-inference .

Run the Docker container

docker run --rm -p 8080:8080 image-inference

Make a prediction using the REST API

curl -X POST -H "Content-Type: image/jpeg" --data-binary "@flower.png" http://localhost:8080/predict

Test

Install the development dependencies

pip install -r dev-requirements.txt

Run the tests

pytest -v test_inference.py
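
The actual tests live in test_inference.py and are not reproduced here; a check for this kind of service typically posts a sample image to the running container and asserts on the response. A hedged sketch of that shape (the URL, file, and response field are assumptions, not the repository's actual test):

# Hedged sketch of the kind of check test_inference.py might perform
# against a locally running container (not the repository's actual test).
import requests

def test_predict_returns_ok():
    with open("flower.png", "rb") as f:
        resp = requests.post(
            "http://localhost:8080/predict",
            data=f.read(),
            headers={"Content-Type": "image/jpeg"},
        )
    assert resp.status_code == 200
    assert resp.json()  # assumed: a non-empty JSON body on success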

Copilot LLM example

This example demonstrates how to deploy a large language model for text generation, using the Hugging Face Transformers library, on Amazon ECS/Fargate with AWS Copilot. It provides an easy-to-follow, scalable pattern for serving large models in the cloud.
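
The serving pattern is the same as the image example, with the Transformers pipeline API doing the generation. A minimal sketch (the ./model path matches the model directory created in the Deploy step below; the request/response fields are assumptions, not the repository's exact code):

# Hedged sketch of a text-generation endpoint built on the Transformers
# pipeline API (not the repository's exact code).
from flask import Flask, request, jsonify
from transformers import pipeline

app = Flask(__name__)
# Load the T5-style model once at startup from the cloned ./model directory.
generator = pipeline("text2text-generation", model="./model")

@app.route("/predict", methods=["POST"])
def predict():
    instruction = request.get_json()["instruction"]
    output = generator(instruction, max_length=256)[0]["generated_text"]
    return jsonify({"response": output})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)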

Deploy

Clone the repository

git clone https://github.com/ryfeus/aws-inference-benchmark.git
cd aws-inference-benchmark/copilot/transformers/aws-copilot-inference-service

Clone the model from its Hugging Face repo. The example below uses LaMini-T5-223M.

git lfs install
git clone https://huggingface.co/MBZUAI/LaMini-T5-223M.git
mv LaMini-T5-223M model
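
If git-lfs is not installed, the same files can be fetched with the huggingface_hub client; this is an alternative to the clone above, not what this README uses.

# Alternative to the git-lfs clone: download the model snapshot into ./model.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="MBZUAI/LaMini-T5-223M", local_dir="model")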

Initialize the environment and deploy the application.

copilot env init
copilot deploy

Make a single prediction

curl -X POST -H "Content-Type: application/json" -d '{"instruction":"Main tourist attractions in Rome?"}' http://<prefix>.us-east-1.elb.amazonaws.com/predict
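
The same call from Python, this time with a JSON body rather than raw image bytes (a sketch; replace <prefix> with the hostname printed by copilot deploy):

# Python equivalent of the curl call above (sketch, not repository code).
import requests

resp = requests.post(
    "http://<prefix>.us-east-1.elb.amazonaws.com/predict",
    json={"instruction": "Main tourist attractions in Rome?"},
)
print(resp.json())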

Run locally

Build the Docker image

docker build -t llm-inference .

Run the Docker container

docker run --rm -p 8080:8080 llm-inference

Make a prediction using the REST API

curl -X POST -H "Content-Type: application/json" -d '{"instruction":"Main tourist attractions in Rome?"}' http://localhost:8080/predict


License

MIT License

