Assumptions
- The model is trained and stored in S3.
- The Docker image is stored in ECR, e.g. Public ECR: https://gallery.ecr.aws/o8u9e4v2
- Go to ECS and click Create cluster.
- Give the cluster a name and leave the network and other settings as they are.
- Hit Create.
- Your cluster is automatically configured for AWS Fargate (serverless) with two capacity providers. You can also add Amazon EC2 instances, or external instances using ECS Anywhere.
- If creation fails, log out, log back in and create it again.
Note: Up to this point, no services are deployed and no tasks are running on the cluster.
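The console steps above correspond roughly to one ECS CreateCluster call. This is a minimal sketch, assuming you script it with boto3 (not used in these notes); it only builds the request arguments, so it runs without AWS credentials. The cluster name `cifar-cluster` and the default Fargate Spot strategy are hypothetical choices.

```python
def cluster_request(name: str) -> dict:
    """Build keyword arguments for boto3's ecs.create_cluster (sketch)."""
    return {
        "clusterName": name,
        # The two Fargate capacity providers mentioned in the console message.
        "capacityProviders": ["FARGATE", "FARGATE_SPOT"],
        # Assumption: default new tasks to Fargate Spot, matching the deploy step.
        "defaultCapacityProviderStrategy": [
            {"capacityProvider": "FARGATE_SPOT", "weight": 1},
        ],
    }


if __name__ == "__main__":
    request = cluster_request("cifar-cluster")  # hypothetical cluster name
    print(request)
    # With boto3 installed and AWS credentials configured (assumption):
    #   import boto3
    #   boto3.client("ecs").create_cluster(**request)
```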
- Next, create a Task Definition under ECS:
- Go to ECS -> Task Definitions -> Create new Task Definition.
- Give the task a name, e.g. CifarInferenceTask.
- Give the container a name (e.g. the repo name), then copy the repository URI from ECR and paste it into Image URI.
- Change or add the container port if required, e.g. 80.
- Add environment variables, if any.
- Specify the task size (CPU, memory) as required by the app.
- Set the task role (using S3 on Fargate requires a task role that can access S3) and the network mode.
- Uncheck logging if you do not need it (logging is also charged).
- Create the task definition. Done!
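The same task definition can be written as data instead of console clicks. A minimal sketch, assuming boto3 or the AWS CLI (neither is used in these notes): the function builds the arguments for `ecs.register_task_definition`; the container name, task size, role ARN and image URI are hypothetical placeholders, while the family name, port 80 and the `model`/`flagged_dir` variables come from these notes.

```python
import json


def task_definition(image_uri: str, task_role_arn: str, env: dict) -> dict:
    """Build keyword arguments for boto3's ecs.register_task_definition (sketch)."""
    return {
        "family": "CifarInferenceTask",
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",  # Fargate tasks must use awsvpc networking
        "cpu": "1024",  # 1 vCPU (hypothetical task size)
        "memory": "2048",  # 2 GB, a valid Fargate pairing for 1 vCPU
        "taskRoleArn": task_role_arn,  # role the container uses to call S3
        "containerDefinitions": [
            {
                "name": "cifar-inference",  # hypothetical container name
                "image": image_uri,
                "essential": True,
                "portMappings": [{"containerPort": 80, "protocol": "tcp"}],
                # Env vars as a list of name/value pairs, as ECS expects.
                "environment": [{"name": k, "value": v} for k, v in env.items()],
            }
        ],
    }


if __name__ == "__main__":
    td = task_definition(
        image_uri="public.ecr.aws/<alias>/<repo>:latest",  # placeholder
        task_role_arn="arn:aws:iam::123456789012:role/cifar-task-role",  # hypothetical
        env={
            "model": "s3://my-bucket/models/resnet18.pth",
            "flagged_dir": "s3://my-bucket/outputs/resnet18",
        },
    )
    # Save this output as taskdef.json, then register it with:
    #   aws ecs register-task-definition --cli-input-json file://taskdef.json
    print(json.dumps(td, indent=2))
```

If the image lives in a private ECR repository or CloudWatch logging stays on, the task also needs an execution role (`executionRoleArn`), which is separate from the task role used for S3.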
- Go to the cluster created in step 1 above and click Deploy.
- Compute configuration -> Capacity provider -> choose Fargate Spot.
- Deployment configuration -> select Service:
  - Service: launch a group of tasks handling long-running computing work that can be stopped and restarted, for example a web application.
  - Task: launch a standalone task that runs and terminates, for example a batch job.
- Specify the task definition name (e.g. CifarInferenceTask) in the Family text box, with tags.
- Specify the service name, e.g. CifarService.
- Security group: you must select the security group defined earlier with the port enabled (e.g. testsg with port 80).
- Public IP: ON.
- Load balancer: not needed here (use one when you expect a large volume of requests). Options:
  - Application Load Balancer
  - Network Load Balancer
- Hit Deploy and wait a few minutes.
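The Deploy form above maps onto one ECS CreateService request. A minimal sketch, assuming boto3 (not used in these notes): the service name, task family, Fargate Spot and public IP come from the steps above, while the cluster name, subnet ID and security group ID are hypothetical placeholders.

```python
def service_request(cluster: str, subnets: list, security_groups: list,
                    desired_count: int = 1) -> dict:
    """Build keyword arguments for boto3's ecs.create_service (sketch)."""
    return {
        "cluster": cluster,
        "serviceName": "CifarService",
        # Family name only: ECS then uses the latest ACTIVE revision.
        "taskDefinition": "CifarInferenceTask",
        "desiredCount": desired_count,
        # Fargate Spot via a capacity provider strategy; launchType must be omitted.
        "capacityProviderStrategy": [{"capacityProvider": "FARGATE_SPOT", "weight": 1}],
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": subnets,
                "securityGroups": security_groups,  # e.g. the group with port 80 open
                "assignPublicIp": "ENABLED",  # the "Public IP: ON" setting
            }
        },
    }


if __name__ == "__main__":
    req = service_request(
        cluster="cifar-cluster",  # hypothetical
        subnets=["subnet-0123456789abcdef0"],  # hypothetical
        security_groups=["sg-0123456789abcdef0"],  # hypothetical ID of testsg
    )
    print(req)
    # With boto3 (assumption): boto3.client("ecs").create_service(**req)
```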
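NOTE: For S3 access to work, add S3 full access (e.g. the AmazonS3FullAccess managed policy) to the IAM role your tasks run with; containers get their AWS permissions from the task role in the Task Definition.
To reach the app, go to Cluster -> Tasks (the running one) -> click the task ID (hex) -> copy the Public IP, then open http://<public-ip>:80 in a browser. Your app should be running there!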
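Looking up the public IP can also be scripted. A minimal sketch, assuming boto3 clients are passed in (not used in these notes): a Fargate task in awsvpc mode reports its network interface ID in the `describe_tasks` attachments, and the public IP is read from that interface via EC2. The function names and the cluster name in the usage comment are hypothetical.

```python
def eni_id_from_task(task: dict) -> str:
    """Return the network interface ID of one task from ecs.describe_tasks."""
    for attachment in task.get("attachments", []):
        if attachment.get("type") == "ElasticNetworkInterface":
            for detail in attachment.get("details", []):
                if detail.get("name") == "networkInterfaceId":
                    return detail["value"]
    raise ValueError("task has no network interface attachment")


def task_public_ip(ecs, ec2, cluster: str, task_arn: str) -> str:
    """Look up a Fargate task's public IP; ecs and ec2 are boto3 clients."""
    task = ecs.describe_tasks(cluster=cluster, tasks=[task_arn])["tasks"][0]
    eni_id = eni_id_from_task(task)
    reply = ec2.describe_network_interfaces(NetworkInterfaceIds=[eni_id])
    # The public IP is reported on the interface's association, not on the task.
    return reply["NetworkInterfaces"][0]["Association"]["PublicIp"]


# Usage with boto3 (assumption; cluster name is hypothetical):
#   import boto3
#   ecs, ec2 = boto3.client("ecs"), boto3.client("ec2")
#   arn = ecs.list_tasks(cluster="cifar-cluster", serviceName="CifarService")["taskArns"][0]
#   print(f"http://{task_public_ip(ecs, ec2, 'cifar-cluster', arn)}:80")
```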
- How can the model registry be given a custom name?
- Save every inference input and output to S3, along with the date and time of the inference (Ref-1, Ref-2).
- The model S3 URI and the inference input/output S3 URI must be configurable (environment variables).
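Example (docker run options such as -e must come before the image name; anything after the image name is passed to the container as its command):
docker run -it -e "model=s3://my-bucket/models/resnet18.pth" -e "flagged_dir=s3://my-bucket/outputs/resnet18" your_image:latest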
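Inside the container, the app can read these two variables and write each inference to S3 under a timestamped key. A minimal sketch, assuming the inputs and outputs are JSON-serializable (e.g. a file name and a predicted label; raw images would go in separate objects) and that an S3 client such as boto3's is passed in. The `model` and `flagged_dir` variable names come from the example above; every function name and the key layout are hypothetical.

```python
import json
import os
from datetime import datetime, timezone
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/some/prefix' into ('bucket', 'some/prefix')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def load_config() -> dict:
    """Read the model and output locations from the container's env vars."""
    return {
        "model": split_s3_uri(os.environ["model"]),
        "flagged_dir": split_s3_uri(os.environ["flagged_dir"]),
    }


def inference_record_key(prefix: str, when: datetime) -> str:
    """Build a key like '<prefix>/YYYY/MM/DD/HHMMSSffffff.json'."""
    stamp = f"{when:%Y/%m/%d/%H%M%S%f}.json"
    return f"{prefix.rstrip('/')}/{stamp}" if prefix else stamp


def save_inference(s3, flagged_dir: str, inputs, outputs, when=None) -> str:
    """Write one inference's input, output and UTC timestamp to S3."""
    when = when or datetime.now(timezone.utc)
    bucket, prefix = split_s3_uri(flagged_dir)
    key = inference_record_key(prefix, when)
    body = json.dumps({"timestamp": when.isoformat(), "input": inputs, "output": outputs})
    s3.put_object(Bucket=bucket, Key=key, Body=body.encode("utf-8"))
    return key


# Usage with boto3 (assumption):
#   import boto3
#   s3 = boto3.client("s3")
#   save_inference(s3, os.environ["flagged_dir"], {"file": "cat.png"}, {"label": "cat"})
```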
HINT: Using S3 on Fargate requires a role with S3 access in the Task Definition (Task role).
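A task role needs a trust policy that lets ECS tasks assume it, plus a permissions policy for S3. A minimal sketch of both documents as Python dicts, assuming you create the role with boto3's IAM client (not used in these notes). The bucket-scoped policy is a hypothetical, narrower alternative to attaching AmazonS3FullAccess; the bucket name is a placeholder.

```python
import json


def task_role_trust_policy() -> dict:
    """Trust policy that lets ECS tasks assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ecs-tasks.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def s3_access_policy(bucket: str) -> dict:
    """Read/write access limited to one bucket (hypothetical least-privilege choice)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Objects: load the model, write inference records.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                # The bucket itself: list keys.
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }


# Usage with boto3 (assumption; role and policy names are hypothetical):
#   import boto3
#   iam = boto3.client("iam")
#   iam.create_role(RoleName="cifar-task-role",
#                   AssumeRolePolicyDocument=json.dumps(task_role_trust_policy()))
#   iam.put_role_policy(RoleName="cifar-task-role", PolicyName="cifar-s3",
#                       PolicyDocument=json.dumps(s3_access_policy("my-bucket")))
```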
HINT: The demo web UI runs on port 80 and must be publicly accessible (the security group must have port 80 open to the public).
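Opening port 80 to the public is a single ingress rule on the security group. A minimal sketch, assuming boto3's EC2 client (not used in these notes): it builds the arguments for `authorize_security_group_ingress`; the group ID is a hypothetical placeholder for testsg.

```python
def open_port_request(group_id: str, port: int = 80) -> dict:
    """Build arguments for boto3's ec2.authorize_security_group_ingress (sketch)."""
    return {
        "GroupId": group_id,
        "IpPermissions": [
            {
                "IpProtocol": "tcp",
                "FromPort": port,
                "ToPort": port,
                # 0.0.0.0/0 = any IPv4 address, i.e. publicly reachable.
                "IpRanges": [{"CidrIp": "0.0.0.0/0", "Description": "public web UI"}],
            }
        ],
    }


if __name__ == "__main__":
    req = open_port_request("sg-0123456789abcdef0")  # hypothetical ID of testsg
    print(req)
    # With boto3 (assumption): boto3.client("ec2").authorize_security_group_ingress(**req)
```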
Generally, 2 workers per GPU are preferred; on CPU, use one worker per core (e.g. 12 CPU cores = 12 workers).
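The rule of thumb above can be turned into a worker count for the web server. A minimal sketch, assuming a server such as gunicorn that takes a worker count (e.g. its `--workers` option); the function name is hypothetical. The core count can be passed explicitly because inside a container `os.cpu_count()` reports the CPUs the OS sees, which can differ from the CPU size given to the task.

```python
import os
from typing import Optional


def worker_count(gpus: int = 0, cpu_cores: Optional[int] = None) -> int:
    """Rule of thumb from these notes: 2 workers per GPU, else 1 per CPU core."""
    if gpus > 0:
        return 2 * gpus
    cores = cpu_cores if cpu_cores is not None else (os.cpu_count() or 1)
    return max(1, cores)  # always run at least one worker


if __name__ == "__main__":
    # e.g. a 1-GPU machine -> 2 workers; a 12-core CPU machine -> 12 workers
    print(worker_count(gpus=1), worker_count(cpu_cores=12))
```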