# Scatter-gather with AWS Lambda
## The challenge

Implement batch processing on AWS:

- scatter: split a single file of records (uploaded via S3) into units of work
- process: process the records with as much parallelism as possible
- gather: detect completion of processing and aggregate a summary report in S3
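The scatter step can be sketched in a few lines. This is illustrative only, assuming one record per line and fan-out via SQS `SendMessageBatch`; the `to_batches` helper and the queue wiring are hypothetical names, not the repository's actual code:

```python
import json
from typing import Iterable, Iterator, List

SQS_MAX_BATCH = 10  # SQS SendMessageBatch accepts at most 10 entries per call


def to_batches(records: Iterable[str], size: int = SQS_MAX_BATCH) -> Iterator[List[dict]]:
    """Group records into SendMessageBatch entry lists."""
    batch: List[dict] = []
    for i, record in enumerate(records):
        batch.append({"Id": str(i % size), "MessageBody": json.dumps({"record": record})})
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


# The real scatter function would read the uploaded file from S3 and send
# each batch to the processing queue, e.g.:
#
#   sqs = boto3.client("sqs")
#   for entries in to_batches(lines):
#       sqs.send_message_batch(QueueUrl=RECORDS_QUEUE_URL, Entries=entries)
```

Batching matters here because a `SendMessageBatch` call is roughly as expensive as a single `SendMessage`, so it cuts the number of API round trips by up to 10x.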
## Prerequisites

- Python 3.8
- GNU make
- Docker
- awscli
- tfvm or Terraform
- cw >= v3.3.0
## Local usage

Start LocalStack, deploy, and run the benchmark:

`make clean start_localstack deploy benchmark report`

Stop and clean up:

`make stop_localstack clean`
## Usage on AWS

All resources are prefixed with your current `${USER}-`. Pass `SCOPE=mycustomprefix-` to make to override this default.

Build, package, deploy, run the benchmark, and report on the measurements:

`make ENV=aws clean deploy_resources deploy_service benchmark report`

Undeploy:

`make ENV=aws destroy`
## Variants

The task has been implemented in several variants:

- `s3-sqs-lambda-sync` (boto3, blocking I/O)
- `s3-sqs-lambda-async` (aioboto3, async I/O)
- `s3-sqs-lambda-async-chunked` (aioboto3, async I/O; records packed into chunks)
- `s3-sqs-lambda-dynamodb` (aioboto3, async I/O; records stored in DynamoDB)
- `s3-notification-sqs-lambda` (aioboto3, async I/O; records stored in S3 in chunks, functions invoked by S3 notifications through SQS queues)
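The difference between the sync and async variants comes down to how a batch of records is awaited. A toy sketch of the async style using plain `asyncio` (the real variants would await aioboto3 calls inside the per-record coroutine; `process_record` and `process_batch` are placeholder names):

```python
import asyncio


async def process_record(record: str) -> str:
    # Placeholder for the real per-record work (aioboto3 S3/DynamoDB calls).
    await asyncio.sleep(0)
    return record.upper()


async def process_batch(records: list) -> list:
    # The async variants await all records of a batch concurrently,
    # instead of looping over blocking boto3 calls one record at a time.
    return await asyncio.gather(*(process_record(r) for r in records))


print(asyncio.run(process_batch(["a", "b"])))  # ['A', 'B']
```

With blocking boto3, a batch of N records costs N sequential round trips; with `asyncio.gather` the round trips overlap, which is where the async variants win inside a single Lambda invocation.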
## More/alternative variants

- Step Functions?
- Glue?
- EMR (Spark)?
- S3 + Athena?
- S3 Batch
- a single fat VM
## Results
Repository structure
- infra - Resources and service infrastructure
- src - Service sources
- tests - Service tests
- benchmark - Benchmark sources
## Documentation
## License

Copyright 2020 by Cornelius Buschka. All rights reserved.