pdlug / s3-upload-processing-cdk

Direct uploads to S3 using presigned URLs with a Lambda to process uploaded content

Direct file uploads to S3 + processing CDK

Demonstrates how to upload directly to an S3 bucket and use a Lambda to process the uploaded content.

The user uploads files directly to S3 using presigned URLs generated by a Lambda fronted by API Gateway. A React component uploads the file to the presigned URL using the axios HTTP client. Axios is used because it provides upload progress events, so a progress bar can be displayed.
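The upload-with-progress step could look roughly like the sketch below. It is not taken from the app's source: `PutFn`, `toPercent` and `uploadToPresignedUrl` are hypothetical names, and `PutFn` is a hand-written subset of the `axios.put` signature so that the example has no external imports. Sending a `Content-Type` header equal to the `contentType` given when the URL was requested is also an assumption.

```typescript
// Shape of the event axios passes to onUploadProgress. `total` may be
// missing when the upload size is not computable.
interface UploadProgressEvent {
  loaded: number;
  total?: number;
}

// Hand-written subset of axios.put's signature (an assumption of this
// sketch), intended to let `axios.put` be passed in from the component.
type PutFn = (
  url: string,
  body: unknown,
  config: {
    headers: Record<string, string>;
    onUploadProgress: (event: UploadProgressEvent) => void;
  }
) => Promise<unknown>;

// Convert a progress event into a whole-number percentage from 0 to 100.
function toPercent(event: UploadProgressEvent): number {
  if (!event.total || event.total <= 0) return 0;
  return Math.min(100, Math.round((event.loaded / event.total) * 100));
}

// PUT the file body to the presigned URL, reporting progress as it goes.
async function uploadToPresignedUrl(
  put: PutFn,
  url: string,
  body: unknown,
  contentType: string,
  onProgress: (percent: number) => void
): Promise<void> {
  await put(url, body, {
    // Assumption: same content type as was sent to POST /uploads.
    headers: { "Content-Type": contentType },
    onUploadProgress: (event) => onProgress(toPercent(event)),
  });
}
```

In a React component, `onProgress` would typically be a state setter that drives the progress bar.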

Getting started

Deploy CDK

cd cdk
npm i
npm run cdk deploy

The stack will deploy using CDK and output the URL of the newly created API. Create app/.env.local and add a line setting the environment variable VITE_API_BASE_URL to the URL in the CDK output without the trailing slash. For example:

VITE_API_BASE_URL="https://XXXXX.execute-api.us-east-1.amazonaws.com"
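Because the value is expected without a trailing slash, a small helper can make the app tolerant of either form. This is a hedged sketch: `apiUrl` is a hypothetical helper, not part of the project. In a Vite app the base URL would be read from `import.meta.env.VITE_API_BASE_URL`.

```typescript
// Join the API base URL and a path, tolerating a stray trailing slash on
// the base and a missing leading slash on the path.
function apiUrl(baseUrl: string, path: string): string {
  const base = baseUrl.replace(/\/+$/, "");
  const suffix = path.startsWith("/") ? path : `/${path}`;
  return `${base}${suffix}`;
}
```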

Start the dev server for the app locally:

cd app
npm i
npm run dev

Navigate to http://localhost:3000 and upload a file. You should see the progress bar update and then the ID of the upload displayed when it completes.

Architecture

An S3 bucket holds all the uploaded content. An ID (UUID) is generated for each upload, which avoids any issues with odd file naming, special characters, etc. The key structure is:

  • uploads/{id}/content - the object uploaded directly by the user. A lifecycle rule on the uploads/ prefix expires objects after 1 day, so any uploads that fail to process are removed.
  • files/{id}/content - the object after processing by the upload processor Lambda. The original content as uploaded is stored under the content key, and the files/{id}/ prefix leaves room for other metadata or variants (e.g. thumbnail images, OCR'd text).
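The key layout above can be captured in a few helpers. This is an illustrative sketch; the function names and the `thumbnail` variant name used in the usage note are hypothetical.

```typescript
const UPLOADS_PREFIX = "uploads/";
const FILES_PREFIX = "files/";

// Key the browser uploads to: uploads/{id}/content
function uploadKey(id: string): string {
  return `${UPLOADS_PREFIX}${id}/content`;
}

// Key for processed output: files/{id}/content by default, or another
// name under the same prefix for variants and metadata.
function fileKey(id: string, name: string = "content"): string {
  return `${FILES_PREFIX}${id}/${name}`;
}

// Extract the upload ID from a key of the form uploads/{id}/content,
// or return null if the key does not match that shape.
function idFromUploadKey(key: string): string | null {
  const match = /^uploads\/([^/]+)\/content$/.exec(key);
  return match?.[1] ?? null;
}
```

For example, `fileKey(id, "thumbnail")` would place a thumbnail next to the processed content for the same upload.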

The S3 bucket has transfer acceleration enabled so uploads use the AWS edge network, reducing latency. This incurs additional cost but can greatly improve upload speed for large files.

Presigned URLs enable the end user to upload directly to S3 without any other components in the upload path. To generate presigned URLs, an API Gateway with a Lambda function is used. A POST to /uploads on the API Gateway base URL, with the JSON payload described under API endpoints, returns the ID of the upload and the presigned URL to upload the file to. The presigned URLs expire after 15 minutes (configure this in CDK if a different duration is needed).
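The core of the URL-generating Lambda might be sketched as follows. This is not the project's actual handler: `createUpload` and `CreateUploadDeps` are hypothetical, and the ID generator and signer are injected so the example stays free of external imports. With AWS SDK v3, one possible wiring is `randomUUID` from `node:crypto` for `newId`, and `getSignedUrl` from `@aws-sdk/s3-request-presigner` with a `PutObjectCommand` for `presignPut`.

```typescript
// Dependencies injected into the handler logic (hypothetical shape).
interface CreateUploadDeps {
  // Generates a new upload ID, for example a UUID.
  newId: () => string;
  // Returns a presigned PUT URL for the given object key.
  presignPut: (params: {
    key: string;
    contentType: string;
    expiresInSeconds: number;
  }) => Promise<string>;
}

// 15 minutes, matching the expiry described above.
const URL_EXPIRY_SECONDS = 15 * 60;

// Generate an ID, presign a PUT to uploads/{id}/content, and return both.
async function createUpload(
  contentType: string,
  deps: CreateUploadDeps
): Promise<{ id: string; url: string }> {
  const id = deps.newId();
  const url = await deps.presignPut({
    key: `uploads/${id}/content`,
    contentType,
    expiresInSeconds: URL_EXPIRY_SECONDS,
  });
  return { id, url };
}
```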

Once the browser has uploaded the file, the upload processor Lambda is invoked automatically by an S3 event source subscribed to object-created events under the uploads/ prefix. The processor moves the uploaded content from the uploads/ namespace to the files/ namespace and can perform any other processing needed. Common use cases include image conversion, thumbnail generation, OCR text extraction, format validation and parsing.
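A minimal sketch of the move step, under stated assumptions: `S3RecordLike` models only the fields of an S3 event notification record that are used here, and `ObjectStore` is a hypothetical wrapper that could be backed by `CopyObjectCommand` and `DeleteObjectCommand` from `@aws-sdk/client-s3`. S3 has no native move operation, so the move is expressed as a copy followed by a delete.

```typescript
// The subset of an S3 event notification record this sketch relies on.
interface S3RecordLike {
  s3: { bucket: { name: string }; object: { key: string } };
}

// Hypothetical storage wrapper, injected to keep the sketch import-free.
interface ObjectStore {
  copy(bucket: string, sourceKey: string, destinationKey: string): Promise<void>;
  remove(bucket: string, key: string): Promise<void>;
}

// Object keys in S3 event notifications are URL-encoded, with "+" for spaces.
function decodeS3Key(rawKey: string): string {
  return decodeURIComponent(rawKey.replace(/\+/g, " "));
}

// Work out source and destination keys, or return null when the key does
// not have the form uploads/{id}/content.
function planMove(
  record: S3RecordLike
): { bucket: string; sourceKey: string; destinationKey: string } | null {
  const sourceKey = decodeS3Key(record.s3.object.key);
  const id = /^uploads\/([^/]+)\/content$/.exec(sourceKey)?.[1];
  if (!id) return null;
  return {
    bucket: record.s3.bucket.name,
    sourceKey,
    destinationKey: `files/${id}/content`,
  };
}

// Copy the object into files/, then delete the original under uploads/.
// Resolves to false when the record is skipped.
async function moveUpload(record: S3RecordLike, store: ObjectStore): Promise<boolean> {
  const plan = planMove(record);
  if (!plan) return false;
  await store.copy(plan.bucket, plan.sourceKey, plan.destinationKey);
  await store.remove(plan.bucket, plan.sourceKey);
  return true;
}
```

Deleting only after the copy succeeds means a failed copy leaves the original in place, where the 1-day lifecycle rule will eventually remove it.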

API endpoints

POST /uploads

Creates an upload, returning the ID and the URL to upload the file to. Currently the filename and size properties are ignored, while contentType is required to generate the presigned URL.

Request payload:

{
  "contentType": "image/png",
  "filename": "test.png",
  "size": 1024
}

Response:

{
  "id": "b9308960-510d-4aa8-b561-99ff227e720c",
  "url": "https://XXX.s3.us-east-1.amazonaws.com/uploads/b9308960-510d-4aa8-b561-99ff227e720c/content?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=ASIAUWCGBYJABYAQEHP7%2F20220413%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20220413T014010Z&X-Amz-Expires=900&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEKr%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaCXVzLWVhc3QtMSJHMEUCIQDTf4AMcOxZjiXQwo%2BlNydQ3rShUoLWsiAEWASKzmrAJwIgMlPmafMD4K8cnaNV8Vq3l8AGR8rbK7S%2F%2BITTqvH4Wq0quAIIUxABGgwzMjIyNjk0NjMxMDQiDKblqmao7YMOJV52BCqVAn5P5AZ0SCFwkb%2BVVS7DrqVxrEWgzLAW%2BCqPmLadazvmgguBwD1qd8nl2QbJCbzrd5yHcd8ICX6ajuTsRjLFRjauN3BP%2FBj01J%2Bc45riDbePP87De0MlaNGKjH2tvsifc2IwXOm8caRvYDANHPmfUZ%2B82n8ZObDDyfi2EYGyY1FPP19oFE7Is42DVoMZFYLwC%2Fxy2isy3Na3SSoypAovyZ8JU6xF0TKu%2B2wvv%2Bmw%2FFa1SGnwe61fcQPvzqyy%2FmGkjl2MAwYabUBX07Y%2Ft9YSL%2BmKkF0Xb8mppfSVBih0rTC%2FxoL3aZ6oZRgq838ixM%2FG1mEl7UQvR9WSPYyy5rwuX5VPaWpHH%2B%2FdSvSTwLOT57BPpZzzL1Uw%2BdPYkgY6mgHketH0St6PfwNQNFpb5swwnyMjm5De77v8JDdADihIfv4KZpWdkAlVv6a67OfFkqKauLBHoDsaEiVUl3QJFbp4S5yHom%2Ba6g3KDycTY0I5f3LDXSqXfnMygilXjo2djU91MmkhjWV9jCGfq1zwbPxH6ij9O7SIwsEFfe53N0UJc8gdT48AoBG5kfzp97q1U9hjgB81H4557Yhb&X-Amz-Signature=ca1cddc3712fec24ea996c4544207d2f5b8e552e268d2c3fbca84b8fcb972158&X-Amz-SignedHeaders=host&x-id=PutObject"
}
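The request and response shapes above can be expressed as types, with a small validator for the request body. This is a sketch rather than the project's code: the interface names and `parseCreateUploadRequest` are hypothetical, and treating `filename` and `size` as optional is an assumption based on them currently being ignored.

```typescript
interface CreateUploadRequest {
  contentType: string;
  filename?: string;
  size?: number;
}

interface CreateUploadResponse {
  id: string;
  url: string;
}

// Parse and validate a raw JSON request body. Returns null when the body
// is not valid JSON, is not an object, or lacks a non-empty contentType.
function parseCreateUploadRequest(body: string): CreateUploadRequest | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(body);
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const candidate = parsed as Record<string, unknown>;
  const contentType = candidate["contentType"];
  const filename = candidate["filename"];
  const size = candidate["size"];
  if (typeof contentType !== "string" || contentType === "") return null;
  const request: CreateUploadRequest = { contentType };
  if (typeof filename === "string") request.filename = filename;
  if (typeof size === "number") request.size = size;
  return request;
}
```

A handler could respond with HTTP 400 when the parser returns null and otherwise build a `CreateUploadResponse`.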

Production use

This is a demonstration project. There are a few considerations for turning it into something more robust.

API Gateway authentication

The API Gateway is deployed without authentication for demonstration purposes. For real-life applications this may or may not be appropriate. For stronger authentication, the endpoint could be authorized against Cognito or another identity management solution. For other use cases a simple API key may be sufficient.

Filenames and other metadata

If it's important to retain the original filename (and possibly other metadata that can be captured), use a database (DynamoDB is a great choice). Store the upload ID along with the filename in the createUpload Lambda. The database can also be updated by the upload processor, and by any other transformations, to indicate status.
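One possible shape for such a record is sketched below. Everything here is hypothetical (the field names, the status values and the helper functions), and persistence to DynamoDB or another store is deliberately left out.

```typescript
// Hypothetical lifecycle of an upload as tracked in the database.
type UploadStatus = "pending" | "processed" | "failed";

interface UploadRecord {
  id: string;
  filename: string;
  contentType: string;
  size: number;
  status: UploadStatus;
  createdAt: string; // ISO 8601 timestamp
}

// Build the record that createUpload would store alongside the upload ID.
function newUploadRecord(
  id: string,
  request: { filename: string; contentType: string; size: number },
  now: Date
): UploadRecord {
  return {
    id,
    filename: request.filename,
    contentType: request.contentType,
    size: request.size,
    status: "pending",
    createdAt: now.toISOString(),
  };
}

// Return a copy of the record with a new status, for example once the
// upload processor has finished.
function withStatus(record: UploadRecord, status: UploadStatus): UploadRecord {
  return { ...record, status };
}
```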

Error handling

Any failures in upload processing are currently handled silently. A more robust approach is needed: logging at a minimum, plus events to EventBridge or SQS for retries or further reporting.
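As a starting point, the processor could log each failure and then fail the invocation rather than swallowing the error. The sketch below is an assumption about how that might look, not the project's code: `processAll` and `Logger` are hypothetical, and the per-record work is passed in as `processOne`.

```typescript
// Minimal logging interface (hypothetical); console satisfies it.
interface Logger {
  error(message: string, details: Record<string, unknown>): void;
}

// Process every record, logging each failure, then throw if any failed so
// the invocation is reported as failed instead of passing silently.
async function processAll<T>(
  records: T[],
  processOne: (record: T) => Promise<void>,
  logger: Logger
): Promise<void> {
  let failureCount = 0;
  let index = 0;
  for (const record of records) {
    try {
      await processOne(record);
    } catch (error) {
      failureCount++;
      logger.error("upload processing failed", {
        index,
        reason: error instanceof Error ? error.message : String(error),
      });
    }
    index++;
  }
  if (failureCount > 0) {
    throw new Error(`${failureCount} of ${records.length} records failed`);
  }
}
```

Because the function throws, whatever retry or failure routing is configured for the Lambda can take over, which is where the EventBridge or SQS integration would fit.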
