mikeacosta / s3-json-to-postgresql

Event-driven JSON data import to PostgreSQL

s3-json-to-postgresql

See the article on Medium related to this repository.

Background

AWS Lambda function that imports JSON data into a PostgreSQL database on RDS in response to an Amazon S3 event notification. A typical triggering event is the creation (upload, copy or write) of a JSON file in an S3 bucket.

This function will:

  1. Create a table in PostgreSQL called json_table with a data column of data type jsonb. This is the column into which the JSON data is inserted.
  2. Identify and access the JSON file from S3 associated with an event notification.
  3. Insert file data into json_table.
  4. Create an SNS topic called message-from-lambda if it doesn't already exist.
  5. Publish a message to the topic whenever the function is invoked.
  • Comment out or remove the SNS code if you don't want to use that service.
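
The first steps above could be sketched roughly as follows. This is not the repository's actual source, only a minimal illustration: the helper name `extract_s3_objects`, the sample bucket and key, and the exact SQL text (including `IF NOT EXISTS`) are assumptions, and the database and SNS calls are left out so the sketch runs without AWS access.

```python
from urllib.parse import unquote_plus

# Hypothetical DDL for the table-creation step; the repository's statement may differ.
CREATE_TABLE_SQL = "CREATE TABLE IF NOT EXISTS json_table (data jsonb);"

# Hypothetical parameterized insert for the insert step (psycopg2-style placeholder).
INSERT_SQL = "INSERT INTO json_table (data) VALUES (%s);"


def extract_s3_objects(event):
    """Return (bucket, key) pairs for every record in an S3 event notification."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Object keys arrive URL-encoded in the event (spaces become '+').
        key = unquote_plus(s3["object"]["key"])
        pairs.append((bucket, key))
    return pairs


# Trimmed-down sample event; real notifications carry many more fields.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "my-bucket"},
                "object": {"key": "incoming/my+file.json"}}}
    ]
}

print(extract_s3_objects(sample_event))
# → [('my-bucket', 'incoming/my file.json')]
```

For the SNS steps, boto3's `create_topic` returns the ARN of an existing topic when the name is already in use, so a function can call it on every invocation and then pass the returned ARN to `publish`.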

Requirements

  • AWS CLI
  • PostgreSQL 11+ on RDS, with an IAM role associated with the instance for the s3Import feature
  • Python 3.6+
  • An IAM role for the Lambda function with sufficient permissions to access S3, RDS and SNS. (Attaching the "full access" managed policies for those services is enough to get the function working initially. You can tighten the policies to enforce "least privilege" later on.)

Set up

  1. Clone this repo
$ git clone ...
$ cd s3-json-to-postgresql
  2. Create and activate a virtual environment
$ python3 -m venv env
$ . env/bin/activate
  3. Install dependencies
$ pip install -r requirements.txt
  4. Update the src/database.cfg file with the connection details for your PostgreSQL instance on RDS. For example:
[RDS]
HOST=dbidentifer.abcefghijklm.us-west-2.rds.amazonaws.com
DB_NAME=postgres
DB_USER=username
DB_PWD=password
  5. Modify template.yaml with the ARN for your Lambda IAM role.

  6. In command.sh, set the --s3-bucket parameter of the aws cloudformation package command to the name of the S3 bucket where the Lambda package should be uploaded.

  7. From command.sh, run the commands under "zip dependencies from virtualenv and source files" to create the lambda.zip deployment package in the src directory.

  8. Also from command.sh, run the aws cloudformation package and deploy commands. This creates the Lambda function on AWS.

  9. For the S3 bucket where JSON files will be created, configure an event notification that publishes a message to the Lambda function (S3toRdsLambda-ImportToRDS...). To respond to JSON file uploads, configure the "Put" event.

  10. Upload a JSON file to your bucket and check json_table for the inserted record.
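
The database.cfg step and the event notification step above can be illustrated with a short Python sketch. It is a hedged example, not code from this repository: the function names `load_rds_settings` and `build_notification_config`, the `.json` suffix filter, and the placeholder Lambda ARN are all assumptions made for illustration.

```python
import configparser

# Same shape as the database.cfg example above.
EXAMPLE_CFG = """
[RDS]
HOST=dbidentifer.abcefghijklm.us-west-2.rds.amazonaws.com
DB_NAME=postgres
DB_USER=username
DB_PWD=password
"""


def load_rds_settings(text):
    """Parse database.cfg-style text and return the [RDS] values as a dict."""
    # interpolation=None keeps '%' characters in passwords from being expanded.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = parser["RDS"]
    return {
        "host": section["HOST"],
        "dbname": section["DB_NAME"],
        "user": section["DB_USER"],
        "password": section["DB_PWD"],
    }


def build_notification_config(lambda_arn):
    """Build an S3 notification configuration that targets a Lambda on Put events."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_arn,
                "Events": ["s3:ObjectCreated:Put"],
                # Assumption: only keys ending in .json should trigger the function.
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "suffix", "Value": ".json"}]}
                },
            }
        ]
    }


settings = load_rds_settings(EXAMPLE_CFG)
# Placeholder ARN; substitute the ARN of the deployed function.
config = build_notification_config(
    "arn:aws:lambda:us-west-2:123456789012:function:example"
)
print(settings["dbname"], config["LambdaFunctionConfigurations"][0]["Events"])
```

A dictionary of this shape could be passed to boto3's `put_bucket_notification_configuration(Bucket=..., NotificationConfiguration=...)` instead of configuring the notification in the console. Either way, S3 also needs permission to invoke the function.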


Languages

Python 75.5%, Shell 24.5%