ksmin23 / aws-dms-with-multiple-vpcs

Ingest CDC from MySQL in one VPC to Amazon Kinesis Data Streams in another VPC with AWS DMS.

AWS Database Migration Service (DMS) with multiple VPCs

This repository provides CDK scripts and sample code showing how to implement an end-to-end pipeline that replicates transactional data from a MySQL database in one VPC to Amazon Kinesis Data Streams in another VPC using AWS Database Migration Service (DMS).

Architecture

The diagram below shows what we are going to implement.

[Architecture diagram: aws-dms-with-multiple-vpcs-arch]

⚠️ For easy testing, the Aurora MySQL cluster is provisioned in a public subnet. In a production environment, however, you should provision it in a private subnet.

The cdk.json file tells the CDK Toolkit how to execute your app.
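
For reference, a minimal cdk.json for a standard CDK Python project looks roughly like the following (the actual file in this repository may carry additional context flags):

{
  "app": "python3 app.py"
}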

This project is set up like a standard Python project. The initialization process also creates a virtualenv within this project, stored under the .venv directory. To create the virtualenv it assumes that there is a python3 (or python for Windows) executable in your path with access to the venv package. If for any reason the automatic creation of the virtualenv fails, you can create the virtualenv manually.

Creating a new VPC

To manually create a virtualenv on macOS and Linux:

$ cd datalake-vpc
$ python3 -m venv .venv

After the init process completes and the virtualenv is created, you can use the following step to activate your virtualenv.

$ source .venv/bin/activate

If you are on a Windows platform, you would activate the virtualenv like this:

% .venv\Scripts\activate.bat

Once the virtualenv is activated, you can install the required dependencies.

(.venv) $ pip install -r requirements.txt

To add additional dependencies, for example other CDK libraries, just add them to your setup.py file and rerun the pip install -r requirements.txt command.
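
For example, assuming the standard CDK v2 Python project template (the module name and version pins below are illustrative, not this repository's actual file), setup.py might look like this:

import setuptools

setuptools.setup(
    name="datalake-vpc",
    version="0.0.1",
    py_modules=["app"],
    install_requires=[
        # CDK v2 ships as a single library plus the constructs package.
        "aws-cdk-lib>=2.0.0",
        "constructs>=10.0.0,<11.0.0",
    ],
    python_requires=">=3.7",
)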

At this point you can now synthesize the CloudFormation template for this code.

(.venv) $ export CDK_DEFAULT_ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
(.venv) $ export CDK_DEFAULT_REGION=region-name
(.venv) $ cdk deploy --require-approval never \
               DataLakeVPC

Let's get the NAT gateway public IPs of the new VPC.

(.venv) $ aws ec2 describe-nat-gateways --region region-name --filter Name=vpc-id,Values=your-vpc-id | jq -r '.NatGateways | .[] | .NatGatewayAddresses | .[] | .PublicIp'
34.xxx.xxx.xxx
52.xxx.xxx.xxx
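
The DataLakeVPC stack itself lives in the CDK app rather than in this README. As a rough sketch only (assuming CDK v2; the construct names and subnet layout are assumptions, not the repository's actual code), it might look like this:

import os

import aws_cdk as cdk
from aws_cdk import aws_ec2 as ec2
from constructs import Construct

class DataLakeVpcStack(cdk.Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Two AZs, each with a public and a private subnet; the two NAT
        # gateways are the source of the public IPs queried above.
        self.vpc = ec2.Vpc(self, "DataLakeVPC",
            max_azs=2,
            nat_gateways=2,
            subnet_configuration=[
                ec2.SubnetConfiguration(name="public",
                    subnet_type=ec2.SubnetType.PUBLIC, cidr_mask=24),
                ec2.SubnetConfiguration(name="private",
                    subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS, cidr_mask=24),
            ])
        cdk.CfnOutput(self, "VpcId", value=self.vpc.vpc_id)

app = cdk.App()
DataLakeVpcStack(app, "DataLakeVPC",
    env=cdk.Environment(account=os.environ["CDK_DEFAULT_ACCOUNT"],
                        region=os.environ["CDK_DEFAULT_REGION"]))
app.synth()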

Creating Aurora MySQL cluster

  1. ℹ️ Create an AWS Secrets Manager secret for your RDS admin user like this:

    (.venv) $ cd ../source-db
    (.venv) $ pwd
    ~/aws-dms-with-multiple-vpcs/source-db
    (.venv) $ aws secretsmanager create-secret \
       --name "your_db_secret_name" \
       --description "(Optional) description of the secret" \
       --secret-string '{"username": "admin", "password": "password_of_at_least_8_characters"}'
    

    For example,

    (.venv) $ aws secretsmanager create-secret \
       --name "dev/rds/admin" \
       --description "admin user for rds" \
       --secret-string '{"username": "admin", "password": "your admin password"}'
    
  2. Create an Aurora MySQL cluster. (A rough sketch of this stack appears after this list.)

    (.venv) $ pwd
    ~/aws-dms-with-multiple-vpcs/source-db
    (.venv) $ pip install -r requirements.txt
    (.venv) $ cdk deploy \
                  -c vpc_name='your-existing-vpc-name' \
                  -c db_secret_name='db-secret-name' \
                  -c db_cluster_name='db-cluster-name' \
                  -c db_access_allowed_ip_list=NAT-Public-IPs \
                  DmsSourceDbStack
    
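As a rough sketch of what DmsSourceDbStack might contain (assuming CDK v2; the construct names, engine version, and the way context values are read are assumptions, not the repository's actual code). The two parts that matter for DMS are the ROW binlog format and the ingress rules for the NAT gateway IPs:

import aws_cdk as cdk
from aws_cdk import aws_ec2 as ec2, aws_rds as rds
from aws_cdk import aws_secretsmanager as sm
from constructs import Construct

class DmsSourceDbStack(cdk.Stack):
    def __init__(self, scope: Construct, construct_id: str, *,
                 vpc: ec2.IVpc, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # DMS change data capture requires row-based binary logging.
        params = rds.ParameterGroup(self, "ClusterParams",
            engine=rds.DatabaseClusterEngine.aurora_mysql(
                version=rds.AuroraMysqlEngineVersion.VER_3_02_0),
            parameters={"binlog_format": "ROW"})

        # Allow MySQL access only from the NAT gateway IPs of the other VPC.
        sg = ec2.SecurityGroup(self, "AuroraSG", vpc=vpc)
        allowed_ips = self.node.try_get_context("db_access_allowed_ip_list") or ""
        for ip in filter(None, allowed_ips.split(",")):
            sg.add_ingress_rule(ec2.Peer.ipv4(f"{ip}/32"), ec2.Port.tcp(3306))

        secret = sm.Secret.from_secret_name_v2(self, "DbSecret",
            self.node.try_get_context("db_secret_name"))
        rds.DatabaseCluster(self, "AuroraMysql",
            engine=rds.DatabaseClusterEngine.aurora_mysql(
                version=rds.AuroraMysqlEngineVersion.VER_3_02_0),
            credentials=rds.Credentials.from_secret(secret),
            parameter_group=params,
            # Public subnets only to mirror the test setup warned about
            # at the top of this README.
            instance_props=rds.InstanceProps(vpc=vpc,
                vpc_subnets=ec2.SubnetSelection(subnet_type=ec2.SubnetType.PUBLIC),
                security_groups=[sg]))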

In order to set up MySQL, you need to connect to the Aurora MySQL cluster from either your local PC or an EC2 instance.

Confirm that binary logging is enabled

  1. Connect to the Aurora cluster writer node.

     $ mysql -hdb-cluster-name.cluster-xxxxxxxxxxxx.region-name.rds.amazonaws.com -uadmin -p
     Enter password: 
     Welcome to the MariaDB monitor.  Commands end with ; or \g.
     Your MySQL connection id is 20
     Server version: 8.0.23 Source distribution
    
     Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
    
     Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
    
    
  2. At the SQL prompt, run the below command to confirm that binary logging is enabled:

     MySQL [(none)]> show global variables like "log_bin";
     +---------------+-------+
     | Variable_name | Value |
     +---------------+-------+
     | log_bin       | ON    |
     +---------------+-------+
     1 row in set (0.00 sec)
    
  3. Also run the below command to set the binlog retention so that AWS DMS has the binary log access it needs for replication. (A scripted version of these checks appears after this list.)

     MySQL [(none)]> call mysql.rds_set_configuration('binlog retention hours', 24);
     Query OK, 0 rows affected (0.01 sec)
    
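If you would rather script the checks above than type them at the SQL prompt, a minimal PyMySQL sketch looks like this (the endpoint and credentials are the same placeholders used earlier):

import pymysql  # pip install PyMySQL

# Same placeholder writer endpoint and credentials as the mysql command above.
conn = pymysql.connect(
    host="db-cluster-name.cluster-xxxxxxxxxxxx.region-name.rds.amazonaws.com",
    user="admin", password="your admin password")
with conn.cursor() as cur:
    # DMS CDC needs the binary log enabled...
    cur.execute("SHOW GLOBAL VARIABLES LIKE 'log_bin'")
    print(cur.fetchone())  # expected: ('log_bin', 'ON')
    # ...and retained long enough for the replication task to read it.
    cur.execute("CALL mysql.rds_set_configuration('binlog retention hours', 24)")
conn.commit()
conn.close()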

Create a sample database and table

  1. Run the below command to create the sample database named testdb.
     MySQL [(none)]> show databases;
     +--------------------+
     | Database           |
     +--------------------+
     | information_schema |
     | mysql              |
     | performance_schema |
     | sys                |
     +--------------------+
     4 rows in set (0.00 sec)
    
     MySQL [(none)]> create database testdb;
     Query OK, 1 row affected (0.01 sec)
    
     MySQL [(none)]> use testdb;
     Database changed
     MySQL [testdb]> show tables;
     Empty set (0.00 sec)
    
  2. Also run the below command to create the sample table named retail_trans:
     MySQL [testdb]> CREATE TABLE IF NOT EXISTS testdb.retail_trans (
         ->   trans_id BIGINT(20) AUTO_INCREMENT,
         ->   customer_id VARCHAR(12) NOT NULL,
         ->   event VARCHAR(10) DEFAULT NULL,
         ->   sku VARCHAR(10) NOT NULL,
         ->   amount INT DEFAULT 0,
         ->   device VARCHAR(10) DEFAULT NULL,
         ->   trans_datetime DATETIME DEFAULT CURRENT_TIMESTAMP,
         ->   PRIMARY KEY(trans_id),
         ->   KEY(trans_datetime)
         -> ) ENGINE=InnoDB AUTO_INCREMENT=0;
     Query OK, 0 rows affected, 1 warning (0.04 sec)
    
     MySQL [testdb]> show tables;
     +------------------+
     | Tables_in_testdb |
     +------------------+
     | retail_trans     |
     +------------------+
     1 row in set (0.00 sec)
    
     MySQL [testdb]> desc retail_trans;
     +----------------+-------------+------+-----+-------------------+-------------------+
     | Field          | Type        | Null | Key | Default           | Extra             |
     +----------------+-------------+------+-----+-------------------+-------------------+
     | trans_id       | bigint      | NO   | PRI | NULL              | auto_increment    |
     | customer_id    | varchar(12) | NO   |     | NULL              |                   |
     | event          | varchar(10) | YES  |     | NULL              |                   |
     | sku            | varchar(10) | NO   |     | NULL              |                   |
     | amount         | int         | YES  |     | 0                 |                   |
     | device         | varchar(10) | YES  |     | NULL              |                   |
     | trans_datetime | datetime    | YES  | MUL | CURRENT_TIMESTAMP | DEFAULT_GENERATED |
     +----------------+-------------+------+-----+-------------------+-------------------+
     7 rows in set (0.00 sec)
    
     MySQL [testdb]>    
    

After setting up MySQL, go back to the terminal where you have been deploying the stacks.

Create Amazon Kinesis Data Streams for AWS DMS target endpoint

(.venv) $ cd ../dms-to-kinesis
(.venv) $ pwd
~/aws-dms-with-multiple-vpcs/dms-to-kinesis
(.venv) $ pip install -r requirements.txt
(.venv) $ cdk deploy \
             -c vpc_name='your-existing-vpc-name' \
             -c db_secret_name='db-secret-name' \
             -e DMSTargetKinesisDataStreamStack \
             --parameters TargetKinesisStreamName=your-kinesis-stream-name
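
A rough sketch of how DMSTargetKinesisDataStreamStack might wire the TargetKinesisStreamName parameter to the stream (assuming CDK v2; the construct names, shard count, and retention period are assumptions):

import aws_cdk as cdk
from aws_cdk import aws_kinesis as kinesis
from constructs import Construct

class DmsTargetKinesisDataStreamStack(cdk.Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Supplied at deploy time via --parameters TargetKinesisStreamName=...
        stream_name = cdk.CfnParameter(self, "TargetKinesisStreamName",
            type="String", description="Name of the DMS target stream")

        self.stream = kinesis.Stream(self, "DmsTargetStream",
            stream_name=stream_name.value_as_string,
            shard_count=1,
            retention_period=cdk.Duration.hours(24))
        cdk.CfnOutput(self, "StreamArn", value=self.stream.stream_arn)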

Create AWS DMS Replication Task

In this example, we use the sample database (testdb) and table (retail_trans) created earlier.

(.venv) $ cdk deploy \
             -c vpc_name='your-existing-vpc-name' \
             -c db_secret_name='db-secret-name' \
             -e DMSAuroraMysqlToKinesisStack \
             --parameters SourceDatabaseName=testdb \
             --parameters SourceTableName=retail_trans
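
Behind this command, a DMS replication task selects tables through a table-mapping document. A sketch of how DMSAuroraMysqlToKinesisStack might assemble it (the instance and endpoint ARNs passed in below are assumptions about the stack's wiring, not the repository's actual code):

import json

import aws_cdk as cdk
from aws_cdk import aws_dms as dms
from constructs import Construct

class DmsAuroraMysqlToKinesisStack(cdk.Stack):
    def __init__(self, scope: Construct, construct_id: str, *,
                 replication_instance_arn: str, source_endpoint_arn: str,
                 target_endpoint_arn: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        db_name = cdk.CfnParameter(self, "SourceDatabaseName", type="String")
        table_name = cdk.CfnParameter(self, "SourceTableName", type="String")

        # A single selection rule includes just testdb.retail_trans.
        table_mappings = {"rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "1",
            "object-locator": {"schema-name": db_name.value_as_string,
                               "table-name": table_name.value_as_string},
            "rule-action": "include"}]}

        dms.CfnReplicationTask(self, "DmsCdcTask",
            migration_type="cdc",  # change data capture only, no full load
            replication_instance_arn=replication_instance_arn,
            source_endpoint_arn=source_endpoint_arn,
            target_endpoint_arn=target_endpoint_arn,
            table_mappings=json.dumps(table_mappings))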

Run Test

Start the DMS replication task by replacing the ARN in the below command.

(.venv) $ aws dms start-replication-task --replication-task-arn dms-task-arn --start-replication-task-type start-replication
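
Once the task is running, you can generate a CDC event by inserting a row into the source table and watching it arrive on the stream. A minimal end-to-end check (the endpoint, credentials, stream name, and region are the same placeholders used throughout this README):

import json
import time

import boto3
import pymysql  # pip install PyMySQL boto3

STREAM = "your-kinesis-stream-name"
kinesis = boto3.client("kinesis", region_name="region-name")

# Grab a LATEST iterator before writing, so the new record is not missed.
shard_id = kinesis.describe_stream(StreamName=STREAM)["StreamDescription"]["Shards"][0]["ShardId"]
iterator = kinesis.get_shard_iterator(StreamName=STREAM, ShardId=shard_id,
                                      ShardIteratorType="LATEST")["ShardIterator"]

# Insert one sample transaction into the source table.
conn = pymysql.connect(
    host="db-cluster-name.cluster-xxxxxxxxxxxx.region-name.rds.amazonaws.com",
    user="admin", password="your admin password", database="testdb")
with conn.cursor() as cur:
    cur.execute(
        "INSERT INTO retail_trans (customer_id, event, sku, amount, device)"
        " VALUES (%s, %s, %s, %s, %s)",
        ("123456789012", "purchase", "SKU-0001", 42, "mobile"))
conn.commit()
conn.close()

time.sleep(10)  # give DMS a moment to ship the change
for record in kinesis.get_records(ShardIterator=iterator)["Records"]:
    print(json.loads(record["Data"]))  # DMS emits each change as a JSON document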

Clean Up

  1. Stop the DMS replication task by replacing the ARN in the below command.

    (.venv) $ aws dms stop-replication-task --replication-task-arn dms-task-arn
    
  2. Delete the CloudFormation stack by running the below command.

    (.venv) $ pwd
    ~/aws-dms-with-multiple-vpcs/dms-to-kinesis
    (.venv) $ cdk destroy
    

Useful commands

  • cdk ls: list all stacks in the app
  • cdk synth: emit the synthesized CloudFormation template
  • cdk deploy: deploy this stack to your default AWS account/region
  • cdk diff: compare the deployed stack with the current state
  • cdk docs: open CDK documentation

Enjoy!

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.
