This repository provides CDK scripts and sample code showing how to implement an end-to-end pipeline that replicates transactional data from a MySQL database in one VPC to Amazon Kinesis Data Streams in another VPC using AWS Database Migration Service (DMS).
The diagram below shows what we are implementing.
The `cdk.json` file tells the CDK Toolkit how to execute your app.
This project is set up like a standard Python project. The initialization process also creates a virtualenv within this project, stored under the `.venv` directory. To create the virtualenv it assumes that there is a `python3` (or `python` for Windows) executable in your path with access to the `venv` package. If for any reason the automatic creation of the virtualenv fails, you can create the virtualenv manually.
To manually create a virtualenv on macOS and Linux:

```
$ cd datalake-vpc
$ python3 -m venv .venv
```
After the init process completes and the virtualenv is created, you can use the following step to activate your virtualenv.

```
$ source .venv/bin/activate
```
If you are on a Windows platform, activate the virtualenv like this:

```
% .venv\Scripts\activate.bat
```
Once the virtualenv is activated, you can install the required dependencies.

```
(.venv) $ pip install -r requirements.txt
```
To add additional dependencies, for example other CDK libraries, just add them to your `setup.py` file and rerun the `pip install -r requirements.txt` command.
At this point you can now synthesize the CloudFormation template for this code.

```
(.venv) $ export CDK_DEFAULT_ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
(.venv) $ export CDK_DEFAULT_REGION=region-name
(.venv) $ cdk deploy --require-approval never \
          DataLakeVPC
```
Let's get the NAT gateway public IPs from the new VPC.

```
(.venv) $ aws ec2 describe-nat-gateways --region region-name --filter Name=vpc-id,Values=your-vpc-id | jq -r '.NatGateways | .[] | .NatGatewayAddresses | .[] | .PublicIp'
34.xxx.xxx.xxx
52.xxx.xxx.xxx
```
- ℹ️ Create an AWS Secret for your RDS admin user like this:

  ```
  (.venv) $ cd ../source-db
  (.venv) $ pwd
  ~/aws-dms-with-multiple-vpcs/source-db
  (.venv) $ aws secretsmanager create-secret \
      --name "your_db_secret_name" \
      --description "(Optional) description of the secret" \
      --secret-string '{"username": "admin", "password": "password_of_at_least_8_characters"}'
  ```
  For example,

  ```
  (.venv) $ aws secretsmanager create-secret \
      --name "dev/rds/admin" \
      --description "admin user for rds" \
      --secret-string '{"username": "admin", "password": "your admin password"}'
  ```
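  If you later need to read these credentials from your own scripts, the secret string can be fetched and parsed as sketched below. This is a minimal sketch, not part of the CDK stacks: the secret name and region passed to `get_db_credentials` are assumptions you should replace with your own, and `boto3` must be available with AWS credentials configured.

  ```python
  import json


  def parse_db_secret(secret_string: str):
      """Parse the JSON secret string created above into (username, password)."""
      secret = json.loads(secret_string)
      return secret["username"], secret["password"]


  def get_db_credentials(secret_name: str, region_name: str):
      """Fetch and parse the secret from AWS Secrets Manager.

      Requires AWS credentials with secretsmanager:GetSecretValue permission.
      """
      import boto3  # imported lazily so parse_db_secret stays dependency-free

      client = boto3.client("secretsmanager", region_name=region_name)
      response = client.get_secret_value(SecretId=secret_name)
      return parse_db_secret(response["SecretString"])
  ```

  For example, `get_db_credentials("dev/rds/admin", "us-east-1")` would return the `(username, password)` pair stored by the command above.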
- Create an Aurora MySQL cluster.

  ```
  (.venv) $ pwd
  ~/aws-dms-with-multiple-vpcs/source-db
  (.venv) $ pip install -r requirements.txt
  (.venv) $ cdk deploy \
      -c vpc_name='your-existing-vpc-name' \
      -c db_secret_name='db-secret-name' \
      -c db_cluster_name='db-cluster-name' \
      -c db_access_allowed_ip_list=NAT-Public-IPs \
      DmsSourceDbStack
  ```
In order to set up MySQL, you need to connect to the Aurora MySQL cluster from either your local PC or an EC2 instance.
- Connect to the Aurora cluster writer node.

  ```
  $ mysql -hdb-cluster-name.cluster-xxxxxxxxxxxx.region-name.rds.amazonaws.com -uadmin -p
  Enter password:
  Welcome to the MariaDB monitor.  Commands end with ; or \g.
  Your MySQL connection id is 20
  Server version: 8.0.23 Source distribution

  Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

  Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

  MySQL [(none)]>
  ```
- At the SQL prompt, run the command below to confirm that binary logging is enabled:

  ```
  MySQL [(none)]> show global variables like "log_bin";
  +---------------+-------+
  | Variable_name | Value |
  +---------------+-------+
  | log_bin       | ON    |
  +---------------+-------+
  1 row in set (0.00 sec)
  ```
- Also run the following so that AWS DMS has the binary log access it requires for replication:

  ```
  MySQL [(none)]> call mysql.rds_set_configuration('binlog retention hours', 24);
  Query OK, 0 rows affected (0.01 sec)
  ```
- Run the command below to create the sample database named `testdb`.

  ```
  MySQL [(none)]> show databases;
  +--------------------+
  | Database           |
  +--------------------+
  | information_schema |
  | mysql              |
  | performance_schema |
  | sys                |
  +--------------------+
  4 rows in set (0.00 sec)

  MySQL [(none)]> create database testdb;
  Query OK, 1 row affected (0.01 sec)

  MySQL [(none)]> use testdb;
  Database changed
  MySQL [testdb]> show tables;
  Empty set (0.00 sec)
  ```
- Also run this to create the sample table named `retail_trans`:

  ```
  MySQL [testdb]> CREATE TABLE IF NOT EXISTS testdb.retail_trans (
      ->   trans_id BIGINT(20) AUTO_INCREMENT,
      ->   customer_id VARCHAR(12) NOT NULL,
      ->   event VARCHAR(10) DEFAULT NULL,
      ->   sku VARCHAR(10) NOT NULL,
      ->   amount INT DEFAULT 0,
      ->   device VARCHAR(10) DEFAULT NULL,
      ->   trans_datetime DATETIME DEFAULT CURRENT_TIMESTAMP,
      ->   PRIMARY KEY(trans_id),
      ->   KEY(trans_datetime)
      -> ) ENGINE=InnoDB AUTO_INCREMENT=0;
  Query OK, 0 rows affected, 1 warning (0.04 sec)

  MySQL [testdb]> show tables;
  +------------------+
  | Tables_in_testdb |
  +------------------+
  | retail_trans     |
  +------------------+
  1 row in set (0.00 sec)

  MySQL [testdb]> desc retail_trans;
  +----------------+-------------+------+-----+-------------------+-------------------+
  | Field          | Type        | Null | Key | Default           | Extra             |
  +----------------+-------------+------+-----+-------------------+-------------------+
  | trans_id       | bigint      | NO   | PRI | NULL              | auto_increment    |
  | customer_id    | varchar(12) | NO   |     | NULL              |                   |
  | event          | varchar(10) | YES  |     | NULL              |                   |
  | sku            | varchar(10) | NO   |     | NULL              |                   |
  | amount         | int         | YES  |     | 0                 |                   |
  | device         | varchar(10) | YES  |     | NULL              |                   |
  | trans_datetime | datetime    | YES  | MUL | CURRENT_TIMESTAMP | DEFAULT_GENERATED |
  +----------------+-------------+------+-----+-------------------+-------------------+
  7 rows in set (0.00 sec)
  ```
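To give DMS some change events to capture, you can insert random rows into `retail_trans`. The sketch below is our own illustration, not part of this repository: it assumes the `pymysql` driver (`pip install pymysql`), and the event/device value sets are made-up examples matching the column sizes of the schema above.

```python
import random
import string
from datetime import datetime


def make_transaction() -> dict:
    """Build one random row matching the retail_trans schema created above."""
    return {
        "customer_id": "".join(random.choices(string.digits, k=12)),
        "event": random.choice(["visit", "cart", "like", "purchase"]),
        "sku": "".join(random.choices(string.ascii_uppercase + string.digits, k=10)),
        "amount": random.randint(0, 100),
        "device": random.choice(["pc", "mobile", "tablet"]),
        "trans_datetime": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
    }


def insert_transactions(host: str, user: str, password: str, count: int = 10) -> None:
    """Insert `count` random rows into testdb.retail_trans."""
    import pymysql  # assumed driver: pip install pymysql

    conn = pymysql.connect(host=host, user=user, password=password, database="testdb")
    sql = (
        "INSERT INTO retail_trans "
        "(customer_id, event, sku, amount, device, trans_datetime) "
        "VALUES (%(customer_id)s, %(event)s, %(sku)s, %(amount)s, %(device)s, %(trans_datetime)s)"
    )
    with conn:
        with conn.cursor() as cursor:
            for _ in range(count):
                # trans_id is omitted so AUTO_INCREMENT assigns the primary key
                cursor.execute(sql, make_transaction())
        conn.commit()
```

You would call `insert_transactions` with the cluster writer endpoint and the credentials stored in your secret.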
After setting up MySQL, return to the terminal where you are deploying the stacks.
```
(.venv) $ cd ../dms-to-kinesis
(.venv) $ pwd
~/aws-dms-with-multiple-vpcs/dms-to-kinesis
(.venv) $ pip install -r requirements.txt
(.venv) $ cdk deploy \
    -c vpc_name='your-existing-vpc-name' \
    -c db_secret_name='db-secret-name' \
    -e DMSTargetKinesisDataStreamStack \
    --parameters TargetKinesisStreamName=your-kinesis-stream-name
```
For example, we already created the sample database (i.e. `testdb`) and table (`retail_trans`):

```
(.venv) $ cdk deploy \
    -c vpc_name='your-existing-vpc-name' \
    -c db_secret_name='db-secret-name' \
    -e DMSAuroraMysqlToKinesisStack \
    --parameters SourceDatabaseName=testdb \
    --parameters SourceTableName=retail_trans
```
Start the DMS replication task by replacing the ARN in the command below.

```
(.venv) $ aws dms start-replication-task --replication-task-arn dms-task-arn --start-replication-task-type start-replication
```
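Once the task is running and rows change on the source, DMS publishes each change to the target Kinesis stream as a JSON message with `data` and `metadata` fields. The sketch below is an optional verification aid we wrote, not part of this repository: `read_stream` assumes `boto3` with valid AWS credentials and only reads the first shard, and the stream name you pass should match `TargetKinesisStreamName` above.

```python
import json


def parse_dms_record(raw: bytes) -> tuple:
    """Extract (operation, table, row) from a DMS-formatted Kinesis record.

    DMS messages look like:
    {"data": {...row...},
     "metadata": {"operation": "insert", "schema-name": "testdb",
                  "table-name": "retail_trans", ...}}
    """
    message = json.loads(raw)
    metadata = message.get("metadata", {})
    return metadata.get("operation"), metadata.get("table-name"), message.get("data", {})


def read_stream(stream_name: str, region_name: str, limit: int = 10) -> None:
    """Print up to `limit` parsed records from the first shard (verification only)."""
    import boto3  # imported lazily; requires AWS credentials

    kinesis = boto3.client("kinesis", region_name=region_name)
    shards = kinesis.describe_stream(StreamName=stream_name)["StreamDescription"]["Shards"]
    iterator = kinesis.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shards[0]["ShardId"],
        ShardIteratorType="TRIM_HORIZON",  # read from the oldest available record
    )["ShardIterator"]
    for record in kinesis.get_records(ShardIterator=iterator, Limit=limit)["Records"]:
        print(parse_dms_record(record["Data"]))
```

After inserting a few rows into `retail_trans`, `read_stream("your-kinesis-stream-name", "region-name")` should print `insert` operations for the `retail_trans` table.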
- Stop the DMS replication task by replacing the ARN in the command below.

  ```
  (.venv) $ aws dms stop-replication-task --replication-task-arn dms-task-arn
  ```
- Delete the CloudFormation stack by running the command below.

  ```
  (.venv) $ pwd
  ~/aws-dms-with-multiple-vpcs/dms-to-kinesis
  (.venv) $ cdk destroy
  ```
- `cdk ls` list all stacks in the app
- `cdk synth` emits the synthesized CloudFormation template
- `cdk deploy` deploy this stack to your default AWS account/region
- `cdk diff` compare deployed stack with current state
- `cdk docs` open CDK documentation
Enjoy!
- aws-dms-deployment-using-aws-cdk - AWS DMS deployment using AWS CDK (Python)
- aws-dms-msk-demo - Streaming Data to Amazon MSK via AWS DMS
- How do I troubleshoot binary logging errors that I received when using AWS DMS with Aurora MySQL as the source? (Last updated: 2019-10-01)
- AWS DMS - Using Amazon Kinesis Data Streams as a target for AWS Database Migration Service
- Specifying task settings for AWS Database Migration Service tasks
- AWS DMS - Setting up a network for a replication instance
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.