PacktPublishing / Data-Engineering-with-AWS

Data Engineering with AWS, Published by Packt

Hands-on – ingesting data with AWS DMS - CH 6

davide-marcon opened this issue · comments

Good morning,

I'm having a problem replicating the data from the MySQL RDS instance into the S3 bucket.
I followed the procedure in the book, but the result in the landing bucket looks like this:

  • dataeng-clean-zone-dm
    • sakila-db/
      • mysql/
      • performance_schema/
      • sys/

So in Athena I don't see any 'sakila' database; instead I have:
Database: AwsDataCatalog
Tables:

  • cleanzonedb
  • mysql
  • performance_schema
  • sys

Here is how I set up the source endpoint:
[screenshot: source endpoint settings]

I repeated the procedure a second time, but the results were the same. Any idea what the cause could be? Thanks

I resolved the issue. Here is what I did.

The EC2 instance wasn't able to install MariaDB because the installation requires root privileges, so I changed the first command of the bash script as follows:

sudo yum install -y mariadb

With this change, the sakila database is loaded correctly into our RDS instance.
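For context, a minimal sketch of the relevant part of the EC2 bootstrap (user data) script with that change applied might look like the following. Everything beyond the first command is an assumption about what the book's script does (the sakila download URL is MySQL's standard one; the RDS endpoint and user are placeholders):

```shell
#!/bin/bash
# First command changed to use sudo, so it also works when the script
# is run line by line as a non-root user (e.g. ec2-user over SSH).
sudo yum install -y mariadb

# Hypothetical remainder: fetch the sakila sample database and load it
# into the RDS MySQL instance (replace <rds-endpoint> and credentials).
# wget https://downloads.mysql.com/docs/sakila-db.tar.gz
# tar -xzf sakila-db.tar.gz
# mysql -h <rds-endpoint> -u admin -p < sakila-db/sakila-schema.sql
# mysql -h <rds-endpoint> -u admin -p < sakila-db/sakila-data.sql
```

Note that when the same script runs via EC2 user data, it already executes as root, so the `sudo` prefix is harmless but redundant there.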

Then I wanted to replicate only the "sakila" database to the S3 bucket, without "mysql", "performance_schema", and "sys". To do that, I simply specified it in the Database Migration Task settings.
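Equivalently, that restriction can be expressed as a table-mapping selection rule in the DMS task's JSON editor. A sketch of such a rule (the rule name and id are arbitrary) that includes only the sakila schema:

```json
{
  "rules": [
    {
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "include-sakila-only",
      "object-locator": {
        "schema-name": "sakila",
        "table-name": "%"
      },
      "rule-action": "include"
    }
  ]
}
```

With an explicit `include` rule like this, schemas that don't match (mysql, performance_schema, sys) are excluded by default.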

[screenshot: DMS migration task selection rules]

Now everything works fine and I'm able to query it with Athena.
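As a quick sanity check, the replicated data can also be queried from the AWS CLI rather than the Athena console. In the sketch below the database name, table name, and results bucket are all placeholders, not values from the book:

```shell
# Run a test query against the replicated sakila data via Athena.
# "sakila", "actor", and the OutputLocation bucket are placeholders --
# adjust them to match your own Glue catalog and results bucket.
aws athena start-query-execution \
  --query-string "SELECT * FROM actor LIMIT 10" \
  --query-execution-context Database=sakila \
  --result-configuration OutputLocation=s3://<your-athena-results-bucket>/
```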

EDIT: you also need to create a custom parameter group, associate it with the RDS database, and set log_bin_trust_function_creators = 1
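For anyone who prefers the CLI over the RDS console, a hedged sketch of that parameter-group change follows. The group name, parameter-group family, and instance identifier are placeholders; pick the family that matches your MySQL engine version:

```shell
# Create a custom parameter group (family must match your engine version).
aws rds create-db-parameter-group \
  --db-parameter-group-name sakila-custom-params \
  --db-parameter-group-family mysql8.0 \
  --description "Custom parameters for the sakila RDS instance"

# Set log_bin_trust_function_creators = 1 in the new group.
aws rds modify-db-parameter-group \
  --db-parameter-group-name sakila-custom-params \
  --parameters "ParameterName=log_bin_trust_function_creators,ParameterValue=1,ApplyMethod=immediate"

# Associate the group with the instance; a reboot may be required
# before the new parameter group takes full effect.
aws rds modify-db-instance \
  --db-instance-identifier <your-rds-instance> \
  --db-parameter-group-name sakila-custom-params \
  --apply-immediately
```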

Apologies for the delay in responding to the issue that you reported.

I did some testing today and was not able to replicate your issue. I deployed the CloudFormation template as it is in GitHub, MariaDB installed fine, and the MySQL commands correctly loaded the data into the MySQL instance. I did not need to edit the parameter group either. The instructions on limiting the DMS migration task to only the Sakila database are covered in the book (at Step 34 on Page 195).

According to the AWS documentation, the script specified in the userdata section is always run as root, so there is no need to include sudo in the yum install command (see the documentation at https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html#user-data-shell-scripts).

I did my testing in the Ohio region (us-east-2) so I am wondering whether perhaps this doesn't work in the same way in a different region, although I don't see why it would not. Please let me know which region you are working in so that I can try and recreate the issue by testing deployment in that region.

I use eu-north-1.

From the documentation you linked, it does indeed seem that user data scripts run as root. The issue could be that when I investigated the problem, I logged in to the EC2 instance directly and ran each line of the script step by step. In that case the commands would have run as the default, non-root user.

I'll try again in the coming days. Maybe the misconfiguration was somewhere else. Thanks

Also, please would you confirm whether you are using the first or second edition of the book? The first edition was released in 2021 (and looks like this - https://www.amazon.com/Data-Engineering-AWS-Gareth-Eagar/dp/1800560419) while the second edition was released in 2023 (https://www.amazon.com/Data-Engineering-AWS-AWS-based-transformation-dp-1804614424/dp/1804614424/ref=dp_ob_title_bk?).

I tried again, and indeed everything works fine just by following the book's instructions, so I must have made a mistake somewhere else.
I'm using the 2021 edition of the book, and in that case the DMS settings use "%" for both the schema and table names. But I imagine that is something you already addressed in the 2023 edition.
Thanks again for your help

Glad to hear that you were able to get it working!