aws-samples / aws-glue-migrations-between-aws-accounts

Cross Account Glue Database Migration Tool

Overview of the IDO project (Internal Data Organization)

This project helps you automate the migration of Glue databases and tables across AWS accounts using CloudFormation. The framework, originally built for a customer, connects to an existing AWS account and region and clones the Glue databases and tables in that region. Given a list of target resources, it generates CloudFormation templates for the database and tables, including all of the original parameters from the DDL. The next step is to deploy those templates into the destination account.

The Use case

The customer needed to migrate many existing services into different AWS accounts, which they used as Dev/Cert/Prod environments for deployments. They wanted to automate the whole deployment process through CloudFormation to create a more maintainable and reliable product. In this specific use case, about 10 Glue databases with hundreds of tables needed to be migrated and automated. The framework saved them a lot of time and manual effort.

Why AWS CloudFormation?

Using CloudFormation to deploy and manage services has a number of nice benefits over more traditional methods (AWS CLI, scripting, etc.).

Infrastructure-as-Code

A template can be used repeatedly to create identical copies of the same stack (or to use as a foundation to start a new stack). Templates are simple YAML- or JSON-formatted text files that can be placed under your normal source control mechanisms, stored in private or public locations such as Amazon S3, and exchanged via email. With CloudFormation, you can see exactly which AWS resources make up a stack. You retain full control and have the ability to modify any of the AWS resources created as part of a stack.

Self-documenting

Fed up with outdated documentation on your infrastructure or environments? Still keeping manual documentation of tables, databases, etc.?

With CloudFormation, your template becomes your documentation. Want to see exactly what you have deployed? Just look at your template. If you keep it in source control, then you can also look back at exactly which changes were made and by whom.

Intelligent updating & rollback

CloudFormation not only handles the initial deployment of your infrastructure and environments, but it can also manage the whole lifecycle, including future updates. During updates, you have fine-grained control and visibility over how changes are applied, using functionality such as change sets, rolling update policies and stack policies.

File details

The files below are included in this repository:

| File | Description |
| --- | --- |
| Generator/glueFactoryClass.py | This Python code does all of the heavy lifting. It takes the given properties and the generic templates, connects to the original account, and clones the existing Glue databases and tables. It then generates a CloudFormation template for each of these resources and writes them to the templates folder within the project workspace. |
| Generator/glueDBgeneral.template | The generic Glue DB template. At run time, the code injects the existing DB parameters and properties into this sample template. |
| Generator/glueTableGeneral.template | The generic Glue Table template. At run time, the code injects the existing Table parameters and properties into this sample template. |
| Generator/glueProperties.json | The properties that are passed to the generator class, such as which DB to clone and which of its tables to clone. |
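
As a rough illustration of the properties file, the sketch below parses a JSON document with the two keys this README names, sourceDB and sourceTables. The example values, the list-of-strings shape of sourceTables, and the load_properties helper are assumptions made for illustration; check Generator/glueProperties.json for the real layout.

```python
import json

# Hypothetical example content. Only the key names sourceDB and sourceTables
# come from this README; the values and the list shape are assumptions.
EXAMPLE_PROPERTIES = """
{
  "sourceDB": "sales_db",
  "sourceTables": ["orders", "customers"]
}
"""


def load_properties(text):
    """Parse the properties JSON and normalize the table selection."""
    props = json.loads(text)
    if not props.get("sourceDB"):
        raise ValueError("sourceDB must name the Glue database to clone")
    tables = props.get("sourceTables") or []
    # Accept a single table name as well as a list of names.
    if isinstance(tables, str):
        tables = [tables]
    # An empty list is treated as "clone every table in the database".
    return {"sourceDB": props["sourceDB"], "sourceTables": list(tables)}
```

Calling load_properties(EXAMPLE_PROPERTIES) returns a dictionary with the database name and a list of table names; an empty or missing sourceTables yields an empty list.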

In addition, after running the software a folder called templates will contain the final templates ready for deployment in the destination account.
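
The general idea of turning an existing Glue table into a deployable template can be sketched as follows. This is not the code in glueFactoryClass.py: it builds the template directly instead of filling in the generic template files, and the function names, the GlueTable logical ID, and the .template output naming are assumptions. The boto3 calls used (the Glue get_tables paginator) do exist, but the nested StorageDescriptor returned by the API may contain fields that CloudFormation rejects, so treat the output as a starting point.

```python
import json
import os

# Keys of a Glue table definition that AWS::Glue::Table accepts in TableInput.
# Read-only fields returned by the Glue API (CreateTime, CreatedBy, ...) are dropped.
TABLE_INPUT_KEYS = (
    "Name", "Description", "Owner", "Retention", "TableType",
    "Parameters", "PartitionKeys", "StorageDescriptor",
    "ViewOriginalText", "ViewExpandedText",
)


def fetch_tables(database, table_names=None):
    """Return table definitions from the source account (needs boto3 and credentials)."""
    import boto3  # imported here so the helpers below work without boto3 installed

    glue = boto3.client("glue")
    tables = []
    for page in glue.get_paginator("get_tables").paginate(DatabaseName=database):
        tables.extend(page["TableList"])
    if table_names:
        tables = [t for t in tables if t["Name"] in table_names]
    return tables


def table_to_template(table):
    """Build a one-resource CloudFormation template for a Glue table definition."""
    table_input = {k: table[k] for k in TABLE_INPUT_KEYS if k in table}
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "GlueTable": {
                "Type": "AWS::Glue::Table",
                "Properties": {
                    # Resolve the catalog to whichever account deploys the stack.
                    "CatalogId": {"Ref": "AWS::AccountId"},
                    "DatabaseName": table["DatabaseName"],
                    "TableInput": table_input,
                },
            }
        },
    }


def write_template(template, folder, name):
    """Write the template as JSON into the output folder and return its path."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, name + ".template")
    with open(path, "w") as handle:
        # default=str guards against datetime values left in nested structures.
        json.dump(template, handle, indent=2, default=str)
    return path
```

Using `{"Ref": "AWS::AccountId"}` for CatalogId keeps the template account-neutral, which matters when the same file is deployed to several destination accounts.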

After the CloudFormation templates have been deployed, the Stack Resources contain information about the different resources that were deployed.

Prerequisites

  1. A working Python environment.
  2. The boto3 Python package (collections, which the code also uses, is part of the Python standard library).
  3. A source Glue DB and at least one table to clone.

How to run it?

  1. Before running the code, configure your AWS credentials so that you have access to the source AWS account. Use aws configure (in the CLI) to set YOUR_ACCESS_KEY and YOUR_SECRET_KEY. If you use temporary credentials from STS, also set the session token with this command: aws configure set aws_session_token <YOUR_SESSION_TOKEN>.
  2. Set the properties of the source resources to clone. In glueProperties.json, set two parameters: sourceDB (the Glue database to clone) and sourceTables (the table or tables to clone; leave it empty to clone all existing tables in the database). The generic templates can be set in glueProperties.json as well. The working directory location can be passed as a command-line argument (runtime variable).
  3. Run the code and wait until it finishes. The code prints some results and status updates for debugging purposes.
  4. Take the final CloudFormation templates from the templates folder and deploy them with CloudFormation in the destination account.
  5. Because the templates are separated per resource, deploy the database first and then the tables, since the tables depend on the database.
  6. Use the new resources as you would normally do.
  6. Use the new resources as you would normally do.
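
The deployment order in steps 4 and 5 can also be scripted. The following is a minimal sketch, assuming boto3 is installed and credentials for the destination account are configured; the helper names, the glue- stack name prefix, and the file names in the usage note are hypothetical. It uses the CloudFormation create_stack call and the stack_create_complete waiter, and passes the template inline, which only works for templates within the TemplateBody size limit.

```python
import os
import re


def stack_name_for(path, prefix="glue"):
    """Derive a CloudFormation-safe stack name from a template file name."""
    base = os.path.splitext(os.path.basename(path))[0]
    # Stack names allow only letters, digits, and hyphens.
    return prefix + "-" + re.sub(r"[^A-Za-z0-9]+", "-", base).strip("-")


def ordered_templates(database_template, table_templates):
    """Return the deployment order: the database first, then its tables."""
    return [database_template] + sorted(table_templates)


def deploy(templates):
    """Create one stack per template, waiting for each before starting the next."""
    import boto3  # needs credentials for the destination account

    cfn = boto3.client("cloudformation")
    waiter = cfn.get_waiter("stack_create_complete")
    for path in templates:
        with open(path) as handle:
            body = handle.read()
        name = stack_name_for(path)
        cfn.create_stack(StackName=name, TemplateBody=body)
        # Block until the stack exists so dependent tables find their database.
        waiter.wait(StackName=name)
        print("created", name)
```

A call such as `deploy(ordered_templates("templates/sales_db.template", ["templates/orders.template"]))` would create the database stack, wait for it to complete, and then create the table stack.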

Contributing

Please create a new GitHub issue for any feature requests, bugs, or documentation improvements.

Where possible, please also submit a pull request for the change.

License

This code repository is made available under the Apache 2.0 license. See the LICENSE file.