morazow / exasol-emr-terraform

The Terraform modules to create Exasol and EMR clusters on AWS

Exasol EMR Cluster Setup on AWS using Terraform

This is an infrastructure-as-code Terraform project that creates Exasol and EMR clusters on AWS.

Motivation

Exasol provides CloudFormation templates for building Exasol clusters. Please have a look at SOL-605. However, if you want something that works from the command line and do not want to click around in the AWS console, then this project might help you.

I also added an EMR cluster here, which I usually need to test the integration of Exasol with Hadoop tools. This allows me to create both clusters automatically on demand and terminate them when I finish working.

Setting up Exasol buckets still requires manual steps.

Prerequisites

Several tools and accounts should be available before using the project.

AWS Account

Create an AWS account if you do not have one already. You can sign up here. The account should have admin access and an access key pair (access key ID and secret access key) in order to use the AWS command line tools.

AWS CLI

Install the AWS command line interface. You can follow the instructions provided at aws-cli to install it.

AWS CLI Profile

Create a credentials profile for aws-cli with the access and secret keys of your account.

$ aws configure --profile my-user-profile

AWS Access Key ID [None]: <Your AWS Account Access Key>
AWS Secret Access Key [None]: <Your AWS Account Secret Key>
Default region name [None]:
Default output format [None]:

Leave the region and output format empty.

You can edit the credentials file, ~/.aws/credentials, manually at any time if you want to update it later.
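
Before running Terraform, it can help to confirm that the profile really exists in the credentials file. The following is a minimal sketch, assuming the default credentials file location; has_aws_profile is a hypothetical helper name and is not part of aws-cli or this project.

```shell
#!/bin/bash
# Hypothetical helper (not part of aws-cli): succeeds when the credentials
# file contains a section header line for the given profile.
has_aws_profile() {
  local profile="$1"
  local file="${2:-$HOME/.aws/credentials}"
  [ -f "$file" ] && grep -qxF -- "[$profile]" "$file"
}

# Usage:
#   has_aws_profile "my-user-profile" || echo "profile not found" >&2
```

The check only looks for the section header; it does not validate the keys themselves.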

Install Terraform

In order to install Terraform, you can follow the instructions from here.

Usage

Follow these steps for a quick start.

Update Configuration File

Copy the configuration file config.tfvars.example to config.tfvars and modify the parameters inside it. Make sure you provide the correct AWS profile name and other variables.

An example configuration:

profile                   = "exasol"
project                   = "SPRKCT"
environment               = "staging"
exa_image_id              = "EXASOL-6.0.6-4-BYOL"
exa_license_file_path     = "./mor_byol_license.xml"
exa_db_password           = "my-awesome-password"
exa_db_node_count         = "3"
exa_db_node_type          = "m4.2xlarge"
exa_db_replication_factor = "1"
exa_db_standby_node       = "0"
emr_release_label         = "emr-5.19.0"
emr_master_type           = "m4.xlarge"
emr_master_count          = "1"
emr_core_type             = "m4.2xlarge"
emr_core_count            = "3"
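
A quick sanity check on the edited file can catch a forgotten variable before Terraform does. This is a minimal sketch; missing_tfvars is a hypothetical helper, and the list of names passed to it is an assumption that you should adapt to the variables you consider mandatory.

```shell
#!/bin/bash
# Hypothetical helper: print every given variable name that has no
# "name = value" assignment in the tfvars file.
missing_tfvars() {
  local file="$1"
  shift
  local name
  for name in "$@"; do
    grep -Eq "^[[:space:]]*${name}[[:space:]]*=" "$file" || echo "$name"
  done
}

# Usage (the variable list is an assumption):
#   missing_tfvars config.tfvars profile project environment exa_image_id exa_db_password
```

An empty output means every listed variable has an assignment in the file.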

User Public SSH Keys

Additionally, you can add public SSH keys so that you can ssh to the EMR master node without providing a private pem file.

Edit file bootstrap_user_keys.sh as follows:

#!/bin/bash

cat <<EOT >> ~/.ssh/authorized_keys
ssh-rsa SSH_PUBLIC_KEY <username>
#
# ADD MORE HERE
#
EOT
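
Because the script appends with >>, the same key would be added again if the script were ever run more than once on the same node. A minimal sketch of an idempotent variant follows; add_authorized_key is a hypothetical helper and not part of this project.

```shell
#!/bin/bash
# Hypothetical helper: append a public key line to an authorized_keys file
# only if that exact line is not already present.
add_authorized_key() {
  local key="$1"
  local file="${2:-$HOME/.ssh/authorized_keys}"
  mkdir -p "$(dirname "$file")"
  touch "$file"
  grep -qxF -- "$key" "$file" || printf '%s\n' "$key" >> "$file"
}

# Usage (placeholder key, as in the script above):
#   add_authorized_key "ssh-rsa SSH_PUBLIC_KEY <username>"
```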

Once the clusters are running, this makes it easy to ssh into the EMR master node:

ssh hadoop@$(terraform output out-emr-master-dns)

Similarly, with a SOCKS proxy enabled:

ssh -D 8157 hadoop@$(terraform output out-emr-master-dns)
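
Recent Terraform releases print string outputs wrapped in double quotes, which would break the host name in the ssh commands above; those releases also offer terraform output -raw. A minimal sketch that works with either output style is shown here; strip_quotes is a hypothetical helper, not part of this project.

```shell
#!/bin/bash
# Hypothetical helper: remove one pair of surrounding double quotes, if present.
strip_quotes() {
  local value="$1"
  value="${value#\"}"
  value="${value%\"}"
  printf '%s\n' "$value"
}

# Usage:
#   ssh "hadoop@$(strip_quotes "$(terraform output out-emr-master-dns)")"
```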

Run

To start setting up the clusters, run:

terraform init

terraform get -update

terraform plan -var-file config.tfvars -out terraform.tfplan

terraform apply -auto-approve -var-file config.tfvars

It takes some time until everything is set up, so you can go and grab a coffee.

When you want to destroy the clusters, run:

terraform plan -destroy -var-file config.tfvars -out terraform.tfplan

terraform apply terraform.tfplan
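
The create steps can be chained so that a failing step stops the rest. The following is a minimal sketch under stated assumptions: create_clusters and the TERRAFORM_BIN override are hypothetical names, and the sketch applies the saved plan file (as the destroy steps do) instead of re-planning with -auto-approve.

```shell
#!/bin/bash
# Assumption: TERRAFORM_BIN is a hypothetical override so the sequence can be
# dry-run (for example TERRAFORM_BIN=echo); it is not a Terraform setting.
TERRAFORM_BIN="${TERRAFORM_BIN:-terraform}"

# Run init, module update, plan, and apply; stop at the first failing step.
create_clusters() {
  local var_file="${1:-config.tfvars}"
  local plan_file="${2:-terraform.tfplan}"
  "$TERRAFORM_BIN" init &&
    "$TERRAFORM_BIN" get -update &&
    "$TERRAFORM_BIN" plan -var-file "$var_file" -out "$plan_file" &&
    "$TERRAFORM_BIN" apply "$plan_file"
}

# Usage:
#   create_clusters config.tfvars terraform.tfplan
```

Setting TERRAFORM_BIN=echo prints the commands instead of running them, which is handy for checking the sequence.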

Makefile

You can also use Makefile commands to create the clusters.

Command              Description
make                 runs terraform init, plan and apply
make init            terraform init; run this if it is the first run
make update          terraform update
make plan            terraform plan
make apply           terraform apply; creates both clusters
make destroy         terraform destroy; destroys everything
make exasol          creates only the Exasol cluster
make emr             creates only the EMR cluster
make clean           removes plan or generated files
make run-hive        creates Hive tables in EMR Hive using HDFS
make run-etl-import  runs ETL loader scripts to populate Exasol tables

Configuration Variables

The following Terraform configuration variables should be provided.

Configuration              Default     Description
profile                                An aws-cli profile name defined in ~/.aws/credentials
project                                An identifier string for the project name, used in tagging resources
environment                            An identifier string for the environment, used in tagging resources
exa_image_id                           An AWS AMI image id for creating an Exasol cluster
exa_license_file_path                  A path to the license file if a BYOL (Bring Your Own License) image id is used
exa_db_password                        A password to use for authentication of the admin and sys users
exa_db_node_count          3           The number of nodes for the Exasol cluster
exa_db_replication_factor  1           A replication factor for the Exasol cluster
exa_db_standby_node        0           The number of standby nodes for the Exasol cluster
emr_release_label          emr-5.19.0  A release version for the EMR cluster
emr_master_type            m4.xlarge   An EC2 instance type for the EMR cluster master node
emr_master_count           1           The number of master nodes for the EMR cluster
emr_core_type              m4.2xlarge  An EC2 instance type for the EMR cluster core nodes
emr_core_count             3           The number of core nodes for the EMR cluster

The project configuration variable is also used to create an exa:project tag.

Manual Steps

The setup is not fully automated yet; there are still some manual steps you need to follow. Some of them are:

  • Open the Exasol BucketFS HTTP and HTTPS ports
  • Create an Exasol bucket
  • Upload jars to the Exasol bucket
  • Create the Hive tables using make run-hive. This creates Hive tables in HDFS that will be loaded into Exasol later.
  • Run the ETL loader scripts to populate the Exasol tables using make run-etl-import. For this to work, the ETL jars must first be uploaded to the bucket /buckets/bfsdefault/bucket1/.
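
The jar upload step can be done with curl against the BucketFS HTTP port. This is a minimal sketch, not the project's own tooling: bucketfs_url is a hypothetical helper, and the host variable, port 2580, the write user w, and the password variable in the usage comment are assumptions that depend on how BucketFS and the bucket were configured.

```shell
#!/bin/bash
# Hypothetical helper: build the HTTP URL of a file inside a BucketFS bucket.
# Only the file's base name is used as the object name in the bucket.
bucketfs_url() {
  local host="$1" port="$2" bucket="$3" file="$4"
  printf 'http://%s:%s/%s/%s\n' "$host" "$port" "$bucket" "$(basename "$file")"
}

# Usage (host, port, and password variable are assumptions):
#   curl -X PUT -T etl.jar -u "w:$BUCKET_WRITE_PASSWORD" \
#     "$(bucketfs_url "$EXASOL_NODE_IP" 2580 bucket1 etl.jar)"
```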

License

The MIT License (MIT)
