datamesh-architecture / terraform-aws-dataproduct-aws-athena

Provide a data product on AWS from an existing S3 bucket with an Athena query transformation

terraform-dataproduct-aws-athena

This open source Terraform module provisions the necessary services to provide a data product on AWS.

Overview

Services

  • AWS S3
  • AWS Athena
  • AWS Glue
  • AWS Lambda

Usage

module "my_data_product" {
  source = "git@github.com:datamesh-architecture/terraform-dataproduct-aws-athena.git"

  domain = "<data_product_domain>"
  name   = "<data_product_name>"

  schedule = "0 0 * * ? *" # Run at 00:00 UTC every day

  input = [
    {
      source = "<existing_s3_bucket>"
    }
  ]

  transform = {
    query = "sql/<name_of_the_transform>.sql"
  }

  output = {
    format = "<format>"
    schema = "schema/<name_of_the_schema>.schema.json"
  }
}

The module also needs AWS credentials. One way to supply them is a separate terraform.tfvars file with the following content:

aws = {
  region = "REGION"
  access_key = "ACCESS_KEY"
  secret_key = "SECRET_KEY"
}

The credentials can then be referenced in the other *.tf files (for example as var.aws) and passed on to the module through its aws input.
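As a minimal sketch of that forwarding, the root configuration could declare a variable matching the aws input type listed under Inputs and hand it to the module. The variable name aws and the sensitive flag are assumptions for illustration; the remaining arguments are the placeholders from the Usage section.

```hcl
# Declares the variable that terraform.tfvars fills in.
# The object shape mirrors the module's "aws" input (see Inputs).
variable "aws" {
  description = "AWS related information and credentials"
  type = object({
    region     = string
    access_key = string
    secret_key = string
  })
  sensitive = true # assumption: keeps the keys out of plan output
}

module "my_data_product" {
  source = "git@github.com:datamesh-architecture/terraform-dataproduct-aws-athena.git"

  # Forward the credentials object unchanged.
  aws = var.aws

  domain   = "<data_product_domain>"
  name     = "<data_product_name>"
  schedule = "0 0 * * ? *"

  input = [
    {
      source = "<existing_s3_bucket>"
    }
  ]

  transform = {
    query = "sql/<name_of_the_transform>.sql"
  }

  output = {
    format = "<format>"
    schema = "schema/<name_of_the_schema>.schema.json"
  }
}
```

Keeping the credentials in terraform.tfvars rather than in the module block makes it easier to exclude them from version control.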

Endpoint data

The module creates a RESTful endpoint via AWS Lambda (e.g. https://3jopsshxxc.execute-api.eu-central-1.amazonaws.com/prod/). This endpoint can be used as an input for another data product or to retrieve information about this data product. It returns data in the following form:

{
  "domain": "<data_product_domain>",
  "name": "<data_product_name>",
  "output": {
    "location": "arn:aws:s3:::<s3_bucket_name>/output/data/"
  }
}
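To illustrate how a downstream configuration might read this information, the sketch below fetches the endpoint with the hashicorp/http provider and decodes the JSON. This is an assumption about one possible consumer, not something the module itself prescribes; the resource and output names are hypothetical, and the URL is the example URL from this section.

```hcl
terraform {
  required_providers {
    http = {
      source  = "hashicorp/http"
      version = ">= 3.0" # response_body is available from 3.0 on
    }
  }
}

# Hypothetical lookup of an upstream data product's info endpoint.
data "http" "upstream_data_product" {
  url = "https://3jopsshxxc.execute-api.eu-central-1.amazonaws.com/prod/"

  request_headers = {
    Accept = "application/json"
  }
}

locals {
  # Decode the JSON document shown above into a Terraform object.
  upstream_info = jsondecode(data.http.upstream_data_product.response_body)
}

# Expose the S3 location reported by the upstream data product.
output "upstream_output_location" {
  value = local.upstream_info.output.location
}
```

Whether the location can be passed directly as an input source of another data product is not stated here, so the sketch only surfaces the value as an output.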

Examples

See examples repository.

Authors

This Terraform module is maintained by André Deuerling, Jochen Christ, and Simon Harrer.

License

MIT License.

Requirements

Name Version
aws >= 4.56

Providers

Name Version
archive n/a
aws >= 4.56
local n/a

Modules

No modules.

Resources

Name Type
aws_apigatewayv2_api.lambda_info resource
aws_apigatewayv2_integration.lambda_info resource
aws_apigatewayv2_route.lambda_info resource
aws_apigatewayv2_stage.lambda_info_prod resource
aws_cloudwatch_event_rule.aws_cloudwatch_event_rule resource
aws_cloudwatch_event_target.aws_cloudwatch_event_target resource
aws_cloudwatch_log_group.lambda_info resource
aws_cloudwatch_log_group.lambda_to_cloudwatch resource
aws_glue_catalog_database.aws_glue_catalog_database resource
aws_glue_catalog_table.aws_glue_catalog_table resource
aws_glue_schema.aws_glue_schema resource
aws_iam_role.lambda_execution_role resource
aws_iam_role_policy.lambda_athena resource
aws_iam_role_policy.lambda_glue resource
aws_iam_role_policy.lambda_logs resource
aws_iam_role_policy.lambda_s3 resource
aws_iam_role_policy.lambda_s3_input resource
aws_kms_key.aws_kms_key resource
aws_lambda_function.aws_lambda_function resource
aws_lambda_function.lambda_info resource
aws_lambda_permission.aws_lambda_permission resource
aws_lambda_permission.lambda_info resource
aws_s3_bucket.aws_s3_bucket resource
aws_s3_bucket_acl.aws_s3_bucket_acl resource
aws_s3_bucket_server_side_encryption_configuration.aws_s3_bucket_server_side_encryption_configuration resource
aws_s3_object.archive_info_to_s3_object resource
aws_s3_object.archive_to_s3_object resource
local_file.lambda_info_to_s3 resource
local_file.lambda_to_s3 resource
local_file.query_to_s3 resource
archive_file.archive_info_to_s3 data source
archive_file.archive_to_s3 data source
aws_iam_policy_document.allow_athena data source
aws_iam_policy_document.allow_glue data source
aws_iam_policy_document.allow_logging data source
aws_iam_policy_document.allow_s3 data source
aws_iam_policy_document.allow_s3_input data source
aws_iam_policy_document.lambda_assume data source

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| aws | AWS related information and credentials | object({ region = string, access_key = string, secret_key = string }) | n/a | yes |
| domain | The domain of the data product | string | n/a | yes |
| input | List of S3 buckets of other data products which should be used as input | list(object({ source = string })) | n/a | yes |
| name | The name of the data product | string | n/a | yes |
| output | format: Output format of this data product (e.g. PARQUET). schema: Path to the JSON schema file which describes the output of this data product | object({ format = string, schema = string }) | n/a | yes |
| schedule | The schedule expression to pass to the EventBridge event rule. Six fields: minutes, hours, day of month, month, day of week, year | string | "" | no |
| transform | Path to a SQL file, which should be used to transform the input data | object({ query = string }) | n/a | yes |
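For the schedule input, a few six-field expressions in the same bare form as the Usage example may help. The local names below are hypothetical; the second and third expressions follow the Amazon EventBridge cron syntax, in which one of the day-of-month and day-of-week fields must be a question mark.

```hcl
locals {
  example_schedules = {
    # minutes hours day-of-month month day-of-week year
    daily_midnight   = "0 0 * * ? *"        # every day at 00:00 UTC (from Usage)
    weekdays_evening = "0 18 ? * MON-FRI *" # Monday to Friday at 18:00 UTC
    every_15_minutes = "0/15 * * * ? *"     # every 15 minutes
  }
}

# One of these values could then be passed to the module, e.g.
#   schedule = local.example_schedules.weekdays_evening
output "example_schedules" {
  value = local.example_schedules
}
```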

Outputs

No outputs.
