yesteph / terraform-ecs-cluster


This module creates an EC2 autoscaling group for an ECS cluster.

The module provisions the following resources:

  • CloudWatch log groups
  • CloudWatch alarms
  • EC2 auto scaling group and scaling policies
  • IAM policies and roles
  • security groups
  • a Lambda function and an SNS topic to drain EC2 instances before they are terminated
  • a Lambda function and a Step Functions state machine to update EC2 instances when the AMI id or the instance type changes
  • a Lambda function and a Step Functions state machine to ensure that every newly launched EC2 instance is visible to the ECS service as a container instance

If var.env is "prod", the ECS cluster resource is marked with prevent_destroy, so any plan that would destroy it fails.

CloudWatch log groups

In addition to the common log groups created by the terraform-ec2-common-cwlog-module, the following log groups are created with a 14-day retention:

  • /ec2/${var.env}/${local.component_id}/ecs/ecs-agent.log
  • /ec2/${var.env}/${local.component_id}/docker
  • /ec2/${var.env}/${local.component_id}/ecs/audit.log
  • /ec2/${var.env}/${local.component_id}/ecs/ecs-init.log

Here, local.component_id is "tf-${var.env}-${var.project}-ecs-${var.cluster_name}".
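The naming scheme can be reproduced in a few lines of Python. This is a minimal sketch, not code from the module. It only applies the string templates shown above, and the input values in the usage note (dev, billing, main) are hypothetical.

```python
# Sketch: rebuild local.component_id and the ECS-specific log group names
# from the module inputs, following the templates documented above.

# Suffixes of the log groups created with a 14-day retention.
ECS_LOG_SUFFIXES = (
    "ecs/ecs-agent.log",
    "docker",
    "ecs/audit.log",
    "ecs/ecs-init.log",
)


def component_id(env: str, project: str, cluster_name: str) -> str:
    # Mirrors "tf-${var.env}-${var.project}-ecs-${var.cluster_name}".
    return f"tf-{env}-{project}-ecs-{cluster_name}"


def ecs_log_groups(env: str, project: str, cluster_name: str) -> list:
    # Mirrors "/ec2/${var.env}/${local.component_id}/<suffix>".
    cid = component_id(env, project, cluster_name)
    return [f"/ec2/{env}/{cid}/{suffix}" for suffix in ECS_LOG_SUFFIXES]


if __name__ == "__main__":
    for name in ecs_log_groups("dev", "billing", "main"):
        print(name)
```

With env=dev, project=billing and cluster_name=main, the first name printed is /ec2/dev/tf-dev-billing-ecs-main/ecs/ecs-agent.log.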

CloudWatch alarms

An out-of-memory (OOM) alarm is created by counting occurrences of the "died due to OOM" pattern in the ECS agent logs. A "max-size-alarm" is created, based on 90% of the max size of the auto scaling group.
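The two alarm conditions can be sketched as plain Python predicates. This rests on two assumptions. First, the OOM alarm is treated as a substring count over log lines, which is roughly what a CloudWatch Logs metric filter does. Second, the max-size alarm is assumed to compare the number of in-service instances with 90% of autoscaling_max_size using ">=". The module's actual metric and comparison operator are not documented here.

```python
# Sketch of the two alarm conditions (assumptions noted in comments).

OOM_PATTERN = "died due to OOM"


def count_oom_events(log_lines) -> int:
    # Each ECS agent log line containing the pattern counts as one event,
    # similar to a CloudWatch Logs metric filter emitting 1 per match.
    return sum(1 for line in log_lines if OOM_PATTERN in line)


def max_size_alarm_breached(in_service: int, autoscaling_max_size: int) -> bool:
    # Assumed condition: in-service instances >= 90% of the max size.
    # Integer arithmetic (x * 10 >= max * 9) avoids float rounding issues.
    return in_service * 10 >= autoscaling_max_size * 9
```

For autoscaling_max_size = 10, this condition holds from the 9th instance onwards.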

EC2 resources

launch configuration

AMI selection for the launch configuration depends on var.ami_id. If var.ami_id is set (that is, different from "false", its default), that AMI is used. Otherwise, the module uses the latest AMI whose name matches ecs-*, whose architecture is x86_64, whose tag:lifecycle matches var.ami_lifecycle_tag, and whose owner is 028907936641 (Transverse account).
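The selection rule can be illustrated in Python over image records shaped like the output of the EC2 DescribeImages API (ImageId, Name, Architecture, OwnerId, CreationDate, Tags). This is a sketch of the filter logic only; the module performs the lookup in Terraform. The default lifecycle_tag value "validated" comes from the inputs table, and the AMI ids in the test data are made up.

```python
from fnmatch import fnmatch

# Owner of the ECS AMIs (Transverse account), as stated above.
TRANSVERSE_OWNER = "028907936641"


def _tags(image: dict) -> dict:
    # DescribeImages returns tags as a list of {"Key": ..., "Value": ...}.
    return {t["Key"]: t["Value"] for t in image.get("Tags", [])}


def select_ami(images: list, ami_id: str = "false",
               lifecycle_tag: str = "validated") -> str:
    """Return the AMI id the launch configuration would use."""
    if ami_id != "false":
        # An explicit var.ami_id always wins.
        return ami_id
    candidates = [
        img for img in images
        if fnmatch(img["Name"], "ecs-*")
        and img["Architecture"] == "x86_64"
        and img["OwnerId"] == TRANSVERSE_OWNER
        and _tags(img).get("lifecycle") == lifecycle_tag
    ]
    if not candidates:
        raise LookupError("no AMI matches the ecs-* filters")
    # "Latest" = most recent CreationDate (ISO 8601 strings sort chronologically).
    return max(candidates, key=lambda img: img["CreationDate"])["ImageId"]
```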

The launch configuration provides a user-data script that configures the awslogs agent, the ECS agent and the Docker daemon to use the specified HTTP proxy. Note that ebs_optimized is enabled unless instance_type is a t2.* type.
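The ebs_optimized rule and the kind of ECS agent proxy settings such a user-data script writes can be sketched as follows. The proxy lines follow the AWS documentation on HTTP proxy configuration for Linux container instances (/etc/ecs/ecs.config). The module's actual user-data may differ, and the proxy address is hypothetical.

```python
def ebs_optimized(instance_type: str) -> bool:
    # Rule from this README: every instance type except t2.* is EBS-optimized.
    return not instance_type.startswith("t2.")


def ecs_agent_proxy_lines(http_proxy: str) -> list:
    # Proxy settings for /etc/ecs/ecs.config. NO_PROXY keeps instance
    # metadata, the task metadata endpoint and the Docker socket off the proxy.
    return [
        f"HTTP_PROXY={http_proxy}",
        "NO_PROXY=169.254.169.254,169.254.170.2,/var/run/docker.sock",
    ]


if __name__ == "__main__":
    print(ebs_optimized("m5.large"))  # default instance_type
    print("\n".join(ecs_agent_proxy_lines("proxy.example.internal:3128")))
```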

auto scaling group

The auto scaling group has advanced metrics enabled. The following tags are propagated to its instances:

  • cost:environment
  • cost:cost-center
  • cost:project

scale-in / scale-out policies

Four policies are defined:

  • CPU based scale-out
  • Memory based scale-out
  • CPU based scale-in
  • Memory based scale-in

The CPU and memory policies are based on the cluster's CPU and memory reservation, so you must define CPU and memory reservations when you define ECS tasks and services.
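The sketch below shows why reservations matter. It computes a reservation percentage the way the AWS/ECS CPUReservation and MemoryReservation metrics are defined: units reserved by running tasks divided by units registered by container instances. The numbers are hypothetical and this is not module code.

```python
def reservation_percent(reserved_by_tasks, registered_by_instances) -> float:
    """Percentage of registered cluster capacity reserved by running tasks."""
    registered = sum(registered_by_instances)
    if registered == 0:
        return 0.0
    return 100.0 * sum(reserved_by_tasks) / registered


if __name__ == "__main__":
    # Two instances registering 2048 CPU units each (hypothetical values).
    instances_cpu = [2048, 2048]
    # Tasks that declare CPU reservations: 512 + 1024 units.
    print(reservation_percent([512, 1024], instances_cpu))  # 37.5
    # In this model, a task with no CPU reservation adds 0 units, so the
    # CPU-based policies do not see it, however busy it actually is.
    print(reservation_percent([512, 1024, 0], instances_cpu))  # 37.5
```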

Use the ec2_scaling_policy parameter to choose which policies are declared. Possible values are min_max_cpu_and_memory, min_max_cpu_only and min_max_memory_only. min_max_cpu_and_memory assumes that your ECS tasks and your EC2 instance type have a similar CPU/memory ratio. Otherwise, the CPU and memory policies can pull in opposite directions.
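The following sketch shows how the three values of ec2_scaling_policy select policies, with the default thresholds from the inputs table. The strict ">" and "<" comparisons are an assumption; the real alarms may use different operators or evaluation periods. The example shows the CPU and memory policies disagreeing when tasks and instance type have different CPU/memory ratios.

```python
# (min %, max %) thresholds: defaults of scaling_*_min/max_percent.
DEFAULT_THRESHOLDS = {"cpu": (50, 80), "memory": (50, 75)}

# Which metrics each ec2_scaling_policy value declares policies for.
POLICY_METRICS = {
    "min_max_cpu_and_memory": ("cpu", "memory"),
    "min_max_cpu_only": ("cpu",),
    "min_max_memory_only": ("memory",),
}


def scaling_signals(policy, cpu_pct, memory_pct, thresholds=DEFAULT_THRESHOLDS):
    """Return (metric, action) pairs that would fire for given reservations."""
    if policy not in POLICY_METRICS:
        raise ValueError(f"unknown ec2_scaling_policy: {policy}")
    values = {"cpu": cpu_pct, "memory": memory_pct}
    signals = []
    for metric in POLICY_METRICS[policy]:
        low, high = thresholds[metric]
        if values[metric] > high:  # assumed: strictly above max threshold
            signals.append((metric, "scale-out"))
        elif values[metric] < low:  # assumed: strictly below min threshold
            signals.append((metric, "scale-in"))
    return signals


if __name__ == "__main__":
    # CPU-heavy tasks on a memory-rich instance type: the policies disagree.
    print(scaling_signals("min_max_cpu_and_memory", 85, 30))
```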

scheduled actions

If var.env is not "prod", scheduled actions scale the cluster in to 0 instances during nights and weekends.

Set auto_shutdown to false to disable this automatic shutdown on a non-prod cluster. The shutdown and startup times can be configured with shutdown_cron_expression and startup_cron_expression.
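With the default cron expressions (startup "0 7 * * 1-5", shutdown "0 18 * * *", both UTC), the resulting schedule can be sketched as a function of the current time. This sketch hard-codes those defaults rather than parsing cron syntax, so it does not apply to custom expressions.

```python
from datetime import datetime


def scheduled_capacity(now: datetime, env: str, auto_shutdown: bool,
                       min_size: int, max_size: int) -> tuple:
    """(min, max) size imposed by the default schedule at UTC time `now`."""
    if env == "prod" or not auto_shutdown:
        # No scheduled actions: the configured sizes always apply.
        return (min_size, max_size)
    # Default startup_cron_expression "0 7 * * 1-5": 07:00 UTC, Monday-Friday.
    # Default shutdown_cron_expression "0 18 * * *": 18:00 UTC, every day.
    # Saturday and Sunday have no startup, so the cluster stays at 0
    # from Friday 18:00 until Monday 07:00.
    if now.weekday() < 5 and 7 <= now.hour < 18:  # weekday(): Monday == 0
        return (min_size, max_size)
    return (0, 0)
```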

IAM policies and roles

An IAM role is provisioned for the EC2 instances of the cluster. It is attached to:

  • an IAM policy for standard ECS container instances (ECR and ECS APIs)
  • the provided commons_ec2_policy

In addition, two roles are created:

  • A container service role, to manage load-balancing and service launch/registration
  • An autoscale role, to update ECS services
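Each of these roles needs a trust policy naming the AWS service that assumes it. The service principals below are assumptions inferred from the role descriptions (EC2 for the instance role, ECS for the container service role, Application Auto Scaling for the autoscale role). The module's actual policies may differ.

```python
import json

# Assumed service principals for the three roles described above.
ROLE_PRINCIPALS = {
    "container_instance": "ec2.amazonaws.com",
    "container_service": "ecs.amazonaws.com",
    "autoscale": "application-autoscaling.amazonaws.com",
}


def assume_role_policy(service_principal: str) -> str:
    """Trust policy letting an AWS service assume the role."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": service_principal},
            "Action": "sts:AssumeRole",
        }],
    }, indent=2)


if __name__ == "__main__":
    for role, principal in ROLE_PRINCIPALS.items():
        print(role, assume_role_policy(principal), sep="\n")
```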

Security groups

A security group is created for the cluster. It allows SSH from the provided bastion security group.

The security group is exposed as an output so that further rules can be added.
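The SSH rule, and the kind of extra rule a caller might attach to the exported security group, can be described as data. This sketch uses the IpPermissions structure accepted by the EC2 AuthorizeSecurityGroupIngress API. The group ids and the load-balancer rule in the test are hypothetical, and the module itself declares its rule in Terraform.

```python
def allow_from_security_group(source_sg: str, from_port: int, to_port: int,
                              protocol: str = "tcp") -> dict:
    """One ingress permission in the IpPermissions shape used by the EC2 API."""
    return {
        "IpProtocol": protocol,
        "FromPort": from_port,
        "ToPort": to_port,
        "UserIdGroupPairs": [{"GroupId": source_sg}],
    }


def ssh_from_bastion(bastion_sg: str) -> dict:
    # The rule this module creates: SSH (TCP 22) from the bastion group.
    return allow_from_security_group(bastion_sg, 22, 22)
```

A caller could, for instance, add allow_from_security_group(lb_sg, 32768, 65535) to let a load balancer reach tasks that use dynamic host ports. The port range and lb_sg are illustrative.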

"Drainer" Lambda function and SNS topic

A lifecycle hook sends every EC2_INSTANCE_TERMINATING event from the auto scaling group to an SNS topic, which triggers a Lambda function. This "drainer" Lambda drains the EC2 instance scheduled for termination. Once no ECS tasks are running on the instance any more, the lifecycle action is completed and the termination proceeds.

The lifecycle hook timeout is 300 seconds.
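The drainer's decision logic can be sketched as a pure function over the SNS event. The message fields (LifecycleTransition, EC2InstanceId, LifecycleActionToken, ...) are those of Auto Scaling lifecycle notifications. drain, running_task_count and complete_lifecycle are hypothetical callables standing in for the ECS and Auto Scaling API calls. How a "waiting" result is retried is not described in this README and is left out.

```python
import json

TERMINATING = "autoscaling:EC2_INSTANCE_TERMINATING"


def handle_lifecycle_event(event, drain, running_task_count, complete_lifecycle):
    """Process one SNS-delivered Auto Scaling lifecycle notification.

    drain(instance_id), running_task_count(instance_id) -> int and
    complete_lifecycle(message) are hypothetical stand-ins for API calls.
    """
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    if message.get("LifecycleTransition") != TERMINATING:
        # e.g. the autoscaling:TEST_NOTIFICATION sent when the hook is created
        return "ignored"
    instance_id = message["EC2InstanceId"]
    drain(instance_id)  # should be idempotent: the handler may run repeatedly
    if running_task_count(instance_id) > 0:
        return "waiting"  # tasks still running: termination stays on hold
    complete_lifecycle(message)  # e.g. complete the hook with CONTINUE
    return "completed"
```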

"Updater" Lambda function and state-machine

A CloudWatch Events rule sends every UpdateLaunchConfiguration API call on the ECS auto scaling group to an "updater" Step Functions state machine. The state machine invokes an "updater" Lambda function, which checks whether the auto scaling group must be updated (AMI and instance type) and, if so, sets the desired_count value. The state machine then checks the Lambda's return code to determine whether the update is done. If it is not, it waits 10 minutes and invokes the "updater" Lambda again.

If some EC2 instances are still not updated after 24 hours, the state machine fails. A CloudWatch alarm sends a notification when the state machine fails.
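The check/wait loop maps naturally onto an Amazon States Language definition, built here as a Python dict. This is a sketch: the "$.done" field returned by the Lambda is hypothetical. Using the top-level TimeoutSeconds for the 24-hour limit is an assumption; the module may instead count retries. The Lambda ARN is made up.

```python
import json


def updater_state_machine(lambda_arn: str, wait_seconds: int = 600,
                          timeout_seconds: int = 24 * 3600) -> dict:
    """Amazon States Language definition of the check/wait loop."""
    return {
        "Comment": "Re-invoke the updater Lambda until every instance is updated",
        "StartAt": "CheckUpdate",
        # The whole execution stops if it has not succeeded after 24 hours.
        "TimeoutSeconds": timeout_seconds,
        "States": {
            "CheckUpdate": {"Type": "Task", "Resource": lambda_arn, "Next": "IsDone"},
            "IsDone": {
                "Type": "Choice",
                # "$.done" is a hypothetical field returned by the Lambda.
                "Choices": [{"Variable": "$.done", "BooleanEquals": True, "Next": "Done"}],
                "Default": "WaitBeforeRetry",
            },
            "WaitBeforeRetry": {"Type": "Wait", "Seconds": wait_seconds, "Next": "CheckUpdate"},
            "Done": {"Type": "Succeed"},
        },
    }


if __name__ == "__main__":
    arn = "arn:aws:lambda:eu-west-1:123456789012:function:updater"  # hypothetical
    print(json.dumps(updater_state_machine(arn), indent=2))
```

With these values the loop can go through at most about 24 * 3600 / 600 = 144 wait cycles before the execution times out.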

"Launch" Lambda function and state-machine

A CloudWatch Events rule sends every LaunchInstance API call on the ECS auto scaling group to a "launch" Step Functions state machine. The state machine invokes a "launch" Lambda function, which checks whether the newly launched EC2 instance is visible as a container instance. The state machine checks the Lambda's return code to determine whether the ECS agent on the instance is connected. If it is not, it waits 30 seconds and invokes the "launch" Lambda again.

If the EC2 instance is still not connected after 300 seconds, the state machine fails. A CloudWatch alarm sends a notification when the state machine fails.
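The timing of the "launch" loop (check, wait 30 seconds, give up after 300 seconds) can be simulated to see how many Lambda invocations an instance gets. This is a sketch: is_registered is a hypothetical stand-in for the "launch" Lambda, and time is simulated rather than slept.

```python
def poll_until_registered(is_registered, wait_seconds: int = 30,
                          timeout_seconds: int = 300) -> int:
    """Simulate the launch state machine; return the number of Lambda calls.

    is_registered(elapsed_seconds) -> bool stands in for the "launch" Lambda
    (hypothetical signature). Raises TimeoutError if the instance never shows
    up as a container instance within timeout_seconds.
    """
    elapsed = 0
    calls = 0
    while True:
        calls += 1
        if is_registered(elapsed):
            return calls
        if elapsed + wait_seconds > timeout_seconds:
            raise TimeoutError(f"instance not registered after {elapsed} s")
        elapsed += wait_seconds  # the Wait state
```

An instance whose ECS agent connects after one minute is detected on the third call (t = 0, 30, 60 s). One that never connects is checked 11 times (t = 0 to 300 s) before the simulated execution fails.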


Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| __ec2_scaling_policy_format | Internal variable used to validate ec2_scaling_policy. | map | `<map>` | no |
| alarm_notification_topic_arn | The ARN of the topic where alarm notifications are sent. | string | - | yes |
| ami_id | If "false" (the default), the latest ecs-* AMI is used; otherwise the provided AMI id is used. | string | `false` | no |
| ami_lifecycle_tag | The value of the lifecycle tag used to select the most recent AMI. | string | `validated` | no |
| auto_shutdown | Whether the cluster is automatically shut down during nights and weekends. Forced to false when env is prod. | string | `true` | no |
| autoscaling_max_size | The maximum number of EC2 instances in the ECS cluster. | string | - | yes |
| autoscaling_min_size | The minimum number of EC2 instances in the ECS cluster. | string | - | yes |
| aws_region | The AWS region used. | string | - | yes |
| bastion_sg | The security group id of the SSH bastion. | string | - | yes |
| cluster_name | The name of the ECS cluster. | string | - | yes |
| commons_ec2_policy_arn | The ARN of the IAM policy used for common EC2 actions made by the FTP. | string | - | yes |
| ec2_scaling_policy | The policy used to auto scale the EC2 instances of the cluster. Possible values: min_max_cpu_and_memory, min_max_cpu_only, min_max_memory_only. | string | | |
| ecs_heartbeat_timeout | The time, in seconds, an ECS instance may stay in the draining state. ECS tasks still running after this timeout are stopped. | string | `600` | no |
| enable_alarm_creation | Whether the alarms are created. | string | `true` | no |
| env | The environment of this infrastructure: dev or prod. | string | - | yes |
| http_proxy | The endpoint of the HTTP proxy used to reach the Internet. | string | - | yes |
| instance_type | The EC2 instance type of the cluster instances. | string | `m5.large` | no |
| project | The project related to this cluster. | string | - | yes |
| scaling_cpu_max_percent | The CPU reservation threshold above which an instance is added. | string | `80` | no |
| scaling_cpu_min_percent | The CPU reservation threshold below which an instance is removed. | string | `50` | no |
| scaling_memory_max_percent | The memory reservation threshold above which an instance is added. | string | `75` | no |
| scaling_memory_min_percent | The memory reservation threshold below which an instance is removed. | string | `50` | no |
| shutdown_cron_expression | The UTC cron expression used to shut down the cluster (min = max = desired = 0). | string | `0 18 * * *` | no |
| ssh_key_pair | The name of the SSH key pushed to the container instances of the ECS cluster. | string | - | yes |
| startup_cron_expression | The UTC cron expression used to start the cluster (min = autoscaling_min_size, max = autoscaling_max_size). | string | `0 7 * * 1-5` | no |
| subnets | The list of subnet ids used to create the ECS cluster. | list | - | yes |
| vpc_id | The VPC where the cluster is created. | string | - | yes |


Outputs

| Name | Description |
|------|-------------|
| ami_id | The name of the AMI used |
| autoscaling_group_name | The name of the created auto scaling group |


