int128 / terraform-aws-nat-instance

Terraform module to provision a NAT Instance using an Auto Scaling Group and Spot Instance from $1/month

Home Page: https://registry.terraform.io/modules/int128/nat-instance/aws/

Instance stuck if ENI wasn't attached properly

LiranV opened this issue

Hello,
I've encountered the following issue:

  1. The NAT ec2 instance needs to be replaced due to failure or spot termination.
  2. The original instance is removed and the ASG is spawning a new one.
  3. In the meantime the ENI that was used by the instance is still not available for reattachment.
  4. The new instance starts but fails to attach the ENI and gets stuck in a loop while not forwarding traffic.

This happens because the aws ec2 attach-network-interface command in the runonce.sh script fails, but the script still goes on to start the snat service.

In the snat.sh script (run by the snat.service) we have the following loop:

while ! ip link show dev eth1; do
  sleep 1
done

This loop runs forever, because the ENI was never attached and the eth1 interface never becomes available.
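
For comparison, here is a minimal sketch of a bounded version of that wait. It is not the module's code: the function name wait_for_link and the timeout argument are hypothetical, and the only external command assumed is the same ip link show dev call used above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the module): wait_for_link DEV TIMEOUT_SECONDS
# Returns 0 once `ip link show dev DEV` succeeds.
# Returns 1 if the device has not appeared after roughly TIMEOUT_SECONDS seconds.
wait_for_link() {
  dev="$1"
  timeout="$2"
  elapsed=0
  while ! ip link show dev "$dev" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for $dev" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}

# Possible use in a startup script (the 60 second limit is an arbitrary example):
#   wait_for_link eth1 60 || exit 1
```

Exiting with a non-zero status lets systemd, or any wrapper script supplied by the user, see the failure instead of a service that hangs silently.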

Possible solutions:

  1. Add a check after aws ec2 attach-network-interface to verify that the interface was actually attached (or check the return code); if it was not, fail explicitly.
  2. Bound the loop so it cannot run forever, so that users of the module can add their own script to detect this condition and handle it however they see fit.
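
The first option could look roughly like the sketch below, which retries a command a fixed number of times and reports failure through its exit status. The retry function and the variables eni_id and instance_id in the commented usage are hypothetical names for illustration; they are not taken from runonce.sh.

```shell
#!/bin/sh
# Hypothetical helper (not part of the module): retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, at most ATTEMPTS times, sleeping DELAY_SECONDS between tries.
# Returns 0 on the first success, 1 if every attempt failed.
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  n=1
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
  return 0
}

# Possible use, assuming eni_id and instance_id were set earlier in the script:
#   retry 30 5 aws ec2 attach-network-interface \
#     --network-interface-id "$eni_id" \
#     --instance-id "$instance_id" \
#     --device-index 1 || exit 1
```

Retrying gives the old ENI attachment time to be released after the previous instance terminates, and the final exit 1 keeps the script from starting the snat service when the attach never succeeded.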