Instance stuck if ENI wasn't attached properly
LiranV opened this issue · comments
Liran Vaknin commented
Hello,
I've encountered the following issue:
- The NAT ec2 instance needs to be replaced due to failure or spot termination.
- The original instance is removed and the ASG is spawning a new one.
- In the meantime the ENI that was used by the instance is still not available for reattachment.
- The new instance starts but fails to attach the ENI and gets stuck in a loop while not forwarding traffic.
This happens because the aws ec2 attach-network-interface command in the runonce.sh script fails, but the script still moves on to starting the snat service.
In the snat.sh script (run by snat.service) we have the following loop:
while ! ip link show dev eth1; do
sleep 1
done
This loop runs forever, because the eth1 interface never becomes available when the attachment has failed.
Possible solutions:
- Add a check after aws ec2 attach-network-interface to see that the interface was actually attached (or check the return code), and if not, fail somehow.
- Make it so the loop won't run forever, so that users of the module can add an additional script to detect this and handle it however they see fit.
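To illustrate both ideas, here is a rough sketch, not the module's actual code. The helper names retry and wait_for_interface, the ENI_ID and INSTANCE_ID variables, and the attempt counts and timeouts are all made up for the example. The first helper re-runs a command until it succeeds or a limit is reached, which would give the old ENI time to be released. The second is a bounded version of the eth1 loop.

```shell
#!/bin/sh
# Hypothetical helpers; names and limits are assumptions, not part of the module.

# retry ATTEMPTS DELAY CMD...: run CMD until it succeeds, at most ATTEMPTS times,
# sleeping DELAY seconds between tries. Returns 1 if every attempt fails.
retry() {
  retry_attempts=$1
  retry_delay=$2
  shift 2
  retry_n=1
  until "$@"; do
    if [ "$retry_n" -ge "$retry_attempts" ]; then
      return 1
    fi
    retry_n=$((retry_n + 1))
    sleep "$retry_delay"
  done
  return 0
}

# wait_for_interface IFACE TIMEOUT: poll once per second until IFACE exists,
# giving up with status 1 after roughly TIMEOUT seconds instead of looping forever.
wait_for_interface() {
  wfi_iface=$1
  wfi_timeout=$2
  wfi_elapsed=0
  while ! ip link show dev "$wfi_iface" >/dev/null 2>&1; do
    if [ "$wfi_elapsed" -ge "$wfi_timeout" ]; then
      return 1
    fi
    sleep 1
    wfi_elapsed=$((wfi_elapsed + 1))
  done
  return 0
}

# Possible use in runonce.sh (ENI_ID and INSTANCE_ID are placeholder variables):
#   retry 30 5 aws ec2 attach-network-interface \
#     --network-interface-id "$ENI_ID" \
#     --instance-id "$INSTANCE_ID" \
#     --device-index 1 || exit 1
#
# Possible use in snat.sh:
#   wait_for_interface eth1 60 || exit 1
```

With something like this, a failed attachment would surface as a non-zero exit status instead of a silent hang, and whatever supervises the script (systemd, or a user-supplied script) could then decide how to react, for example by marking the instance unhealthy so the ASG replaces it.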