CreateCommandsInvoker update hangs when stack is in state UPDATE_ROLLBACK_COMPLETE

Question

FilipeAleixo opened this issue 3 years ago · comments

I've noticed that sam deploy hangs in this specific situation:

I do an unsuccessful deploy which has to be rolled back due to an error, which then puts the stack in state UPDATE_ROLLBACK_COMPLETE.

When I subsequently correct the error and call sam deploy again it hangs in this stage:

This is what I see in the stack:

Even deleting the stack becomes impossible after this because there's no way to stop the custom resource (at least I didn't find one), so I just have to wait for it to time out and delete it.

Any clue on how to fix this?

Filipe Aleixo · Answer 1 · Sun Sep 05 2021 04:20:45 GMT+0800 (China Standard Time)

Just pinging you to show what I've built using your repo: https://github.com/EsteFilipe/discord-ethereum-authentication

Thanks for the clean code :)

Yannik Tausch · Answer 2 · Sun Sep 05 2021 06:54:33 GMT+0800 (China Standard Time)

As far as I understand, this error occurs when a bug introduced into CreateCommandsFunction prevents CfnLambda from running as intended and posting the custom resource status back to CloudFormation via the pre-signed URL that is provided to the Lambda function. (For more details of how CloudFormation custom resources work under the hood, refer to here).

As long as the CloudFormation resource update is running, it's not possible to update or delete the stack.

Unfortunately, the timeout period for custom resource status updates is quite long (2 hours?) and as far as I can tell, there is currently no way to change this.

The only options I see are:

Wait for the custom resource to time out, and create another stack for development in the meantime
Try to send the status update via HTTP yourself (link) - note: I haven't tried this myself
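
To make the second option more concrete, here is a rough, untested-against-AWS sketch of what sending the status update by hand could look like. It assumes you can recover the original custom resource request (for example, from the function's CloudWatch logs, if the event was logged) and that it contains the usual `ResponseURL`, `StackId`, `RequestId` and `LogicalResourceId` fields. The function names and the fallback physical ID `manual-unblock` are made up for illustration.

```python
import json
import urllib.request


def build_response_body(event, status="SUCCESS", reason="Sent manually to unblock the stack"):
    """Build the JSON document CloudFormation expects back from a custom resource."""
    return json.dumps({
        "Status": status,  # "SUCCESS" or "FAILED"
        "Reason": reason,
        # Reuse the existing physical ID if the request carried one (updates and deletes do).
        # "manual-unblock" is a hypothetical placeholder, not a value AWS defines.
        "PhysicalResourceId": event.get("PhysicalResourceId", "manual-unblock"),
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
    })


def send_response(event, status="SUCCESS"):
    """PUT the response to the pre-signed URL from the original request."""
    body = build_response_body(event, status).encode("utf-8")
    request = urllib.request.Request(
        event["ResponseURL"],
        data=body,
        method="PUT",
        # Send an empty content type, as the AWS-provided cfn-response helper does.
        headers={"Content-Type": ""},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling `send_response(event)` with the logged request should, if the pre-signed URL is still valid, let CloudFormation finish the pending operation instead of waiting for the timeout. Passing `status="FAILED"` would instead mark the resource operation as failed.
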

I agree that this is very annoying during development, but I do not see any way to make things easier here. This would be AWS's turn. If you find out something new with regard to this issue, or if sending the expected status update yourself works out well, feel free to share!

By the way: Your project looks awesome! Thanks for sharing!

Filipe Aleixo · Answer 3 · Sun Sep 05 2021 17:50:37 GMT+0800 (China Standard Time)

Thanks for the thorough explanation!

The only way I've found to overcome this for development was to just keep creating stacks with new names, while leaving the failed ones so that they time out and I can delete them later. This seems like quite a shortcoming of CloudFormation: they even have videos like this one suggesting a hack of sending a signal to the Lambda in order to be able to delete it https://www.youtube.com/watch?v=hlJkMoCxR-I

raisen · Answer 4 · Fri Jun 30 2023 07:03:40 GMT+0800 (China Standard Time)

Just FYI, I was having problems with CreateCommandsInvoker hanging and was able to fix it by increasing the Lambda function timeout in the CloudFormation template, which defaults to 3 seconds.

Also, I've upgraded to Node 18 and didn't find any issues.

Thank you for this repo!