bottlerocket-os / bottlerocket-ecs-updater

A service to automatically manage Bottlerocket updates in an Amazon ECS cluster.

Set TimeoutSeconds for SSM send command

srgothi92 opened this issue.

What I'd like:
The default timeout for an SSM command to complete is 3600 seconds (search for 3600 here). However, we only wait X attempts for it to complete, and if it has not completed by then we consider it failed. Meanwhile, the command can keep running for up to an hour.

Any alternatives you've considered:
We may want to use the TimeoutSeconds parameter in SendCommandInput.

I looked into this, and unless I'm mistaken, setting TimeoutSeconds in the SendCommandInput won't make a difference. If it's not defined, the default is applied automatically, which, as you pointed out, is 1 hour.

From what I can tell, however, the waiter (WaitUntilCommandExecuted) is where the max-attempts cutoff actually happens. I was about to switch this to WaitUntilCommandExecutedWithContext so we could set a delay for the waiter, which I think would solve this problem. However, it looks like this is already being addressed by @umairishaq in PR #50.
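
To make the failure mode concrete, here is a minimal Python sketch of a max-attempts polling waiter. It is not the AWS SDK's waiter implementation; `wait_until_executed`, `is_done`, `max_attempts`, and `delay_seconds` are hypothetical names. It illustrates two points from the thread: the waiter's total budget is roughly (max_attempts - 1) × delay_seconds, so raising the delay lengthens the wait, and raising `TimeoutError` only stops the waiting, not the SSM command itself.

```python
import time
from typing import Callable


def wait_until_executed(
    is_done: Callable[[], bool],
    max_attempts: int,
    delay_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Hypothetical waiter: poll is_done() up to max_attempts times, sleeping between polls.

    Returns the attempt number on which the command was first seen as finished.
    Raises TimeoutError once the attempt budget is used up. Giving up here only
    stops *waiting*; it does nothing to stop the remote SSM command.
    """
    for attempt in range(1, max_attempts + 1):
        if is_done():
            return attempt
        if attempt < max_attempts:  # no point sleeping after the final check
            sleep(delay_seconds)
    raise TimeoutError(
        f"gave up after {max_attempts} attempts "
        f"(~{(max_attempts - 1) * delay_seconds:.0f}s of waiting)"
    )


if __name__ == "__main__":
    # Simulated command that reports "done" on the third poll; sleeping is stubbed out.
    polls = iter([False, False, True])
    print(wait_until_executed(lambda: next(polls), max_attempts=5,
                              delay_seconds=30, sleep=lambda _: None))
```

The demo prints `3`. Injecting `sleep` keeps the sketch testable without real delays.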

As such, do we need this issue anymore?

   From what I can tell, however, the waiter (WaitUntilCommandExecuted) is where the max attempts cut off is actually happening.

Agreed. I am not targeting that; this is for the situation where the SSM command is stuck (#53 & #54). The SSM command will keep running for 3600 seconds before it times out. Our waiter only waits X attempts before declaring a wait timeout, but the SSM command can keep running after that. To address this, I was thinking of setting the TimeoutSeconds parameter in SendCommandInput to something like 1800 seconds, which is greater than the waiter timeout.

Doc snippet from here:

   For example, the default value of Timeout (seconds) in the Systems Manager console is 600 seconds. If you run a command by using the AWS-RunShellScript SSM document, the default value of "timeoutSeconds": "{{ executionTimeout }}" is 3600 seconds.

Gotcha, that makes sense. I'll take a look at what @umairishaq is setting for the waiter timeout again and get this set up. Thanks for the additional info/clarification.

Just realized there are two types of timeout: DeliveryTimeout and ExecutionTimeout. What I mentioned in the comments above was setting DeliveryTimeout; however, we want to set ExecutionTimeout. I think it is not possible to set ExecutionTimeout unless we create our own document.

Details about DeliveryTimeout and ExecutionTimeout can be found here.
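
The difference between the two limits can be sketched with a small Python model. `CommandTimeouts`, `simulate`, and the status strings are hypothetical and only mirror the semantics discussed here: the delivery timeout bounds how long SSM waits for the command to start, while the execution timeout bounds how long it runs once started. A stuck command that starts promptly is therefore cut off only by the execution timeout.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class CommandTimeouts:
    """Hypothetical model of the two SSM limits (not an AWS SDK type)."""
    delivery_timeout_s: int   # command must START within this window, or it never runs
    execution_timeout_s: int  # once started, the command is stopped after this long


def simulate(t: CommandTimeouts, start_delay_s: int, needed_run_s: int) -> Tuple[str, int]:
    """Return (illustrative status, seconds actually spent running) for one invocation."""
    if start_delay_s > t.delivery_timeout_s:
        return ("delivery timed out", 0)
    if needed_run_s > t.execution_timeout_s:
        return ("execution timed out", t.execution_timeout_s)
    return ("success", needed_run_s)


if __name__ == "__main__":
    stuck = 10**9  # a command that never finishes on its own
    # Lowering only the delivery timeout: a promptly started stuck command still runs 3600s.
    print(simulate(CommandTimeouts(1800, 3600), start_delay_s=5, needed_run_s=stuck))
    # Lowering the execution timeout is what actually bounds the stuck command.
    print(simulate(CommandTimeouts(3600, 1800), start_delay_s=5, needed_run_s=stuck))
```

The two demo lines print `('execution timed out', 3600)` and `('execution timed out', 1800)`, which is the distinction the thread turns on.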

Looks like you're correct. I double-checked the input options: while we can set TimeoutSeconds for the request, it only limits how long SSM waits for the command to start running, not how long the command runs once started. Looks like this will need to be handled by #28.

The timeout value has been changed from 3600 seconds to 1800 seconds in PR-66. 1800 seconds (30 minutes) sounds like a good value for now, since it is higher than our waiter timeout of 25 minutes.
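
A quick arithmetic check of the final choice, as a Python sketch. It assumes the 1800-second value bounds how long a stuck command can keep running, and `unwatched_window_s` is a hypothetical helper. It shows both properties the thread relies on: the new timeout still exceeds the 25-minute (1500 s) waiter timeout, and the worst-case time a stuck command keeps running after the waiter has given up shrinks from 2100 s to 300 s.

```python
WAITER_TIMEOUT_S = 25 * 60  # waiter timeout quoted in this thread: 25 minutes = 1500 s


def unwatched_window_s(command_timeout_s: int, waiter_timeout_s: int = WAITER_TIMEOUT_S) -> int:
    """Hypothetical helper: worst-case seconds a stuck command runs after the waiter gives up."""
    return max(0, command_timeout_s - waiter_timeout_s)


if __name__ == "__main__":
    for timeout_s in (3600, 1800):  # before and after PR-66
        print(f"timeout={timeout_s}s waiter={WAITER_TIMEOUT_S}s "
              f"unwatched={unwatched_window_s(timeout_s)}s "
              f"timeout>waiter={timeout_s > WAITER_TIMEOUT_S}")
```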