alexcasalboni / aws-lambda-power-tuning

AWS Lambda Power Tuning is an open-source tool that can help you visualize and fine-tune the memory/power configuration of Lambda functions. It runs in your own AWS account - powered by AWS Step Functions - and it supports three optimization strategies: cost, speed, and balanced.

Backoff Period and Different Parameters Per Size

napalm684 opened this issue · comments

I think it might be useful to have a backoff period per execution of a function at a given size, and/or different parameters per RAM size. The reason is that IoT Core sometimes lets items show up in list/describe calls before they are fully available in AWS for further actions (delete, attach, etc.). The SDK retries automatically, but I'm still seeing a very high error rate for some functions we have that are immutable (e.g. creating a new IoT thing right after deleting one that already exists).

I might be able to do something in a pre-execution step, but I'm not sure that's the best approach.

Hi @napalm684 👋 Thanks for sharing!

Are you seeing high error rates when running Lambda Power Tuning with "parallelInvocation": true? Does the same happen when you disable it?
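For reference, the execution input for a sequential run would look roughly like this (the ARN and values below are just placeholders):

```json
{
  "lambdaARN": "arn:aws:lambda:us-east-1:123456789012:function:my-iot-function",
  "powerValues": [128, 256, 512, 1024],
  "num": 10,
  "payload": {},
  "parallelInvocation": false
}
```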

I think the idea of a backoff period makes sense to work around this type of issue, as many other services have throttling or rate-limiting policies that you can't control. And while it's still useful to discover these limits during power-tuning rather than in production, I'm open to providing a way to work around them at power-tuning time (especially in dev accounts, where you might have lower limits).

Do you think even a simple (configurable) delay between invocations would work?

Today, the tool assumes that your input function is stateless and always runs without errors (if there are errors, you should resolve them before power-tuning). I'd rather not change this assumption by adding complex retry logic, which is why I'm proposing to start with a simple configurable delay.
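Just to sketch what I have in mind (nothing is implemented yet, and the parameter name below is only a placeholder):

```js
// Hypothetical execution input: "sleepBetweenRunsMs" is a placeholder name,
// it doesn't exist in the tool today.
const executionInput = {
  lambdaARN: '<your-function-arn>',
  num: 10,
  parallelInvocation: false,
  sleepBetweenRunsMs: 2000 // pause between sequential invocations, in milliseconds
};
```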

Hi @alexcasalboni, this is with "parallelInvocation": false. For this use case, you cannot run parallel invocations with the same input data (in our case, it's creating IoT registrations, certs, etc.).

I believe a configurable delay could solve the problem by allowing IoT to "catch up" with operations and achieve consistency across the API calls.

What if we test this approach before proceeding? You could include a "sleep" in your handler to verify whether a simple delay is enough to work around the problem. If it works, I'll be glad to add a configurable delay to the state machine input so you don't have to modify the handler. Potentially, this could cover other use cases as well.
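Assuming a Node.js handler, something along these lines should be enough for the test (registerIotThing stands in for your existing logic, and the 2-second value is arbitrary):

```js
// Temporary test: sleep at the end of the handler so IoT Core can catch up
// before the next sequential invocation.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

exports.handler = async (event) => {
  const result = await registerIotThing(event); // placeholder for your existing logic
  await sleep(2000); // arbitrary delay, tune as needed
  return result;
};
```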

Sure, that makes sense.

Hey @napalm684 👋 Do you have any updates on this? Was a simple delay enough to work around the problem?

I had to put this item on hold for other priorities. If you want, you can probably close this and we can revisit it later. Sorry for the delay.

No worries at all 🚀

I'll keep this open a few more months, feel free to jump in if you have any updates.