alexcasalboni / aws-lambda-power-tuning

AWS Lambda Power Tuning is an open-source tool that can help you visualize and fine-tune the memory/power configuration of Lambda functions. It runs in your own AWS account - powered by AWS Step Functions - and it supports three optimization strategies: cost, speed, and balanced.

Backoff Period and Different Parameters Per Size

napalm684 opened this issue · comments

I think it might be useful to have a backoff period per execution of a function at a given size, and/or different parameters per RAM size. The reason is that IoT Core sometimes lets items show up in list/describe calls before they are fully available in AWS for further actions (delete, attach, etc.). The SDK retries automatically, but I'm still seeing a very high error rate for some functions we have that are immutable (e.g. creating a new IoT thing right after deleting one that already exists).

I might be able to do something in a pre-execution step, but I'm not sure that's the best approach.

Hi @napalm684 👋 Thanks for sharing!

Are you seeing high error rates when running Lambda Power Tuning with "parallelInvocation": true? Does the same happen when you disable it?
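For reference, the execution input for a sequential run would look roughly like this (the ARN and values below are just placeholders):

```json
{
  "lambdaARN": "arn:aws:lambda:us-east-1:123456789012:function:my-iot-function",
  "powerValues": [128, 256, 512, 1024],
  "num": 10,
  "payload": {},
  "parallelInvocation": false
}
```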

I think the idea of a backoff period makes sense to work around this type of issue, as many other services have throttling or rate-limiting policies that you can't control. And while it's still useful to discover these limits during power-tuning rather than in production, I'm open to providing a way to work around them at power-tuning time (especially in dev accounts, where you might have lower limits).

Do you think even a simple (configurable) delay between invocations would work?

Today, the tool assumes that your input function is stateless and always runs without errors (if there are errors, you should resolve them before power-tuning). I'd rather not change this assumption by adding complex retry logic, which is why I'm proposing to start with a simple configurable delay.
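Just to sketch what I have in mind (nothing is implemented yet, and the parameter name below is only a placeholder):

```js
// Hypothetical execution input: "sleepBetweenRunsMs" is a placeholder name,
// it doesn't exist in the tool today.
const executionInput = {
  lambdaARN: '<your-function-arn>',
  num: 10,
  parallelInvocation: false,
  sleepBetweenRunsMs: 2000 // pause between sequential invocations, in milliseconds
};
```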

Hi @alexcasalboni, this is with "parallelInvocation": false. For this use case, you cannot run parallel invocations with the same input data (in our case, it's creating IoT registrations, certs, etc.).

I believe a configurable delay could solve the problem by allowing IoT to "catch up" with operations and achieve consistency across the API calls.

What if we test this approach before proceeding? You could include a "sleep" in your handler to verify whether a simple delay is enough to work around the problem. If it works, I'll be glad to add a configurable delay to the state machine input so you don't have to modify the handler. Potentially, this could cover other use cases as well.
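Assuming a Node.js handler, something along these lines should be enough for the test (registerIotThing stands in for your existing logic, and the 2-second value is arbitrary):

```js
// Temporary test: sleep at the end of the handler so IoT Core can catch up
// before the next sequential invocation.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

exports.handler = async (event) => {
  const result = await registerIotThing(event); // placeholder for your existing logic
  await sleep(2000); // arbitrary delay, tune as needed
  return result;
};
```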

Sure, that makes sense.

Hey @napalm684 👋 Do you have any updates on this? Was a simple delay enough to work around the problem?

I had to put this item on hold for other priorities. If you want, you can probably close this and we can revisit it later. Sorry for the delay.

No worries at all 🚀

I'll keep this open a few more months, feel free to jump in if you have any updates.