alexcasalboni / aws-lambda-power-tuning

AWS Lambda Power Tuning is an open-source tool that can help you visualize and fine-tune the memory/power configuration of Lambda functions. It runs in your own AWS account - powered by AWS Step Functions - and it supports three optimization strategies: cost, speed, and balanced.

Different Statistics while executing the state machine on the same lambda with similar configurations

prashanthwagle opened this issue

Hello,

First of all, kudos on the wonderful project!

As the title says, if I execute the state machine back to back, I get drastically different results. I'm fairly sure throttling isn't happening, so is there any other reason why this might occur?

Some details about the Lambda on which I am executing the power tuning State machine

  • Purpose: Consume the payload from an SQS queue and write the result to an EFS file system
  • Layers: Yes, one layer which hosts chromium-puppeteer (chrome-aws-lambda)
  • Belongs to a VPC: Yes, inside public subnets
  • Does it need access to the internet: No
  • File System: EFS Mount

Input (Same for both executions)

{
  "lambdaARN": "MY_LAMBDA_ARN",
  "powerValues": [
    600,
    700,
    1500
  ],
  "num": 100,
  "payload": "SQS_Payload",
  "parallelInvocation": true,
  "strategy": "cost"
}

Output

Result-1: https://lambda-power-tuning.show/#WAK8AtwF;kGmmRQtlm0XIK49F;7+VbOFOLbzimeew4
Result-2: https://lambda-power-tuning.show/#WAK8AtwF;TJ8QRZIbpEVpoHNF;UhS/N7/7fDjQOck4

A few other instances of the drastically different results:

Hi @prashanthwagle! Thanks for sharing :)

Because your function relies on a shared resource (the file system), it's quite likely that all those 300 concurrent invocations are affecting each other's performance. The assumption behind the tool is that executions are independent of each other, but downstream dependencies or shared resources sometimes behave differently based on the load, generating noise in the results.

In order to reduce this effect, I'd recommend disabling the parallelInvocation option (set it to false). This way, the 100 executions will happen in series and there will be only 3 concurrent executions at a time (because of the 3 power values). This should allow shared resources to behave more consistently under stable load.
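
For example, the same input with parallel invocation disabled would look like this (everything else unchanged):

{
  "lambdaARN": "MY_LAMBDA_ARN",
  "powerValues": [
    600,
    700,
    1500
  ],
  "num": 100,
  "payload": "SQS_Payload",
  "parallelInvocation": false,
  "strategy": "cost"
}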

Let me know if you see any difference with sequential invocation.

Closing for now. Please reopen and/or let me know if you encounter this issue again.