serverless / serverless

⚡ Serverless Framework – Use AWS Lambda and other managed cloud services to build apps that auto-scale, cost nothing when idle, and require minimal maintenance.

Home Page: https://serverless.com


Implement Exponential Backoff Strategy for AWS API Deployment Rate Limits

Dm-Chebotarskyi opened this issue

Is there an existing issue for this?

  • I have searched existing issues, it hasn't been reported yet

Use case description

When deploying multiple stacks in parallel with the Serverless Framework, we consistently encounter Rate exceeded errors, indicating that we are hitting AWS API rate limits. This arises in particular when deploying more than 25 stacks simultaneously and leads to a significant number of failed deployments. These errors not only cause delays but can also lead to incomplete or failed multi-step pipelines where steps depend on one another.

The current backoff strategy appears to be linear or insufficiently scaled: the recurring Rate exceeded logs show relatively consistent sleep times despite the increasing number of retries. AWS recommends exponential backoff as a best practice for handling rate limiting in its APIs, as documented in the AWS Knowledge Center article "How do I handle CloudFormation's rate exceeded error?".

Proposed solution (optional)

I propose that the Serverless Framework adopt an exponential backoff strategy to mitigate this issue. The delay between retry attempts would grow exponentially, with optional jitter to reduce the likelihood of simultaneous retries causing additional throttling. Specifically, the backoff delay should be calculated as follows (a short sketch appears after the parameter list):

seconds_to_sleep_i = min(b * r^i, MAX_BACKOFF)

Where:

  • i is the retry count, starting with 1.
  • b is a random number between 0 and 1.
  • r is the exponential factor, suggested to be 2.
  • MAX_BACKOFF is the maximum backoff time, recommended to be 20 seconds as per AWS SDK guidelines.
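A minimal JavaScript sketch of the proposed calculation, assuming delays are computed in seconds. backoffSeconds, withBackoff, and callAws are hypothetical names used for illustration only; this is not the Framework's actual retry code:

const MAX_BACKOFF = 20; // seconds, the cap recommended by the AWS SDK guidelines above
const EXPONENTIAL_FACTOR = 2; // r in the formula

// Delay (in seconds) before retry attempt `i` (1-based): min(b * r^i, MAX_BACKOFF).
function backoffSeconds(i) {
  const b = Math.random(); // b: random number between 0 and 1 (jitter)
  return Math.min(b * Math.pow(EXPONENTIAL_FACTOR, i), MAX_BACKOFF);
}

// Retry wrapper around an arbitrary AWS call; a real implementation would
// retry only throttling ("Rate exceeded") errors rather than every failure.
async function withBackoff(callAws, maxRetries = 5) {
  for (let i = 1; ; i++) {
    try {
      return await callAws();
    } catch (error) {
      if (i > maxRetries) throw error;
      await new Promise((resolve) => setTimeout(resolve, backoffSeconds(i) * 1000));
    }
  }
}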

I have a branch on my fork with implementation and would happily create a PR.

This would be a huge help for me and my team. We are constantly dealing with Rate exceeded issues that require manual intervention.

Commenting for visibility, as my team has to manually redeploy serverless stacks every time a pipeline fails with a rate-exceeded error.

Bump - this would be really helpful for my team

This is the solution our team needs. We occasionally exceed the rate limit, which sometimes causes rollback issues after the update fails. Please consider this PR as soon as possible so our deployments run smoothly. Thanks

Would love to see this go in as well. These rate limits are painful to deal with when trying to use Serverless at scale.

Looks like the serverless community is not interested in merging the fix for issue #12400.
For those who face this issue and want to apply a quick fix, here is the plugin that we ended up using (credits to @ben-exa):

const ServerlessError = require('serverless/lib/serverless-error');

class AWSExponentialBackoff {
  constructor(serverless, options) {
    this.serverless = serverless;
    this.options = options;
    this.hooks = {
      initialize: this.enhanceAwsRequest.bind(this),
    };
  }

  enhanceAwsRequest() {
    const awsProvider = this.serverless.getProvider('aws');
    const originalRequest = awsProvider.request.bind(awsProvider);

    awsProvider.request = async (service, method, params, options) => {
      let attempts = 0;
      const MAX_RETRIES = 5;
      const BASE_BACKOFF = 5000; // milliseconds
      const EXPONENTIAL_FACTOR = 2;

      const retryRequest = async () => {
        try {
          return await originalRequest(service, method, params, options);
        } catch (error) {
          const { providerError } = error;
          this.serverless.cli.log(
            `Caught error: ${JSON.stringify(error, null, 2)}`,
          );

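          // Retry only when the underlying AWS error is marked retryable
          // (excluding 403s, credential errors and expired tokens) or is an
          // explicit 429 throttling response, while the retry budget lasts.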
          if (
            attempts < MAX_RETRIES &&
            providerError &&
            ((providerError.retryable &&
              providerError.statusCode !== 403 &&
              providerError.code !== 'CredentialsError' &&
              providerError.code !== 'ExpiredTokenException') ||
              providerError.statusCode === 429)
          ) {
            attempts++;
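            // Exponential delay: 5s, 10s, 20s, 40s, 80s across the five retries.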
            const backOff =
              BASE_BACKOFF * Math.pow(EXPONENTIAL_FACTOR, attempts - 1);
            this.serverless.cli.log(
              `Error occurred: ${error.message}. Retrying after ${
                backOff / 1000
              } seconds...`,
            );
            await new Promise((resolve) => setTimeout(resolve, backOff));
            return retryRequest();
          }
          throw new ServerlessError(
            `Failed after ${attempts} retries: ${error.message}`,
            error.code,
          );
        }
      };

      return retryRequest();
    };
  }
}

module.exports = AWSExponentialBackoff;

You can just use it in your serverless definition as follows:

plugins:
  # your plugin list
  - ./serverless/plugins/aws-exponential-backoff.js
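Note that, since the plugin wraps awsProvider.request, every AWS call the Framework routes through that method picks up the retry behaviour. Unlike the formula proposed above, this version does not add jitter or cap the maximum delay, so adjust MAX_RETRIES and BASE_BACKOFF to suit your pipelines.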