serverless / serverless

⚡ Serverless Framework – Use AWS Lambda and other managed cloud services to build apps that auto-scale, cost nothing when idle, and require minimal maintenance.

Home Page: https://serverless.com


Implement Exponential Backoff Strategy for AWS API Deployment Rate Limits

Dm-Chebotarskyi opened this issue

Is there an existing issue for this?

  • I have searched existing issues, it hasn't been reported yet

Use case description

When deploying multiple stacks in parallel with the Serverless Framework, we consistently encounter Rate exceeded errors, indicating that we are hitting AWS API rate limits. This arises in particular when deploying more than 25 stacks simultaneously and leads to a significant number of failed deployments. These errors not only cause delays but can also lead to incomplete or failed multi-step pipelines where steps depend on one another.

The current backoff strategy appears to be linear or insufficiently scaled: the recurring Rate exceeded logs show relatively consistent sleep times despite the increasing number of retries. AWS recommends exponential backoff as a best practice for handling rate limiting in its APIs, as documented in the AWS Knowledge Center article "How do I handle CloudFormation's rate exceeded error?".

Proposed solution (optional)

I propose that the Serverless Framework adopt an exponential backoff strategy to mitigate this issue. The delay between retry attempts would grow exponentially, with optional jitter to reduce the likelihood of simultaneous retries causing additional throttling. Specifically, the backoff delay should be calculated as follows (a short sketch appears after the parameter list):

seconds_to_sleep_i = min(b * r^i, MAX_BACKOFF)

Where:

  • i is the retry count, starting with 1.
  • b is a random number between 0 and 1.
  • r is the exponential factor, suggested to be 2.
  • MAX_BACKOFF is the maximum backoff time, recommended to be 20 seconds as per AWS SDK guidelines.
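A minimal JavaScript sketch of the proposed calculation, assuming delays are computed in seconds. backoffSeconds, withBackoff, and callAws are hypothetical names used for illustration only; this is not the Framework's actual retry code:

const MAX_BACKOFF = 20; // seconds, the cap recommended by the AWS SDK guidelines above
const EXPONENTIAL_FACTOR = 2; // r in the formula

// Delay (in seconds) before retry attempt `i` (1-based): min(b * r^i, MAX_BACKOFF).
function backoffSeconds(i) {
  const b = Math.random(); // b: random number between 0 and 1 (jitter)
  return Math.min(b * Math.pow(EXPONENTIAL_FACTOR, i), MAX_BACKOFF);
}

// Retry wrapper around an arbitrary AWS call; a real implementation would
// retry only throttling ("Rate exceeded") errors rather than every failure.
async function withBackoff(callAws, maxRetries = 5) {
  for (let i = 1; ; i++) {
    try {
      return await callAws();
    } catch (error) {
      if (i > maxRetries) throw error;
      await new Promise((resolve) => setTimeout(resolve, backoffSeconds(i) * 1000));
    }
  }
}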

I have a branch on my fork with implementation and would happily create a PR.

This would be a huge help for me and my team. We are constantly dealing with Rate exceeded issues that require manual intervention.

Commenting for visibility, as my team has to manually redeploy serverless stacks every time a pipeline fails with a rate-exceeded error.

Bump - this would be really helpful for my team

This is the solution our team needs. We occasionally exceed the rate limit, which sometimes causes rollback issues after the update fails. Please consider this PR as soon as possible so our deployments run smoothly. Thanks

Would love to see this go in as well. These rate limits are painful to deal with when trying to use Serverless at scale.

Looks like the serverless community is not interested in merging the fix for issue #12400.
For those who face this issue and want to apply a quick fix, here is the plugin that we ended up using (credits to @ben-exa):

const ServerlessError = require('serverless/lib/serverless-error');

class AWSExponentialBackoff {
  constructor(serverless, options) {
    this.serverless = serverless;
    this.options = options;
    this.hooks = {
      initialize: this.enhanceAwsRequest.bind(this),
    };
  }

  enhanceAwsRequest() {
    const awsProvider = this.serverless.getProvider('aws');
    const originalRequest = awsProvider.request.bind(awsProvider);

    awsProvider.request = async (service, method, params, options) => {
      let attempts = 0;
      const MAX_RETRIES = 5;
      const BASE_BACKOFF = 5000; // milliseconds
      const EXPONENTIAL_FACTOR = 2;

      const retryRequest = async () => {
        try {
          return await originalRequest(service, method, params, options);
        } catch (error) {
          const { providerError } = error;
          this.serverless.cli.log(
            `Caught error: ${JSON.stringify(error, null, 2)}`,
          );

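          // Retry only when the underlying AWS error is marked retryable
          // (excluding 403s, credential errors and expired tokens) or is an
          // explicit 429 throttling response, while the retry budget lasts.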
          if (
            attempts < MAX_RETRIES &&
            providerError &&
            ((providerError.retryable &&
              providerError.statusCode !== 403 &&
              providerError.code !== 'CredentialsError' &&
              providerError.code !== 'ExpiredTokenException') ||
              providerError.statusCode === 429)
          ) {
            attempts++;
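            // Exponential delay: 5s, 10s, 20s, 40s, 80s across the five retries.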
            const backOff =
              BASE_BACKOFF * Math.pow(EXPONENTIAL_FACTOR, attempts - 1);
            this.serverless.cli.log(
              `Error occurred: ${error.message}. Retrying after ${
                backOff / 1000
              } seconds...`,
            );
            await new Promise((resolve) => setTimeout(resolve, backOff));
            return retryRequest();
          }
          throw new ServerlessError(
            `Failed after ${attempts} retries: ${error.message}`,
            error.code,
          );
        }
      };

      return retryRequest();
    };
  }
}

module.exports = AWSExponentialBackoff;

You can just use it in your serverless definition as follows:

plugins:
  # your plugin list
  - ./serverless/plugins/aws-exponential-backoff.js
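Note that, since the plugin wraps awsProvider.request, every AWS call the Framework routes through that method picks up the retry behaviour. Unlike the formula proposed above, this version does not add jitter or cap the maximum delay, so adjust MAX_RETRIES and BASE_BACKOFF to suit your pipelines.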