channable / opnieuw

One weird trick to make your code more reliable

Home Page: https://tech.channable.com/posts/2020-02-05-opnieuw.html

Add special support for rate limiting

ruuda opened this issue · comments

In places where we use @retry, we often also use it to retry on rate limit errors, usually by detecting the rate limiting and raising a RetryException. This mostly works, but it is suboptimal for a few reasons:

  • The retry window is shared between rate limiting and connection errors, which means that once we have been rate limited, we are less likely to survive a subsequent connection error.
  • The attempt count is shared between rate limiting and connection errors, which means that if we spend one attempt on getting rate limited, we are less likely to survive a subsequent connection error.
  • The time scales for retrying connection errors and rate limiting are very different. For connection errors we usually want a retry window on the order of 10⁰ to 10¹ minutes, but for rate limiting, we just need to finish the work, with no obvious upper bound. We do need some protection to fail eventually if a platform continues to rate limit us indefinitely, but some platforms have very aggressive rate limits of a few calls per hour, which puts the retry window in the 10² minutes regime.
  • While exponential backoff with jitter is great for errors where it is unknown when service will be restored, with rate limits we often have more information. Sometimes a rate limit response directly includes a time to wait, or the time at which the rate limit will reset. Sometimes we at least know the policy about how many requests per second we can make, so that provides a good estimate for how long to wait. And if none of this is available, if we record successful and unsuccessful counts and a start time, we can make an empirical estimate of how many calls per second we can make.
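
For HTTP APIs, the server-provided hint can often be read straight from the response. A minimal sketch of what that could look like; the response object, header handling, and the 30-second default here are assumptions for illustration, not part of Opnieuw:

import email.utils
import time

DEFAULT_SLEEP_SECONDS = 30.0

def sleep_seconds_from_response(response) -> float:
    """Prefer the wait time the server tells us, fall back to a fixed default."""
    retry_after = response.headers.get("Retry-After")
    if retry_after is None:
        return DEFAULT_SLEEP_SECONDS
    if retry_after.isdigit():
        # Retry-After can be a number of seconds to wait ...
        return float(retry_after)
    # ... or an HTTP date at which the rate limit resets.
    reset_at = email.utils.parsedate_to_datetime(retry_after)
    return max(0.0, reset_at.timestamp() - time.time())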

I think Opnieuw would be a good place to add such functionality. Then we could decorate our calls like:

@throttle
@retry(retry_on_exceptions=CONNECTION_ERRORS_AND_INTERNAL_SERVER_ERRORS)
def call_external_service() -> Result:
    result = do_external_thing()
    if result.is_rate_limit():
        raise RateLimitError(sleep_seconds=result.suggested_sleep_time_seconds)
    return result

So a new decorator (I called it @throttle here, better names are welcome) would take care of rate limiting, and it would trigger on RateLimitError.

  • The retries for rate limits and connection errors are now decoupled, so the attempt count and retry window are no longer shared.
  • RateLimitError could communicate any information about when the limit will reset back to the decorator.
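
A minimal sketch of what such a @throttle decorator could look like, keeping its own sleep budget separate from @retry's attempt count; RateLimitError's signature and the one-hour cap are assumptions, not an existing Opnieuw API:

import time
from functools import wraps
from typing import Any, Callable, TypeVar

T = TypeVar("T")
MAX_TOTAL_SLEEP_SECONDS = 3600.0

class RateLimitError(Exception):
    def __init__(self, sleep_seconds: float) -> None:
        super().__init__(f"Rate limited, retry after {sleep_seconds:.1f}s")
        self.sleep_seconds = sleep_seconds

def throttle(f: Callable[..., T]) -> Callable[..., T]:
    @wraps(f)
    def wrapper(*args: Any, **kwargs: Any) -> T:
        slept_seconds = 0.0
        while True:
            try:
                return f(*args, **kwargs)
            except RateLimitError as exc:
                # Fail eventually if the platform keeps rate limiting us.
                if slept_seconds + exc.sleep_seconds > MAX_TOTAL_SLEEP_SECONDS:
                    raise
                time.sleep(exc.sleep_seconds)
                slept_seconds += exc.sleep_seconds
    return wrapper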

I think it would even be possible to give the decorator some local state, so it can automatically measure success rate across multiple calls to call_external_service.

@jochemb @wesleybowman what do you think?

I like it! As you mentioned, we run into this often.

This is a great idea! There is another case where we need to do retries, but where Opnieuw is not well suited. We occasionally need to wait for a file to be available at a destination URL. This could take as much as 20 minutes, and results in 404 responses until the resource has arrived. We could use a similar mechanic for that as we have for rate limits. We could name the exception to sleep on something like BackoffAndRetryException.
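
Roughly what that case could look like with the same sleep-hint mechanic; the 30-second wait, the use of requests, and the exception's signature are illustrative assumptions:

import requests

WAIT_FOR_FILE_SECONDS = 30.0

class BackoffAndRetryException(Exception):
    def __init__(self, sleep_seconds: float) -> None:
        super().__init__(f"Resource not ready, retry after {sleep_seconds:.0f}s")
        self.sleep_seconds = sleep_seconds

def fetch_destination_file(url: str) -> bytes:
    response = requests.get(url)
    if response.status_code == 404:
        # The file has not arrived at the destination yet; ask the outer
        # decorator to sleep and try again.
        raise BackoffAndRetryException(sleep_seconds=WAIT_FOR_FILE_SECONDS)
    response.raise_for_status()
    return response.content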

In terms of implementation, I think the differences between these two are pretty subtle:

@throttle
@retry(retry_on_exceptions=CONNECTION_ERRORS_AND_INTERNAL_SERVER_ERRORS)
def call_external_service() -> Result:
    result = do_external_thing()
    if result.is_rate_limit():
        raise RateLimitError(sleep_seconds=result.suggested_sleep_time_seconds)
    return result

@retry(retry_on_exceptions=CONNECTION_ERRORS_AND_INTERNAL_SERVER_ERRORS)
@throttle
def call_external_service() -> Result:
    result = do_external_thing()
    if result.is_rate_limit():
        raise RateLimitError(sleep_seconds=result.suggested_sleep_time_seconds)
    return result

Perhaps it makes sense to expose the behavior for both retries and sleeps in a single decorator as well. That way end users don't have to think about decorator ordering.

In terms of implementation, I think the differences between these two are pretty subtle:

That’s a good point! You definitely want the throttle to go on the outside, otherwise the time spent sleeping for the rate limit eats into the retry window for connection errors, and it expires quickly. We could have a combined decorator that internally uses @retry for the inner retries and then applies the rate limiting on top.
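
A rough sketch of such a combined decorator, assuming a throttle decorator like the one sketched earlier and Opnieuw's max_calls_total and retry_window_after_first_call_in_seconds parameters; retry_and_throttle itself is not part of Opnieuw:

from opnieuw import retry

def retry_and_throttle(
    retry_on_exceptions,
    max_calls_total: int = 3,
    retry_window_after_first_call_in_seconds: int = 60,
):
    def decorator(f):
        retried = retry(
            retry_on_exceptions=retry_on_exceptions,
            max_calls_total=max_calls_total,
            retry_window_after_first_call_in_seconds=retry_window_after_first_call_in_seconds,
        )(f)
        # Apply the rate limit sleeps on the outside, so that time spent
        # sleeping does not eat into the connection error retry window.
        return throttle(retried)
    return decorator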

To me this issue has a pretty clear solution direction, except for:

I think it would even be possible to give the decorator some local state, so it can automatically measure success rate across multiple calls to call_external_service.

@ruuda What would you want it to be able to measure? Also, how should it report these measurements? Do we want it to simply log messages about these statistics or do we define some protocol so we can add callbacks that write things to the database?

This is very roughly what I had in mind:

import time
from typing import Callable, List, NamedTuple

F = Callable[[int], int]


class Call(NamedTuple):
    begin_second: float
    success: bool


def estimate_delay_seconds(calls: List[Call]) -> float:
    """You can go very deep statistically on this, but this will do for the proof of concept."""
    delay_seconds_good: List[float] = []
    delay_seconds_bad: List[float] = [0.0]
    last_success_second = calls[0].begin_second
    for call in calls[1:]:
        if call.success:
            delay_seconds_good.append(call.begin_second - last_success_second)
            last_success_second = call.begin_second
        else:
            delay_seconds_bad.append(call.begin_second - last_success_second)

    if len(delay_seconds_good) > 1:
        return 0.5 * max(delay_seconds_bad) + 0.5 * min(delay_seconds_good)
    else:
        return 0.1 + max(delay_seconds_bad) * 2.0


def throttle(f: F) -> F:
    # Closure state: one Call record per invocation of the wrapped function.
    calls: List[Call] = []

    def apply(x: int) -> int:
        if len(calls) >= 2:
            # Estimate a safe delay from the call history and sleep until the
            # next call is expected to be allowed.
            successes = [call for call in calls if call.success]
            last_success_second = successes[-1].begin_second if len(successes) > 0 else calls[0].begin_second
            delay_seconds = estimate_delay_seconds(calls)
            next_call_second = last_success_second + delay_seconds
            sleep_seconds = max(0.0, next_call_second - time.monotonic())
            if sleep_seconds > 0.0:
                print(f'Observed success rate: {1.0 / delay_seconds:.1f} Hz.')
                print(f'Sleeping {sleep_seconds:.1f} s to avoid rate limit.')
                time.sleep(sleep_seconds)

        begin_second = time.monotonic()
        try:
            result = f(x)
            calls.append(Call(begin_second, success=True))
            return result
        except:
            # Record the failed call too, then let the exception propagate.
            calls.append(Call(begin_second, success=False))
            raise

    return apply


last_call_second: float = time.monotonic()

@throttle
def succ_limited(x: int) -> int:
    global last_call_second
    now_second = time.monotonic()
    if now_second - last_call_second > 1.0:
        last_call_second = now_second
        return x + 1
    else:
        raise Exception('Too soon, only one call per second allowed.')


for i in range(20):
    try:
        succ_limited(i)
        print(f'Call {i} succeeded')

    except Exception as exc:
        print(f'Call {i} failed: {exc}')

Output

Call 0 failed: Too soon, only one call per second allowed.
Call 1 failed: Too soon, only one call per second allowed.
Observed success rate: 10.0 Hz.
Sleeping 0.1 s to avoid rate limit.
Call 2 failed: Too soon, only one call per second allowed.
Observed success rate: 3.3 Hz.
Sleeping 0.2 s to avoid rate limit.
Call 3 failed: Too soon, only one call per second allowed.
Observed success rate: 1.4 Hz.
Sleeping 0.4 s to avoid rate limit.
Call 4 failed: Too soon, only one call per second allowed.
Observed success rate: 0.7 Hz.
Sleeping 0.8 s to avoid rate limit.
Call 5 succeeded
Observed success rate: 0.7 Hz.
Sleeping 1.5 s to avoid rate limit.
Call 6 succeeded
Observed success rate: 0.9 Hz.
Sleeping 1.1 s to avoid rate limit.
Call 7 succeeded
Observed success rate: 1.1 Hz.
Sleeping 0.9 s to avoid rate limit.
Call 8 failed: Too soon, only one call per second allowed.
Observed success rate: 1.0 Hz.
Sleeping 0.1 s to avoid rate limit.
Call 9 succeeded
Observed success rate: 1.0 Hz.
Sleeping 1.0 s to avoid rate limit.
Call 10 failed: Too soon, only one call per second allowed.
Observed success rate: 1.0 Hz.
Sleeping 0.0 s to avoid rate limit.
Call 11 failed: Too soon, only one call per second allowed.
Observed success rate: 1.0 Hz.
Sleeping 0.0 s to avoid rate limit.
Call 12 failed: Too soon, only one call per second allowed.
Observed success rate: 1.0 Hz.
Sleeping 0.0 s to avoid rate limit.
Call 13 failed: Too soon, only one call per second allowed.
Observed success rate: 1.0 Hz.
Sleeping 0.0 s to avoid rate limit.
Call 14 succeeded
Observed success rate: 1.0 Hz.
Sleeping 1.0 s to avoid rate limit.
Call 15 succeeded
Observed success rate: 1.0 Hz.
Sleeping 1.0 s to avoid rate limit.
Call 16 succeeded
Observed success rate: 1.0 Hz.
Sleeping 1.0 s to avoid rate limit.
Call 17 succeeded
Observed success rate: 1.0 Hz.
Sleeping 1.0 s to avoid rate limit.
Call 18 succeeded
Observed success rate: 1.0 Hz.
Sleeping 1.0 s to avoid rate limit.
Call 19 succeeded

So it “learns” to throttle itself to avoid rate limits, even when the exact way limiting works is a black box. I think it could be useful, but this might be too much magic; maybe it is better to have some kind of limiter class that can be used in conjunction with a retry decorator to estimate a delay, rather than building it into the decorator itself.
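
Such a limiter class could be a thin wrapper around the estimation above. A rough sketch, reusing Call and estimate_delay_seconds from the proof of concept; the name RateLimitEstimator and its methods are assumptions:

import time
from typing import List

class RateLimitEstimator:
    """Records call outcomes and estimates how long to wait before the next call."""

    def __init__(self) -> None:
        self.calls: List[Call] = []

    def record(self, success: bool) -> None:
        self.calls.append(Call(time.monotonic(), success=success))

    def seconds_until_next_call(self) -> float:
        if len(self.calls) < 2:
            return 0.0
        successes = [call for call in self.calls if call.success]
        last_success_second = successes[-1].begin_second if successes else self.calls[0].begin_second
        next_call_second = last_success_second + estimate_delay_seconds(self.calls)
        return max(0.0, next_call_second - time.monotonic())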

Oh, that is a very cool approach to learning undocumented rate limits! We could definitely use something like this as a general fallback for endpoints where we run into occasional rate limit errors. It is curious that your example fails here:

Call 9 succeeded
Observed success rate: 1.0 Hz.
Sleeping 1.0 s to avoid rate limit.
Call 10 failed: Too soon, only one call per second allowed.

I think two cases where we see a lot of problems now are not covered very well with this approach:

  • cases where limits are not best expressed as calls per second, or where they have multiple factors. MWS, for instance, allows us to make 30 requests in a second and then blocks us for an hour because we've reached some quota (a rough sketch of modelling such multi-factor limits follows this list).
  • we usually have multiple processes that make calls counting towards the same rate limit. This means that from the perspective of any one of the processes, rate limits seem variable. Ideally we would be able to account for that and have a mechanism to increase the rate if we don't suffer any rate limiting.
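
For the multi-factor case, one way to model it is as a set of windows that must all allow the call; the class name and the 30-per-second plus 2000-per-hour numbers below are illustrative assumptions, not MWS's actual policy:

import time
from collections import deque
from typing import Deque, Tuple

class MultiWindowLimiter:
    """Tracks call times against several (max_calls, window_seconds) limits."""

    def __init__(self, limits: Tuple[Tuple[int, float], ...]) -> None:
        self.limits = limits
        self.call_seconds: Deque[float] = deque()

    def seconds_until_next_call(self) -> float:
        now = time.monotonic()
        wait_seconds = 0.0
        for max_calls, window_seconds in self.limits:
            recent = [t for t in self.call_seconds if now - t < window_seconds]
            if len(recent) >= max_calls:
                # The oldest call in this window has to age out before we may call again.
                wait_seconds = max(wait_seconds, recent[0] + window_seconds - now)
        return wait_seconds

    def record_call(self) -> None:
        self.call_seconds.append(time.monotonic())

# E.g. at most 30 calls per second, and at most 2000 calls per hour.
limiter = MultiWindowLimiter(((30, 1.0), (2000, 3600.0)))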

It is curious that your example fails here

I think it’s because I print the values rounded to one decimal, but the actual time it slept was slightly below 1 s.

I think two cases where we see a lot of problems now are not covered very well with this approach

Yeah, after thinking about it some more, this definitely should not go in the decorator; at best it can be a module that helps determine how long to wait. In almost all of our use cases we need special handling for the particular way of rate limiting anyway.

I was writing an example for our new Delta API, and 1/3 of the code is related to slowing down on rate limiting, which clutters the example a lot. I will extract that part and open a pull request here.