App-vNext / Polly

Polly is a .NET resilience and transient-fault-handling library that allows developers to express policies such as Retry, Circuit Breaker, Timeout, Bulkhead Isolation, and Fallback in a fluent and thread-safe manner. From version 6.0.1, Polly targets .NET Standard 1.1 and 2.0+.

Home Page: https://www.thepollyproject.org


[Bug]: DelayBackoffType Exponential Broken Sequence

phil000 opened this issue · comments

Describe the bug

When using DelayBackoffType.Exponential with AddRetry, the sequence of delays is not correct: random 5-second delays are inserted, and their position varies across runs.

Using the following setup I'd expect the delays to be 100, 200, 400, 800, 1600, 3200, 6400ms.

.AddResilienceHandler("client-name-pipeline", builder =>
{
    builder.AddRetry(new HttpRetryStrategyOptions
    {
        MaxRetryAttempts = 8,
        UseJitter = false,
        ShouldRetryAfterHeader = true,
        Delay = TimeSpan.FromMilliseconds(100),
        MaxDelay = TimeSpan.FromSeconds(10),
        BackoffType = DelayBackoffType.Exponential,
        OnRetry = (msg) =>
        {
            Debug.WriteLine("");
            Debug.WriteLine("Now: " + DateTime.Now.ToString("O"));
            Debug.WriteLine("RetryDelay: " + msg.RetryDelay);
            Debug.WriteLine("");
            return ValueTask.CompletedTask;
        }
    });
});

Expected behavior

I'd expect the delays to be 100, 200, 400, 800, 1600, 3200, 6400 ms.
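For reference, a minimal sketch (in Python, illustrative only, not Polly's implementation) of the sequence this configuration should produce, assuming delay(n) = Delay × 2ⁿ capped at MaxDelay; note the eighth attempt would be capped at 10 000 ms:

```python
# Illustrative sketch of exponential backoff: delay(n) = base * 2^n,
# capped at MaxDelay. Not Polly's actual code.
def expected_delays_ms(base_ms, max_attempts, max_delay_ms):
    return [min(base_ms * 2 ** n, max_delay_ms) for n in range(max_attempts)]

print(expected_delays_ms(100, 8, 10_000))
# [100, 200, 400, 800, 1600, 3200, 6400, 10000]
```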

Actual behavior

The delays reported from OnRetry are similar to that, although with random 5-second delays inserted.

e.g.
RUN 1:
Now: 2024-06-19T14:32:59.7222295+12:00
RetryDelay: 00:00:00.1000000

Now: 2024-06-19T14:32:59.8396248+12:00
RetryDelay: 00:00:05

Now: 2024-06-19T14:33:04.8576552+12:00
RetryDelay: 00:00:00.4000000

Now: 2024-06-19T14:33:05.2750127+12:00
RetryDelay: 00:00:05

Now: 2024-06-19T14:33:10.2861593+12:00
RetryDelay: 00:00:01.6000000

Now: 2024-06-19T14:33:11.9164984+12:00
RetryDelay: 00:00:03.2000000

Now: 2024-06-19T14:33:15.1271296+12:00
RetryDelay: 00:00:06.4000000

Now: 2024-06-19T14:33:21.5484924+12:00
RetryDelay: 00:00:05

RUN 2:
Now: 2024-06-19T14:35:15.9812001+12:00
RetryDelay: 00:00:05

Now: 2024-06-19T14:35:20.9908018+12:00
RetryDelay: 00:00:00.2000000

Now: 2024-06-19T14:35:21.2011608+12:00
RetryDelay: 00:00:00.4000000

Now: 2024-06-19T14:35:21.6183935+12:00
RetryDelay: 00:00:05

Now: 2024-06-19T14:35:26.6307151+12:00
RetryDelay: 00:00:01.6000000

Now: 2024-06-19T14:35:28.2531308+12:00
RetryDelay: 00:00:03.2000000

Now: 2024-06-19T14:35:31.4729612+12:00
RetryDelay: 00:00:06.4000000

Steps to reproduce

No response

Exception(s) (if any)

No response

Polly version

Polly.Core 8.4.0 via Microsoft.Extensions.Http.Resilience 8.6.0

.NET Version

net8.0

Anything else?

No response

The same issue happens when UseJitter = true; the non-5-second delays are just jittered a bit.

e.g.
Now: 2024-06-19T14:52:14.3488193+12:00
RetryDelay: 00:00:00.0855234

Now: 2024-06-19T14:52:14.4513413+12:00
RetryDelay: 00:00:05

Now: 2024-06-19T14:52:19.4682300+12:00
RetryDelay: 00:00:00.3887854

Now: 2024-06-19T14:52:19.8717992+12:00
RetryDelay: 00:00:05

Now: 2024-06-19T14:52:24.8797130+12:00
RetryDelay: 00:00:05

Now: 2024-06-19T14:52:29.8880164+12:00
RetryDelay: 00:00:03.1457717
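The jittered values above sit near the exponential sequence but not exactly on it. A rough sketch of the effect (Polly v8 uses its own jitter algorithm; this ±25% multiplicative perturbation is purely illustrative):

```python
import random

# Illustrative only -- not Polly's jitter algorithm. Each exponential delay
# is perturbed by a random factor, so delays land near (but not exactly on)
# 100, 200, 400, ... ms.
def jittered_delay_ms(base_ms, attempt, max_delay_ms, rng):
    raw = min(base_ms * 2 ** attempt, max_delay_ms)  # capped exponential delay
    return raw * rng.uniform(0.75, 1.25)             # +/-25% multiplicative jitter

rng = random.Random()
print([round(jittered_delay_ms(100, n, 10_000, rng)) for n in range(4)])
```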

I can't fully replicate what you're seeing just from the code you've provided, but I think what you're seeing is a mix of misconfiguration (do you have other strategies configured via AddStandardResilienceHandler()?) and expected behaviour on the part of the Microsoft.Extensions.Http.Resilience library.

If ShouldRetryAfterHeader is set to true, then the DelayGenerator is set to use the value in the Retry-After header returned in the HTTP response: source.

I'm going to assume that the dependency you're trying to reach is rate limiting you with HTTP 429 responses and responding with a Retry-After header value that equates to 5 seconds - this value is preferred if it's received. That would explain where the "random" 5 seconds is coming from.

I think the other retries (the non-5-second ones) are occurring due to other negative responses (e.g. timeouts), so the retry duration you've specified takes effect in that case.
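A hedged sketch of that selection behaviour (an assumption based on the description above, not the library's actual code): when ShouldRetryAfterHeader is true, a Retry-After value from the response wins over the computed exponential delay.

```python
# Assumed selection logic (illustrative, not Polly's code): a server-provided
# Retry-After value takes precedence; otherwise fall back to capped
# exponential backoff.
def next_delay_ms(attempt, base_ms, max_delay_ms, retry_after_ms=None):
    if retry_after_ms is not None:
        return retry_after_ms                         # server hint wins
    return min(base_ms * 2 ** attempt, max_delay_ms)  # exponential fallback

print(next_delay_ms(1, 100, 10_000))         # no Retry-After -> 200
print(next_delay_ms(1, 100, 10_000, 5_000))  # Retry-After: 5 -> 5000
```

A mix of 429s carrying Retry-After and other failures without it would produce exactly the interleaved 5-second/exponential pattern shown in the reported output.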

Below is a test I wrote based on your code sample, and in this case I consistently only get 5 second delays:

using System.Diagnostics;
using System.Net.Http.Json;
using System.Text.Json;
using JustEat.HttpClientInterception;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Http;
using Microsoft.Extensions.Http.Resilience;
using Xunit;
using Xunit.Abstractions;

namespace Polly;

public class PollyTests(ITestOutputHelper outputHelper)
{
    [Fact]
    public async Task Retries_When_Rate_Limited()
    {
        var options = new HttpClientInterceptorOptions();

        var stopwatch = new Stopwatch();

        var services = new ServiceCollection()
            .AddHttpClient()
            .ConfigureHttpClientDefaults(builder =>
        {
            builder.AddResilienceHandler("client-name-pipeline", builder =>
            {
                builder.AddRetry(new HttpRetryStrategyOptions
                {
                    MaxRetryAttempts = 8,
                    UseJitter = false,
                    ShouldRetryAfterHeader = true,
                    Delay = TimeSpan.FromMilliseconds(100),
                    MaxDelay = TimeSpan.FromSeconds(10),
                    BackoffType = DelayBackoffType.Exponential,
                    OnRetry = (msg) =>
                    {
                        outputHelper.WriteLine(string.Empty);
                        outputHelper.WriteLine("Now: " + DateTime.Now.ToString("O"));
                        outputHelper.WriteLine("RetryDelay: " + msg.RetryDelay);
                        outputHelper.WriteLine("Attempt: " + msg.AttemptNumber);
                        outputHelper.WriteLine("Duration: " + msg.Duration);
                        outputHelper.WriteLine("Elapsed: " + stopwatch.Elapsed);
                        outputHelper.WriteLine(string.Empty);
                        return ValueTask.CompletedTask;
                    },
                });
            });
        });

        services.AddSingleton<IHttpMessageHandlerBuilderFilter, HttpClientInterceptionFilter>((_) => new(options));

        _ = new HttpRequestInterceptionBuilder()
            .Requests()
            .ForUrl("https://jsonplaceholder.typicode.com/todos/1")
            .Responds()
            .WithStatus(System.Net.HttpStatusCode.TooManyRequests)
            .WithResponseHeader("Retry-After", "5")
            .RegisterWith(options);

        using var serviceProvider = services.BuildServiceProvider();

        var client = serviceProvider.GetRequiredService<HttpClient>();

        stopwatch.Start();
        _ = await client.GetFromJsonAsync<JsonDocument>("https://jsonplaceholder.typicode.com/todos/1");
    }

    private sealed class HttpClientInterceptionFilter(HttpClientInterceptorOptions options) : IHttpMessageHandlerBuilderFilter
    {
        public Action<HttpMessageHandlerBuilder> Configure(Action<HttpMessageHandlerBuilder> next)
        {
            return (builder) =>
            {
                next(builder);
                builder.AdditionalHandlers.Add(options.CreateHttpMessageHandler());
            };
        }
    }
}
 Retries_When_Rate_Limited
   Source: PollyTests.cs line 18
   Duration: 40.2 sec

  Message: 
System.Net.Http.HttpRequestException : Response status code does not indicate success: 429 (Too Many Requests).

  Stack Trace: 
HttpResponseMessage.EnsureSuccessStatusCode()
HttpClientJsonExtensions.<FromJsonAsyncCore>g__Core|12_0[TValue,TJsonOptions](HttpClient client, Task`1 responseTask, Boolean usingResponseHeadersRead, CancellationTokenSource linkedCTS, Func`4 deserializeMethod, TJsonOptions jsonOptions, CancellationToken cancellationToken)
PollyTests.Retries_When_Rate_Limited() line 68
--- End of stack trace from previous location ---

  Standard Output: 

Now: 2024-06-19T12:13:49.8818773+01:00
RetryDelay: 00:00:05
Attempt: 0
Duration: 00:00:00.0132939
Elapsed: 00:00:00.0482896


Now: 2024-06-19T12:13:54.8879590+01:00
RetryDelay: 00:00:05
Attempt: 1
Duration: 00:00:00.0003184
Elapsed: 00:00:05.0534323


Now: 2024-06-19T12:13:59.8962899+01:00
RetryDelay: 00:00:05
Attempt: 2
Duration: 00:00:00.0002718
Elapsed: 00:00:10.0617640


Now: 2024-06-19T12:14:04.9035151+01:00
RetryDelay: 00:00:05
Attempt: 3
Duration: 00:00:00.0001509
Elapsed: 00:00:15.0689779


Now: 2024-06-19T12:14:09.9147988+01:00
RetryDelay: 00:00:05
Attempt: 4
Duration: 00:00:00.0001158
Elapsed: 00:00:20.0802675


Now: 2024-06-19T12:14:14.9311026+01:00
RetryDelay: 00:00:05
Attempt: 5
Duration: 00:00:00.0001348
Elapsed: 00:00:25.0965807


Now: 2024-06-19T12:14:19.9460776+01:00
RetryDelay: 00:00:05
Attempt: 6
Duration: 00:00:00.0001313
Elapsed: 00:00:30.1115508


Now: 2024-06-19T12:14:24.9585671+01:00
RetryDelay: 00:00:05
Attempt: 7
Duration: 00:00:00.0001221
Elapsed: 00:00:35.1240330


If my assessment is incorrect and you still think this is a bug, please provide a self-contained repo that we can debug the issue further with.

Thank you. I'm sorry to say I've solved this issue.

I was using devproxy to simulate API errors when testing Polly's new v8 resilience handlers, and the default setup of devproxy is to sometimes return a 429 response with a random Retry-After header. I'm not sure why it was always 5 seconds, but this issue is solved.

This case can be closed.