App-vNext / Polly

Polly is a .NET resilience and transient-fault-handling library that allows developers to express policies such as Retry, Circuit Breaker, Timeout, Bulkhead Isolation, and Fallback in a fluent and thread-safe manner. From version 6.0.1, Polly targets .NET Standard 1.1 and 2.0+.

Home Page: https://www.thepollyproject.org

[Feature request]: Simplified and faster alternative to ExecuteOutcomeAsync

ogxd opened this issue

Is your feature request related to a specific problem? Or an existing feature?

I've been using Polly for a while in a high-performance production environment, where the cost of exceptions and allocations matters a lot. Polly v7 was problematic for us because exceptions get rethrown as a consequence of how tasks work in .NET, so we developed our own utilities to work around the issue: a Result<T> pattern and an extension that moves the failure state of a task into the result without awaiting it, using a custom awaiter.

With the introduction of Polly v8, we were thrilled to discover that the same path was taken with the introduction of Outcome<T> and ExecuteOutcomeAsync.

These work great, but based on our experience in this area, we think the API could be improved in both usability and performance.

Describe the solution you'd like

Here is a fully functional solution, which combines the interesting pieces of what we developed in the past with the recent ExecuteOutcomeAsync API:

using System;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;
using Polly;

public static class PollyV8Extension
{
    /// This is the interesting piece
    public static ValueTask<Outcome<TResult>> ExecuteOutcomeAsync2<TResult, TState>(this ResiliencePipeline pipeline,
        Func<ResilienceContext, TState, ValueTask<TResult>> callback,
        ResilienceContext context,
        TState state)
    {
        return pipeline.ExecuteOutcomeAsync(
            static (ResilienceContext rc, (Func<ResilienceContext, TState, ValueTask<TResult>> internalCallback, TState internalState) state) =>
                state.internalCallback(rc, state.internalState).ToAsyncOutcome(),
            context,
            (callback, state));
    }

    private static Exception Unwrap(this Exception exception)
    {
        if (exception is AggregateException aggregateException
            && aggregateException.InnerExceptions.Count == 1
            && aggregateException.InnerException != null)
        {
            return aggregateException.InnerException;
        }

        return exception;
    }

    private static ValueTask<Outcome<T>> ToAsyncOutcome<T>(this ValueTask<T> task) => ToAsyncOutcomeAwaiter<T>.GetTaskOutcome(task);

    private readonly struct ToAsyncOutcomeAwaiter<T>
    {
        private readonly ValueTask<T> _task;
        private readonly TaskCompletionSource<Outcome<T>> _tcs;

        private ValueTask<Outcome<T>> TaskResult => new(_tcs.Task);

        public static ValueTask<Outcome<T>> GetTaskOutcome(ValueTask<T> task)
        {
            return new ToAsyncOutcomeAwaiter<T>(task).TaskResult;
        }

        private ToAsyncOutcomeAwaiter(ValueTask<T> task)
        {
            _task = task;
            _tcs = new TaskCompletionSource<Outcome<T>>();

            // Setup task completed callback
            ValueTaskAwaiter<T> awaiter = _task.GetAwaiter();
            awaiter.OnCompleted(OnTaskCompleted);
        }

        private void OnTaskCompleted()
        {
            if (_task.IsCompletedSuccessfully)
            {
                _tcs.TrySetResult(Outcome.FromResult<T>(_task.Result));
            }
            else
            {
                _tcs.TrySetResult(Outcome.FromException<T>(_task.IsCanceled ? new TaskCanceledException() : _task.AsTask().Exception!.Unwrap()));
            }
        }
    }
}

Here is a simple benchmark to demonstrate the simplicity of use and the performance:

[MemoryDiagnoser(false)]
[ExceptionDiagnoser]
[SimpleJob]
public class ExecuteOutcomeAsyncBenchmark
{
    private ResiliencePipeline? _pipeline;
    private ResilienceContext? _context;

    [GlobalSetup]
    public void Setup()
    {
        _context = ResilienceContextPool.Shared.Get(CancellationToken.None);
        _pipeline = new ResiliencePipelineBuilder()
            .AddTimeout(TimeSpan.FromSeconds(1))
            .Build();
    }

    [Benchmark]
    public ValueTask<Outcome<int>> ExecuteOutcomeAsync()
    {
        return _pipeline!.ExecuteOutcomeAsync(static async (context, state) =>
        {
            // The callback for ExecuteOutcomeAsync must return an Outcome<T> instance. Hence, some wrapping is needed.
            try
            {
                return Outcome.FromResult(await DoWorkAsync());
            }
            catch (Exception e)
            {
                return Outcome.FromException<int>(e);
            }
        }, _context!, 12);
    }

    [Benchmark]
    public ValueTask<Outcome<int>> ExecuteOutcomeAsync2()
    {
        return _pipeline!.ExecuteOutcomeAsync2(static (context, state) => DoWorkAsync(), _context!, 12);
    }

    private static async ValueTask<int> DoWorkAsync()
    {
        await Task.Yield();
        throw new InvalidOperationException();
    }
}

Here are the benchmark results:

| Method               | Mean     | Error    | StdDev   | Median   | Exceptions | Allocated |
|--------------------- |---------:|---------:|---------:|---------:|-----------:|----------:|
| ExecuteOutcomeAsync  | 44.70 us | 1.271 us | 3.522 us | 43.48 us |     2.0000 |   1.93 KB |
| ExecuteOutcomeAsync2 | 25.61 us | 0.537 us | 1.523 us | 25.48 us |     1.0000 |   1.67 KB |

As you can see, thanks to ToAsyncOutcomeAwaiter the exception is never rethrown: it is thrown only once, at its origin (1 exception per operation instead of 2).

If you like this idea, I can make a proper PR so that you can easily test it and discuss implementation details.

Additional context

No response

Based on the benchmarks this certainly looks interesting. The primary goal of Polly v8 was to improve performance and memory utilisation, so if this improves them even further, that would be welcome, particularly with asynchronous code paths being the primary focus.

Could you please run the benchmark with the following settings as well?

[SimpleJob(RunStrategy.Monitoring, launchCount: 10, iterationCount: 100)]
[MinColumn, Q1Column, Q3Column, MaxColumn]
public class ExecuteOutcomeAsyncBenchmark

Also, could you please set the Baseline flag to true on the ExecuteOutcomeAsync method?

[Benchmark(Baseline = true)]
public ValueTask<Outcome<int>> ExecuteOutcomeAsync()

BTW, the benchmark results might be a bit skewed since you don't await the pipeline executions. Could you please amend your benchmark to await them?

Here are the results with:

[MemoryDiagnoser(false)]
[ExceptionDiagnoser]
[SimpleJob(RunStrategy.Monitoring, launchCount: 10, warmupCount: 10, invocationCount: 10000)]
[MinColumn, Q1Column, Q3Column, MaxColumn]

| Method               | Mean     | Error    | StdDev    | Min      | Q1       | Q3       | Max      | Ratio | RatioSD | Exceptions | Allocated | Alloc Ratio |
|--------------------- |---------:|---------:|----------:|---------:|---------:|---------:|---------:|------:|--------:|-----------:|----------:|------------:|
| ExecuteOutcomeAsync  | 47.14 us | 2.395 us |  7.061 us | 43.44 us | 45.00 us | 47.05 us | 102.2 us |  1.00 |    0.00 |     2.0000 |   1.93 KB |        1.00 |
| ExecuteOutcomeAsync2 | 28.96 us | 4.395 us | 12.959 us | 25.05 us | 26.41 us | 28.44 us | 155.9 us |  0.62 |    0.28 |     1.0000 |   1.68 KB |        0.87 |

> BTW, the benchmark results might be a bit skewed since you don't await the pipeline executions. Could you please amend your benchmark to await them?

I don't think the results are skewed. The point of this optimization (with the custom awaiter) is to transfer the task's exception to the outcome without awaiting the task, because awaiting would rethrow the exception. As you can see, with the current ExecuteOutcomeAsync you have no choice but to await, since the method expects a delegate that returns a ValueTask<Outcome<T>>. The proposed ExecuteOutcomeAsync2 method, however, expects a delegate that returns a ValueTask<T>, which is likely to be what the user's delegate returns in the first place, and it does not require an asynchronous delegate.

In short, it should perform better if the user does not await, and shouldn't make a difference if the user does await. In both cases it simplifies the usage of this API (and it's not a replacement, just an additional method).
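To illustrate the rethrow discussed here with plain .NET tasks, independent of Polly, here is a minimal sketch. It is not the custom awaiter from the proposal: the names `RethrowSketch`, `AwaitAndCatchAsync` and `InspectWithoutRethrowAsync` are hypothetical, and `Task.WhenAny` is only used as a simple stand-in for "wait for completion without observing the failure through `await`".

```csharp
#nullable enable
using System;
using System.Threading.Tasks;

public static class RethrowSketch
{
    // Same shape as the benchmark's DoWorkAsync: fails asynchronously.
    private static async Task<int> DoWorkAsync()
    {
        await Task.Yield();
        throw new InvalidOperationException();
    }

    // Approach A: await inside try/catch.
    // The exception is thrown once in DoWorkAsync, stored in the task,
    // and then thrown a second time by the await before it is caught here.
    public static async Task<Exception?> AwaitAndCatchAsync()
    {
        try
        {
            await DoWorkAsync();
            return null;
        }
        catch (Exception e)
        {
            return e;
        }
    }

    // Approach B: wait for completion, then read the stored exception.
    // Awaiting Task.WhenAny(task) does not throw when 'task' is faulted,
    // so the exception is only thrown once, at its origin.
    public static async Task<Exception?> InspectWithoutRethrowAsync()
    {
        Task<int> task = DoWorkAsync();
        await Task.WhenAny(task);

        // For a faulted task, Exception is an AggregateException wrapping the original.
        return task.Exception?.InnerException;
    }
}
```

Both methods hand back the same InvalidOperationException instance type to the caller; the difference is only in how many times the runtime has to throw to get there, which is what the Exceptions column in the results reflects.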

> I don't think the results are skewed. The point of this optimization (with the custom awaiter) is to transfer the task exception to the outcome without awaiting it, which would rethrow it. As you can see with the current ExecuteOutcomeAsync you have no choice but to await since the method expects a delegate that returns a ValueTask<Outcome<T>>. The proposed ExecuteOutcomeAsync method however expects a delegate that returns a ValueTask<T>, which is likely to be what the user's delegate returns in the first place and does not require an asynchronous delegate.
>
> In short, it should perform better if the user does not await, and shouldn't make a difference if the user does await. In both cases it does simplify the usage of this API (and it's not a replacement, just an additional method)

Sorry, my bad, I was not specific enough. I meant awaiting inside the benchmark methods:

    [Benchmark(Baseline = true)]
    public async ValueTask<Outcome<int>> ExecuteOutcomeAsync()
    {
        return await _pipeline!.ExecuteOutcomeAsync(static async (context, state) =>
        {
            // The callback for ExecuteOutcomeAsync must return an Outcome<T> instance. Hence, some wrapping is needed.
            try
            {
                return Outcome.FromResult(await DoWorkAsync());
            }
            catch (Exception e)
            {
                return Outcome.FromException<int>(e);
            }
        }, _context!, 12);
    }

    [Benchmark]
    public async ValueTask<Outcome<int>> ExecuteOutcomeAsync2()
    {
        return await _pipeline!.ExecuteOutcomeAsync2(static (context, state) => DoWorkAsync(), _context!, 12);
    }

Started a branch: #2132

I added a benchmark that compares the existing method, which requires an explicit T to Outcome<T> conversion, against the implicit version that uses the custom awaiter:

| Method                                        | WithException | Mean      | Error     | StdDev    | Ratio | RatioSD | Exceptions | Gen0   | Allocated | Alloc Ratio |
|---------------------------------------------- |-------------- |----------:|----------:|----------:|------:|--------:|-----------:|-------:|----------:|------------:|
| ExecuteOutcomeAsync                           | False         |  2.629 us | 0.0345 us | 0.0495 us |  1.00 |    0.00 |          - | 0.0687 |     442 B |        1.00 |
| ExecuteOutcomeAsync_ImplicitOutcomeConversion | False         |  2.749 us | 0.1085 us | 0.1521 us |  1.05 |    0.07 |          - | 0.0877 |     559 B |        1.26 |
|                                               |               |           |           |           |       |         |            |        |           |             |
| ExecuteOutcomeAsync                           | True          | 41.281 us | 0.6188 us | 0.9070 us |  1.00 |    0.00 |     2.0000 | 0.2441 |    1764 B |        1.00 |
| ExecuteOutcomeAsync_ImplicitOutcomeConversion | True          | 23.590 us | 0.9326 us | 1.3074 us |  0.57 |    0.03 |     1.0000 | 0.1831 |    1477 B |        0.84 |

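Results are close when no exception is thrown (ratio 1.05, with 559 B allocated versus 442 B). When an exception is thrown, the implicit version is almost twice as fast (ratio 0.57) and allocates less (1477 B versus 1764 B), mostly because the exception is not rethrown (1 exception instead of 2). We can also see how this implicit version helps in terms of simplicity and convenience.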

Test coverage still needs to be done.

This issue is stale because it has been open for 60 days with no activity. It will be automatically closed in 14 days if no further updates are made.

This issue was closed because it has been inactive for 14 days since being marked as stale.