ThreeMammals / Ocelot

.NET API Gateway

Home Page:https://www.nuget.org/packages/Ocelot

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Body cannot be forwarded twice on Aggregator

PaulARoy opened this issue · comments

Expected Behavior / New Feature

Aggregation cannot currently forward the body of the upstream to several downstreams. This is mentioned in :

Body = from.Body, // TODO Consider stream cloning for multiple reads

Actual Behavior / Motivation for New Feature

The error is not really explicit right now, it's a hidden message. The first downstream correctly reads the body and returns the expected result, but the second request has the following error:

Error connecting to downstream service, exception: System.Net.Http.HttpRequestException: An error occurred while sending the request.
 ---> System.InvalidOperationException: Sent 0 request content bytes, but Content-Length promised xxx.
   at Ocelot.Request.Mapper.StreamHttpContent.CopyAsync(Stream input, Stream output, Int64 announcedContentLength, Boolean autoFlush, CancellationToken cancellation) 

This obviously comes from the fact that the body was already read and is not.

I feel this is a reasonable request to be able to forward a body to multiple downstreams.

Steps to Reproduce the Problem

  1. Define two routes with requests having a body (GET requests for me)
  2. Define an aggregation of these two
  3. Call the aggregation

Specifications

Aggregators are said not to be able to manage POST requests, but they mainly can't manage bodies. The aggregation just forwards the stream of the body to the following requests.

Still, I got it to work but this leads to major discussions and impactful choices.

Fix

I created a basic method (which should be rewritten properly) to clone the body

    protected static async Task<MemoryStream> CloneBodyAsync(Stream body)
    {
        var memoryStream = new MemoryStream();
        await body.CopyToAsync(memoryStream);
        body.Position = 0;
        memoryStream.Position = 0;
        return memoryStream;
    }

Which can be used in the CreateThreadContextAsync (must be set Async) that way:

    private static async Task<HttpContext> CreateThreadContextAsync(HttpContext source)
    {
        var from = source.Request;
        var bodyCopy = await CloneBodyAsync(from.Body);
        var target = new DefaultHttpContext
        {
            Request =
            {
                Body = bodyCopy,

The fix was pretty straightforward to integrate, even when switching to Async methods because we have an asynchronous context just before.

However, it is important to note that being able to set back the position of the stream body at the beginning (body.Position = 0;) is not supported by default.
I had to enable buffering to be able to do this.
Example code:

app.Use(
    async (context, next) =>
    {
        context.Request.EnableBuffering(); // necessary for aggregation body copy
        await next();
    }
);

ASP Net also doesn't allow synchronous operations for this, which is why it is heavily recommended to switch to async methods as I did.

Discussion

The way I fixed it locally is definitely a choice and should be carefully considered. This requires to enable buffering of the body (pretty obvious if we want to use it multiple times). That means Ocelot has to buffer the requests, which could be potentially impactful on performances.

I fixed it in a very trivial way, but we could actually also buffer manually by loading the body into a first memory stream before doing the copy for each downstream. That way we are not reliant on enabling the buffering for all the requests, simply on complex aggregations that requires it.

I feel this is the main reason the POST method is not available for aggregations, and I think this could be a major improvement for Ocelot if done right. Supporting the forwarding of a body to multiple streams is really useful feature.
I know I stumped on this limitation by doing a borderline thing using body with GET (although I'll die on the hill that the spec does not forbid it), but I think it opens a really meaningful discussion on a key feature.

Another possibility would be to set back the Stream position at 0 after the copy in the mapper when available (and leave the user the responsibility to enable buffering) but I think it's a less elegant solution. (https://github.com/ThreeMammals/Ocelot/blob/develop/src/Ocelot/Request/Mapper/StreamHttpContent.cs)

On a side-note, to fix it locally I had to copy a lot of Ocelot code — I think most private methods I encountered could just be protected / virtual. This would help customizing the behavior of some components.

I'm more than willing to work on it if necessary, but I feel this is a key discussion to validate with the maintainers before working on it as it has major implications.

Feel free to comment on thoughts, ideas or anything, I'm completely open to criticism and discussions.

  • Version: 23.2.2
  • Platform: .NET 8
  • Subsystem: Aggregation

@PaulARoy
Aggregation supports GET verb only! The body should be empty!
Don't experiment with verbs and bodies please because in most cases your research will be not successful.
Have you read the Gotchas ?
The quote:

Aggregation only supports the GET HTTP verb.

Would you like us to move the issue forward for discussion, or would you prefer to keep it as an issue?

@ggnaegi FYI

@raman-m I read the gotchas and I specified I did on my issue. I don't agree the body should be empty. The HTTP spec doesn't disallow body for GET requests. Don't get me wrong, I know Body with GET is unconvential, but it is not unsupported.

ASP.NET doesn't disallow body for GET requests. Even Ocelot doesn't disallow body for GET requests (#1432).

That's why I said the gotchas are a bit off: it's not that it just supports the GET verb for aggregation, it actually doesn't work with Body, which could probably be more specific in the documentation.

This is also why I opened this discussion: I got it to work with bodies, which is in my opinion the way to supporting Body not only for GET, but also for POST, PUT or any other aggregation type.

I think this discussion could lead to a useful evolution of Ocelot, which is why I opened this ticket.

As I said, I'm more than willing to work on it and I have a team of 3 ready to back me up to support this; I just need you to validate some important choices before putting in a PR.

Hi @PaulARoy @raman-m, we agree, we should fix this. But we want to avoid buffering at all costs. That's why we don't use the HttpClient for downstream requests but the MessageInvoker class, and we've introduced a new mechanism for copying the request body.

I locally did the manual buffering only when it's not avoidable (i.e. multiple downstreams requiring it).

I'm ready to PR if you want the fix.

I use a first memory stream to buffer the body, then a memory stream for each request. Tell me if it's OK for you and I can offer a PR.

It's nice to note that my current code shouldn't break any existing behavior (no buffering when one route, no buffering if there is no body).

@PaulARoy commented on Apr 11:

ASP.NET doesn't disallow body for GET requests. Even Ocelot doesn't disallow body for GET requests (#1432).

Certainly! Our team’s decision and consensus make sense. Ocelot should forward all artifacts of the HTTP request.


That's why I said the gotchas are a bit off: it's not that it just supports the GET verb for aggregation, it actually doesn't work with Body, which could probably be more specific in the documentation.

We will update documentation as soon as the new behavior, new feature(s) will be delivered. Once they are ready, and I’ll be more than happy to review them including docs.


I got it to work with bodies, which is in my opinion the way to supporting Body not only for GET, but also for POST, PUT or any other aggregation type.

We’ve received numerous inquiries from GitHub users and Ocelot fans, but unfortunately, none of them have opened pull requests with good quality to enhance the logic. We had a couple of them but after reviews we closed them because of low quality and unclear
goals.


I think this discussion could lead to a useful evolution of Ocelot, which is why I opened this ticket.
As I said, I'm more than willing to work on it and I have a team of 3 ready to back me up to support this; I just need you to validate some important choices before putting in a PR.

Fine! Your enthusiasm for contributions is commendable. I recommend opening a “bugfix” pull request (PR) specifically addressing the issue of multiple copying of the request body to downstream services. Once that’s resolved, you can focus on delivering other enhancements in separate PRs. Keep up the great work!

@PaulARoy How does your copy mechanism look like?

At the moment it's a pretty raw copy from the httpContext.Request.Body to a memory stream like this:

    protected static async Task<MemoryStream> CloneBodyAsync(Stream body, bool resetSourcePosition = false)
    {
        var memoryStream = new MemoryStream();
        await body.CopyToAsync(memoryStream);
        if (resetSourcePosition)
        {
            body.Position = 0;
        }

        memoryStream.Position = 0;
        return memoryStream;
    }

(called when httpContext.Request.ContentLength is strictly positive)

A first copy is done to create the body buffer when needed.

Then a second copy is made when creating the DefaultHttpContext (with resetSourcePosition to true) for each request.

Completely open to other suggestions, this was the first raw idea I had but you might have a better perspective on the best options to use.

Let me push a PR and we can discuss sensitive topics regarding buffering with actual code. I probably won't have time today but I think I can do this tomorrow, maybe monday at worst. I'd like to clean up a few things before.

@ggnaegi commented on Apr 11

🆗 Let’s discuss the usage of buffering mechanisms in the context of Ocelot.

Your point about avoiding buffering entirely is invalid in my opinion. While Ocelot has made significant strides in providing good streaming capabilities, thanks to your efforts, Gui, there are scenarios where buffering can be beneficial. Specifically, enabling buffering for small requests (sized less than 100KB) can lead to performance improvements. Right?

Here’s a breakdown of the considerations:

  • Small Requests and Streaming:
    • For small requests, streaming might not be necessary. In such cases, buffering can be more efficient.
    • Streaming is ideal for large payloads, where reading data in chunks and forwarding it immediately is advantageous.
  • Introducing DefaultBufferSize option:
    • My proposal is: to introduce a DefaultBufferSize feature which is intriguing.
    • Analyzing the Content-Length header and comparing it to the default buffer size can guide the decision on whether to stream or buffer.
    • The current logic streams body data without buffering, but a suggestion to read data into a buffer and forward it as copied bodies (with buffering enabled) could enhance performance.
  • Community Impact:
    • Implementing this flexible feature would benefit the Ocelot community.
    • It’s essential to open an issue and engage in brainstorming to refine the requirements and gather input from other contributors.

Sounds good?

P.S.

However, these requirements might be beyond the scope of the current Body issue in multiplexing. They could be considered for enhancement in subsequent PRs.

@PaulARoy commented on Apr 11:

I'm ready to PR if you want the fix.

We eagerly await your valuable input. Please take into consideration the suggestions I’ve provided above.
As soon as you've opened a PR this issue will be accepted regarding our dev process.
And we will continue the discussion in PR thread...

@PaulARoy commented

Let me push a PR and we can discuss sensitive topics regarding buffering with actual code. I probably won't have time today but I think I can do this tomorrow, maybe Monday at worst. I'd like to clean up a few things before.

Nice! Good luck!
I've added this issue to the current regular "monthly" release.

@PaulARoy commented on Apr 11

@PaulARoy Why not implement an extension method instead and avoid instantiating a memory stream? We have tested the performance of CopyToAsync, it's great and we are using it in the Responder class. But as @raman-m wrote, you could create a PR and then we will review the code with you and validate it. Thanks!

@ggnaegi commented on Apr 11

🆗 Let’s discuss the usage of buffering mechanisms in the context of Ocelot.

Your point about avoiding buffering entirely is invalid in my opinion. While Ocelot has made significant strides in providing good streaming capabilities, thanks to your efforts, Gui, there are scenarios where buffering can be beneficial. Specifically, enabling buffering for small requests (sized less than 100KB) can lead to performance improvements. Right?

Here’s a breakdown of the considerations:

  • Small Requests and Streaming:

    • For small requests, streaming might not be necessary. In such cases, buffering can be more efficient.
    • Streaming is ideal for large payloads, where reading data in chunks and forwarding it immediately is advantageous.
  • Introducing DefaultBufferSize option:

    • My proposal is: to introduce a DefaultBufferSize feature which is intriguing.
    • Analyzing the Content-Length header and comparing it to the default buffer size can guide the decision on whether to stream or buffer.
    • The current logic streams body data without buffering, but a suggestion to read data into a buffer and forward it as copied bodies (with buffering enabled) could enhance performance.
  • Community Impact:

    • Implementing this flexible feature would benefit the Ocelot community.
    • It’s essential to open an issue and engage in brainstorming to refine the requirements and gather input from other contributors.

Sounds good?

Really interesting thoughts and I think I can see some way to create code to easily integrate this behavior.
Working on it today

@PaulARoy commented on Apr 12

I already adapted the behavior for smaller payloads by sending them directly not in chunks. So, my point is valid.

PR is opened, feel free to discuss all points on it (I can also do a Teams or Google Meet if necessary)

I feel this is the main reason the POST method is not available for aggregations, and I think this could be a major improvement for Ocelot if done right. Supporting the forwarding of a body to multiple streams is really useful feature.

Agreed! I’ve just realized that this improvement is a good preparation for real-world streaming and forwarding with the POST verb.

On a side-note, to fix it locally I had to copy a lot of Ocelot code — I think most private methods I encountered could just be protected / virtual. This would help customizing the behavior of some components.

I understand your perspective! It’s frustrating when all methods are marked as private, preventing developers from customizing behavior. In my opinion, we should consider making major methods protected. This way, they can be called in aggregation design scenarios (without inheriting in descendant classes). Additionally, making methods virtual would allow for easier overrides and inheritance from the Ocelot class, although it does introduce some risk.

This is also why I opened this discussion: I got it to work with bodies, which is in my opinion the way to supporting Body not only for GET, but also for POST, PUT or any other aggregation type.

I'll be happy if your PR will implement this functionality... 😉

As I said, I'm more than willing to work on it and I have a team of 3 ready to back me up to support this; I just need you to validate some important choices before putting in a PR.

Certainly! I’d be delighted to guide your pull request through the final delivery stage, including merging it into the develop branch and releasing it. Additionally, I extend an invitation to your team to contribute further to our open-source project!

@PaulARoy commented on Apr 12:

I can also do a Teams or Google Meet if necessary

Thank you for sharing your communication preferences, but this doesn't work for us, and we don't have regular calls at all: it's not necessary at the moment. Slack is the main communication tool for the ThreeMammals team. If you become a regular contributor and provide three PRs earning 3 spots on the Top Release contributors charts, and if you continue to contribute, we'd love to invite you to our Slack workspace for faster communication. Good luck!

From: #2050 (comment)

TODO: Next goal

Let's add MultiplexingMiddleware options.

Given options for buffering from FileAggregateRoute class:

public struct BufferingOptions
{
    public int BufferThreshold;
    public long BufferLimit;
}

Then, let's apply the options:

    protected virtual async Task<Stream> CloneRequestBodyAsync(HttpRequest request, CancellationToken aborted, BufferingOptions options)
    {
        request.EnableBuffering(options.BufferThreshold, options.BufferLimit);
        // ...
    }