whatwg / fetch

Fetch Standard

Home Page: https://fetch.spec.whatwg.org/

non-null body with GET/HEAD requests

jasnell opened this issue

@annevk @domenic ... I know that this has been discussed previously but I'd like to re-open the conversation about what happens when someone creates a Request object with a non-null body when method is equal to GET or HEAD. I am specifically referring to bullet point 35 at https://fetch.spec.whatwg.org/#dom-request

[Screenshot of the Request constructor step in question: if the body is non-null and the request's method is GET or HEAD, a TypeError is thrown.]

While the HTTP spec defines GET and HEAD such that request body payloads are meaningless, it does not forbid them, and there are production APIs in the wild (looking at you, Elastic) that allow specifying a body payload with a GET. Such APIs are currently not usable with fetch because of the restriction in the spec that requires an error to be thrown for non-null bodies.

This causes direct problems for us in the Workers runtime because we have cases where customers want to do something like...

const newRequest = new Request(request.url, { method: request.method, body: request.body });

...without having to introduce special-case logic for when request.method happens to be GET or HEAD. In our server-side implementation of Request, we allow receiving a GET with a content-length: 0 header, which results in request.body being a zero-length stream rather than null.
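
For illustration, the special case we would like to avoid looks something like this today (a sketch; request here is the incoming request in a Workers-style fetch handler, with header handling simplified):

// Workaround needed today: only forward the body when the method can carry
// one, because the Request constructor throws a TypeError otherwise.
const canHaveBody = request.method !== 'GET' && request.method !== 'HEAD';
const newRequest = new Request(request.url, {
  method: request.method,
  headers: request.headers,
  body: canHaveBody ? request.body : null,
});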

The change in the spec that I would like to see is relaxing the requirement on bullet point 35 of https://fetch.spec.whatwg.org/#dom-request to say that a non-null body is always accepted by the Request constructor but that host implementations are free to ignore the body if the method used is known not to apply any semantics to the body payload. This would allow browser implementations to continue just ignoring the body payload for GET and HEAD but would allow other implementations to choose how to handle those.

As far as I can tell, such a relaxation would not be breaking for the web in any way, and would afford runtimes greater flexibility to choose how such methods are handled.

Specifically, the change would be that the following would no longer throw an error, and implementations would be free to decide whether or not to consume the body based on their understanding of the semantics of whatever method is used.

new Request('https://example.com', { body: new ReadableStream() });
new Request('https://example.com', { body: new Uint8Array(n) });  // where n is any value
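
As a purely hypothetical illustration, an Elasticsearch-style search-with-GET call could then be written directly; whether the body is actually transmitted would be left to the runtime (the URL and query here are made up):

// Hypothetical usage under the proposed relaxation: construction succeeds in
// every runtime; a browser may silently drop the body, while a runtime such
// as Workers could choose to send it.
const searchRequest = new Request('https://search.example.com/things/_search', {
  method: 'GET',
  headers: { 'content-type': 'application/json' },
  body: JSON.stringify({ query: { match: { title: 'fetch' } } }),
});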

we will allow receiving a GET with a content-length: 0

To clarify, it's more than that. Cloudflare Workers today supports receiving and proxying GET requests with bodies of any size, not just zero size. We had to, because these requests exist in the wild, and it's a basic requirement of our product that it can proxy any kind of request actually seen in the wild.

We still don't allow such requests to be constructed from scratch in JavaScript, but because they can arrive off the network, it creates some awkward situations where people have a Request object that's actually impossible to reproduce from its component parts.
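
For example, something like the following (a sketch of a Workers-style module handler, where the incoming request is a bodied GET that arrived off the network):

// Sketch: the incoming request can be a GET with a non-null body, but
// re-creating it from its component parts throws under the current spec text.
export default {
  async fetch(request) {
    // request.method === 'GET' and request.body is a non-null stream here.
    const copy = new Request(request.url, {
      method: request.method,
      headers: request.headers,
      body: request.body, // TypeError per the constructor step quoted above
    });
    return fetch(copy);
  },
};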

None of the browsers follow this bullet point in the spec either. Chrome tried to ship a fix for this (aligning with the spec) and found it was not web compatible.

@lucacasonato ... hmm, in Chrome and Edge the following throws an error: new Request('http://example.com', { method: 'GET', body: new Uint8Array(0) })

Oh, sorry, this is about Request; I was thinking this was about Response. All the browsers incorrectly return a body for HEAD fetch calls even though the spec says not to.
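
For reference, the Response-side behaviour being described can be observed with something like this (a sketch; per the comment above, the spec expects no body here):

// Sketch: the spec expects the body of a response to a HEAD request to be
// null, but browsers have been observed to expose a stream instead.
const res = await fetch('https://example.com/', { method: 'HEAD' });
console.log(res.body); // expected per the spec: null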

Isn't this kind of spec-noncompliance what https://github.com/wintercg/fetch is for?

I don't think there's a reason to get rid of this error on the web, since such bodies are explicitly disallowed there.

Isn't this kind of spec-noncompliance what wintercg/fetch is for?

Yes and no. The goal is not really to maintain a forked spec that collects a bunch of non-compliant changes. Ideally, if changes are needed, an effort is made to influence the direction of the core living spec first; hence this issue. If this effort to introduce the modification into the core fetch spec fails, then yes, my next step would be to seek consensus among the wintercg participating runtimes that such a change is worthwhile.

I don't think there's a reason to get rid of this error on the web, since such bodies are explicitly disallowed there.

By "on the web" here I think you are specifically referring to Web browsers in particular, correct? The core specification of HTTP (which I would argue is generally more fundamentally "the web" but that's a debate for another time) specifically allows bodies for GET and HEAD even if those same specifications say they don't have any semantic meaning and should be avoided. The reality is that there are real world, in-production applications in use today on the web that do, in fact, attach a payload body to a GET request and that the fetch API as it is defined today is incapable of supporting those.

Also note that removing this error, the way I have framed it, would not change much of anything for existing Web browser implementations, which would still be allowed to entirely ignore the given non-null body when used with GET and HEAD. If browsers wish to continue to disallow such bodies, they would simply continue to do so, albeit silently rather than with a throw.

Yes, by "the web" I mean it in the sense it's used in "WHATWG", i.e. web browsers.

At a minimum any "reopen" request should surface and study the older closed issue. I vaguely recall that even HTTP WG representatives didn't necessarily see the point of this.

Indeed. RFC9110 is very clear about this:

Although request message framing is independent of the method used, content received in a GET request has no generally defined semantics, cannot alter the meaning or target of the request, and might lead some implementations to reject the request and close the connection because of its potential as a request smuggling attack (Section 11.2 of [HTTP/1.1]). A client SHOULD NOT generate content in a GET request unless it is made directly to an origin server that has previously indicated, in or out of band, that such a request has a purpose and will be adequately supported. An origin server SHOULD NOT rely on private agreements to receive content, since participants in HTTP communication are often unaware of intermediaries along the request chain.

Previous HTTP specifications said that bodies were allowed on all methods, but that was only to allow new methods to be introduced without coordinating with every intermediary on the Web. It was not intended to retrofit bodies onto GET, and RFC7231 clarified that; when the message didn't get through, we (i.e., a consensus of HTTP implementers) made it more explicit in 9110.

Yes, we're well aware that some applications (e.g., Elasticsearch) do use bodies on GET. That is not interoperable on the open web; it only works in closed systems that, effectively, are not HTTP. While some might think that their deployment qualifies as such, it's rarely the case, because most people do not control every possible HTTP system and component that the messages might ever be handled by (e.g., load balancers, virus scanners, WAFs). Doing so also reduces the choice of tools and implementations that can be used in that deployment, reducing the value of using HTTP.

The HTTP WG is working on a better way to do this. That's not done yet, but in the meantime, POST is viable. I'm happy to help talk folks through the issues here if it will help.
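
For example, a service that accepts a search body on GET will typically also accept the same body on POST, so the workaround is often a one-line change (a hedged sketch; the endpoint and query are illustrative):

// Sketch: the same query sent with POST, which intermediaries handle reliably.
const res = await fetch('https://search.example.com/things/_search', {
  method: 'POST',
  headers: { 'content-type': 'application/json' },
  body: JSON.stringify({ query: { match: { title: 'fetch' } } }),
});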

Wow, I managed to completely miss that change in 9110!! Thank you @mnot and @annevk for that. Unfortunately, given that there are deployed applications out there that are still using payloads in GETs, runtimes are still going to need to deal with this case but given this I agree that changing the spec on this isn't worthwhile. I appreciate the quick turnaround on the discussion and will close this :-)

Yes, we're well aware that some applications (e.g., Elasticsearch) do use bodies on GET. That is not interoperable on the open web; it only works in closed systems that, effectively, are not HTTP. While some might think that their deployment qualifies as such, it's rarely the case, because most people do not control every possible HTTP system and component that the messages might ever be handled by (e.g., load balancers, virus scanners, WAFs). Doing so also reduces the choice of tools and implementations that can be used in that deployment, reducing the value of using HTTP.

FWIW, Cloudflare supports bodied GETs throughout its stack. In Workers specifically, we were forced very early on to support this due to user complaints. So my expectation is that any HTTP proxy product in wide use is probably forced to support bodied GETs, and therefore such support is a de facto standard.