google / guava

Google core libraries for Java

There is no charset parameter on application/json

reschke opened this issue · comments

Description

Ref: https://datatracker.ietf.org/doc/html/rfc8259#section-11 (last paragraph)

But see:

public static final MediaType JSON_UTF_8 = createConstantUtf8(APPLICATION_TYPE, "json");

This type should be deprecated (and users encouraged to use a variant without charset).
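As a rough illustration of what "a variant without charset" means on the wire, here is a minimal JDK-only sketch that drops a charset parameter from a Content-Type value. The class `ContentTypeSketch` and the method `withoutCharset` are hypothetical names made up for this example, not Guava API, and the naive split on `;` assumes no parameter value is a quoted string containing a semicolon. Within Guava itself, `MediaType.JSON_UTF_8.withoutParameters()` should yield the parameterless type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Guava.
final class ContentTypeSketch {
    /** Returns the Content-Type value with any charset parameter removed; other parameters are kept. */
    static String withoutCharset(String contentType) {
        // Assumption: no quoted parameter value contains ';'.
        String[] parts = contentType.split(";");
        List<String> kept = new ArrayList<>();
        kept.add(parts[0].trim()); // the "type/subtype" part
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.isEmpty()) {
                continue; // tolerate stray semicolons
            }
            // Parameter names are case-insensitive.
            if (!param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                kept.add(param);
            }
        }
        return String.join("; ", kept);
    }
}
```

With this sketch, `ContentTypeSketch.withoutCharset("application/json; charset=utf-8")` returns `application/json`.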

Example

MediaType.JSON_UTF_8 is a nonsensical media type; no charset parameter is defined for application/json.

Expected Behavior

Using the constant should produce a deprecation warning.

Actual Behavior

It does not.

Packages

com.google.common.net

Platforms

No response

Interesting, thanks. I don't know that this has come up before.

We actually had application/json without a charset before replacing it with the current constant back in 2012 (before MediaType was added to Guava). That presumably was the right move then (since it predates the 2017 RFC you've shared).

It is interesting that the RFC also says "Adding [a charset] really has no effect on compliant recipients," which suggests that including one should be harmless for compliant recipients.

But wait, https://www.rfc-editor.org/errata/eid5853 says that that sentence should be replaced :\ I'd have to read more to understand whether including a charset parameter should in fact technically be harmless.

There's additionally the question of whether the charset parameter makes things better or worse for non-compliant recipients. (And then there's the question of whether helping non-compliant recipients is a good thing or a bad thing... :))

Our internal security guidance says that it is "critical" to include the charset parameter. That said, the guidance dates from at least 7 years ago, and I don't know how recently it's been reevaluated. Some chain of other links led me to https://portswigger.net/research/json-hijacking-for-the-modern-web, which was from 2016 (with some kind of update in 2022), which likewise suggests that the charset is important (or at least was back then). However, I haven't read it nearly closely enough to have much confidence in anything.

Someone seems to be reporting that Dart needed the parameter back in 2019. Ditto some "HttpClient" in 2020.

And I've seen another report or two that some receivers reject anything that includes charset (example)....

I fear that we could end up as the latest project "ping-ponging this back and forth, and there's always some broken client."

We could consider talking more with our security people to see what they recommend. We'd want to have a pretty solid understanding before nudging users toward a change that might break something that had previously been working (whether it was really supposed to be working or not).

In general, are extra, unrecognized parameters considered an error in media types?

But wait, https://www.rfc-editor.org/errata/eid5853 says that that sentence should be replaced :\ I'd have to read more to understand whether including a charset parameter should in fact technically be harmless.

That's marked as "Reported", which just means that someone thought it would be a good idea to make that change. I don't think we can conclude anything from it.

(I've been fooled by RFC Errata before.)

Exactly - unless it's verified it doesn't mean anything.

In general, are extra, unrecognized parameters considered an error in media types?

Usually no.

The problem is more educational: sending "charset=UTF-8" sort of implies that "charset=UTF-16" would change the encoding detection. And that would be a bug.

Requiring the presence of the param would be a bug, too.
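To make the recipient side concrete, here is a minimal JDK-only sketch of a receiver that behaves the way this comment describes: it accepts application/json with or without parameters, never requires a charset, and never lets a charset value change how the body is decoded. The class `JsonRecipientSketch` and its methods are hypothetical names for this example. Always decoding as UTF-8 is an assumption based on my reading of RFC 8259 section 8.1, which requires UTF-8 for JSON exchanged outside a closed ecosystem.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical recipient for illustration only; not part of Guava.
final class JsonRecipientSketch {
    /** True if the Content-Type names application/json, regardless of any parameters. */
    static boolean isJson(String contentType) {
        int semi = contentType.indexOf(';');
        String essence = semi < 0 ? contentType : contentType.substring(0, semi);
        // Type and subtype are case-insensitive.
        return essence.trim().toLowerCase(Locale.ROOT).equals("application/json");
    }

    /** Decodes a JSON body as UTF-8; a charset parameter is neither required nor consulted. */
    static String decodeJsonBody(String contentType, byte[] body) {
        if (!isJson(contentType)) {
            throw new IllegalArgumentException("Not application/json: " + contentType);
        }
        // Assumption: the payload is UTF-8, whatever the parameters say.
        return new String(body, StandardCharsets.UTF_8);
    }
}
```

Under this sketch, `application/json`, `application/json; charset=UTF-8`, and `application/json; charset=UTF-16` all decode the same bytes to the same text, so the parameter is harmless but also meaningless.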