There is no charset parameter on application/json
reschke opened this issue
Description
Ref: https://datatracker.ietf.org/doc/html/rfc8259#section-11 (last paragraph)
But see:
This type should be deprecated (and users encouraged to use a variant without charset).
Example
MediaType.JSON_UTF_8 is a nonsensical media type; no charset parameter is defined for JSON.
Expected Behavior
Using it should produce a deprecation warning.
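A minimal sketch of what the requested behavior could look like in Java. The `JsonMediaTypes` class and its plain `String` constants are hypothetical and are not Guava's API (Guava's `JSON_UTF_8` is a `MediaType`, not a `String`). The charset-bearing constant is marked `@Deprecated` and points users to a charset-free replacement.

```java
/**
 * Hypothetical constants holder sketching the proposal; not Guava's actual API.
 */
final class JsonMediaTypes {
    private JsonMediaTypes() {}

    /** Charset-free form: RFC 8259 defines no charset parameter for application/json. */
    static final String JSON = "application/json";

    /**
     * Retained only for compatibility.
     *
     * @deprecated RFC 8259, section 11, defines no charset parameter for
     *     application/json; use {@link #JSON} instead.
     */
    @Deprecated
    static final String JSON_UTF_8 = JSON + "; charset=utf-8";
}
```

Code in another class that reads `JsonMediaTypes.JSON_UTF_8` still compiles, but javac reports a deprecation note (and a per-use warning under `-Xlint:deprecation`).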
Actual Behavior
It does not.
Packages
com.google.common.net
Platforms
No response
Checklist
- I agree to follow the code of conduct.
Interesting, thanks. I don't know that this has come up before.
We actually had application/json without a charset before replacing it with the current constant back in 2012 (before MediaType was added to Guava). That presumably was the right move then (since it predates the 2017 RFC you've shared).
It is interesting that the RFC also says "Adding [a charset] really has no effect on compliant recipients," which suggests that including one should be harmless for compliant recipients.
But wait, https://www.rfc-editor.org/errata/eid5853 says that that sentence should be replaced :\ I'd have to read more to understand whether including a charset parameter should in fact technically be harmless.
There's additionally the question of whether the charset parameter makes things better or worse for non-compliant recipients. (And then there's the question of whether helping non-compliant recipients is a good thing or a bad thing... :))
Our internal security guidance says that it is "critical" to include the charset parameter. That said, the guidance is at least 7 years old, and I don't know how recently it's been reevaluated. A chain of other links led me to https://portswigger.net/research/json-hijacking-for-the-modern-web, which is from 2016 (with some kind of update in 2022) and likewise suggests that the charset is important (or at least was back then). However, I haven't read it nearly closely enough to have much confidence in anything.
Someone seems to be reporting that Dart needed the parameter back in 2019. Ditto some "HttpClient" in 2020.
And I've seen another report or two that some receivers reject anything that includes charset (example)...
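To make that interoperability problem concrete, here is a hypothetical sketch (standard-library Java only, names invented for illustration) of one plausible way such rejections happen. A receiver that compares the whole Content-Type header as a string rejects a charset-bearing header. A receiver that compares only the type/subtype accepts the header with or without the parameter.

```java
import java.util.Locale;

/** Hypothetical receiver-side checks; illustrative only. */
final class JsonContentTypeCheck {
    private JsonContentTypeCheck() {}

    /** Fragile: any parameter at all, including charset, makes this return false. */
    static boolean isJsonStrict(String contentType) {
        return "application/json".equals(contentType);
    }

    /** Tolerant: compares only type/subtype, ignoring parameters, case, and surrounding whitespace. */
    static boolean isJsonLenient(String contentType) {
        if (contentType == null) {
            return false;
        }
        int semicolon = contentType.indexOf(';');
        String essence = semicolon >= 0 ? contentType.substring(0, semicolon) : contentType;
        return essence.trim().toLowerCase(Locale.ROOT).equals("application/json");
    }
}
```

With the strict check, switching a sender from one form to the other breaks the receiver. The lenient check is indifferent to the choice.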
I fear that we could end up the latest project to have "ping-ponging this back and forth, and there's always some broken client."
We could consider talking more with our security people to see what they recommend. We'd want to have a pretty solid understanding before nudging users toward a change that might break something that had previously been working (whether it was really supposed to be working or not).
In general, are extra, unrecognized parameters considered an error in media types?
But wait, https://www.rfc-editor.org/errata/eid5853 says that that sentence should be replaced :\ I'd have to read more to understand whether including a charset parameter should in fact technically be harmless.
That's marked as "Reported", which just means that someone thought it would be a good idea to make that change. I don't think we can conclude anything from it.
(I've been fooled by RFC Errata before.)
Exactly: unless it's verified, it doesn't mean anything.
In general, are extra, unrecognized parameters considered an error in media types?
Usually no.
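A simplified sketch of the "usually no" answer: a hypothetical `SimpleMediaType.parse` that splits a header into type, subtype, and parameters and keeps unrecognized parameters instead of treating them as errors. It deliberately does not handle quoted-string parameter values, which a real parser following RFC 9110 must support.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical, simplified media type holder; not a complete RFC 9110 parser. */
final class SimpleMediaType {
    final String type;
    final String subtype;
    final Map<String, String> parameters;

    private SimpleMediaType(String type, String subtype, Map<String, String> parameters) {
        this.type = type;
        this.subtype = subtype;
        this.parameters = Collections.unmodifiableMap(parameters);
    }

    /** Parses "type/subtype; name=value; ...". Quoted-string values are NOT handled. */
    static SimpleMediaType parse(String header) {
        // Limit -1 keeps trailing empty segments so that inputs like ";" still yield parts[0].
        String[] parts = header.split(";", -1);
        // Type and subtype are case-insensitive, so normalize them to lowercase.
        String[] typeAndSubtype = parts[0].trim().toLowerCase(Locale.ROOT).split("/", 2);
        if (typeAndSubtype.length != 2 || typeAndSubtype[0].isEmpty() || typeAndSubtype[1].isEmpty()) {
            throw new IllegalArgumentException("not a media type: " + header);
        }
        Map<String, String> params = new LinkedHashMap<>();
        for (int i = 1; i < parts.length; i++) {
            String part = parts[i].trim();
            int eq = part.indexOf('=');
            if (eq <= 0) {
                continue; // skip empty or malformed segments instead of failing
            }
            // Parameter names are case-insensitive; values are kept as sent.
            // Unknown names are stored, not rejected.
            String name = part.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            params.put(name, part.substring(eq + 1).trim());
        }
        return new SimpleMediaType(typeAndSubtype[0], typeAndSubtype[1], params);
    }
}
```

Whether a recipient then honors a given parameter (such as charset on application/json) is a separate decision from whether its presence is an error.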
The problem is more educational: sending "charset=UTF-8" sort of implies that "charset=UTF-16" would change the encoding detection. And that would be a bug.
As would requiring the presence of the parameter.
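A minimal sketch of the recipient behavior described in this comment, assuming RFC 8259's rule that JSON exchanged between systems is UTF-8 (section 8.1). The `JsonBodyDecoder` class is hypothetical. It neither requires a charset parameter nor lets one (for example `charset=UTF-16`) change how the bytes are decoded.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical recipient that always reads JSON bodies as UTF-8. */
final class JsonBodyDecoder {
    private JsonBodyDecoder() {}

    /**
     * Decodes a JSON body. The contentType argument is intentionally not consulted:
     * a charset parameter is neither required nor allowed to change the decoding.
     */
    static String decode(String contentType, byte[] body) {
        String text = new String(body, StandardCharsets.UTF_8);
        // RFC 8259, section 8.1: parsers MAY ignore a leading byte order mark.
        if (text.startsWith("\uFEFF")) {
            text = text.substring(1);
        }
        return text;
    }
}
```

The same bytes decode identically whether the header is `application/json`, `application/json; charset=utf-8`, or `application/json; charset=UTF-16`.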