libwww-perl / HTTP-Message

The HTTP-Message distribution contains classes useful for representing the messages passed in HTTP style communication.

Home Page:https://metacpan.org/pod/HTTP::Message

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

->decoded_content should decode application/json, etc [rt.cpan.org #82963]

oalders opened this issue · comments

Migrated from rt.cpan.org#82963 (status was 'open')

Requestors:

From mauke@cpan.org on 2013-01-25 22:13:06:

Currently $response->decoded_content will decode the bytes of e.g.
"Content-type: text/json; charset=UTF-8" messages because it knows
"text/*" is ... text.

It would be nice if this could be extended to also decode the text for
content-types such as "application/json; charset=UTF-8",
"application/javascript; charset=ISO-8859-15", etc.

From skaufman@cpan.org on 2013-07-21 18:05:19:

On Fri Jan 25 17:13:06 2013, MAUKE wrote:
> Currently $response->decoded_content will decode the bytes of e.g.
> "Content-type: text/json; charset=UTF-8" messages because it knows
> "text/*" is ... text.
> 
> It would be nice if this could be extended to also decode the text for
> content-types such as "application/json; charset=UTF-8",
> "application/javascript; charset=ISO-8859-15", etc.

Bump, just ran into the same issue after a few hours.
in HTTP::Headers->content_is_text,
shouldn't the presence of charset in the content-type imply that the content is characters, ie text?

From swong@cpan.org on 2013-08-13 09:27:23:

Second on this.

When I say decode, I know what I am doing - currently there is no way to force it.

  $response->decoded_content(charset => 'utf-8')

Adding (charset_strict => 1, raise_error => 1) doesn't help.

Better yet, the content type I get is
  Content-Type: application/json; charset=UTF-8
Maybe content_is_text() should returns true if the charset is present in the content-type header?

From bbyrd@cpan.org on 2014-08-27 03:12:50:

Third.  Currently, the code says:

if ($self->content_is_text || (my $is_xml = $self->content_is_xml)) {

Examples where LWP currently breaks include:

application/json
application/yaml
application/x-yaml
application/pdf
application/* (that isn't +xml)

The Content-Type really shouldn't matter.  If the Content-Type is "pork/beans; charset=UTF-8", it should still be decoded.

If the remote agent broadcasted a charset, it's telling us that it had encoded that data with that character set.  We shouldn't care if the data inside the onion is text, audio, application-specific, some proprietary format, whatever.

Please remove this 'if' line.  It's a pretty intelligent interface, so it would be a waste of code to have other folks design their own decoding interface just because of this restriction.

As described in a lengthy comment on Always charset decode, regardless of content type, application/json none of the examples mentioned here above should have any charset-decoding done automagically.

As recommended in RFC 6657 – MIME Charset Default Update §3b New Rules for Default "charset" Parameter Values for "text/*" Media Types, text/* SHOULD

require explicit unconditional inclusion of the "charset" parameter, eliminating the need for a default value.

And thus Content-Type MUST matter and the charset parameter for any other media type is just 'auxiliary meta info' for a processor of the 'binary' data like application/*. It is not the other way around , a charset param DOES NOT mean that it is of type type/*.

Saying «It's a pretty intelligent interface» is a great compliment, but I prefer an interface to do less guess work and do less automagically things under the hood. Especially regarding encoding/decoding layers. We all know it goes wrong too many times.

Sure, I'd like to have an intelligent interface, but not on the cost to make 'semi-core' Perl modules going to do the wrong thing. Maybe a HTTP::Message::MIME::ApplicationJSON © would be a better alternative ?

Conclusion

$mess->decoded_contnent is doing the right thing

this issue SHOULD be closed

Looks like https://metacpan.org/release/LWP-JSON-Tiny might meet your criteria (haven't tried it myself though), for anyone else coming across this.

or a naughty over-ride HTTP::Message::JSON

Yes, that's from the same package :)

as described in #99, unless we add extra parameters or an entire new method, this can not be solved without breaking any existing API based on JSON

Closing this as we're working towards a solution in #99