libwww-perl / HTTP-Message

The HTTP-Message distribution contains classes useful for representing the messages passed in HTTP style communication.

Home Page: https://metacpan.org/pod/HTTP::Message


decoded_content different depending on Content-Type header

davewood opened this issue

11:54 < davewood> HTTP::Message returns different content depending on Content-Type header. ... http://paste.scsys.co.uk/595573
11:54 < davewood> or is it just me misunderstanding the content-type header behaviour?
12:06 < castaway> that does seem a tad strange
david@nbdt:~/dev/zid/EPPlication$ cat foo.pl 
#!/usr/bin/env perl
use 5.018;
use warnings;
use HTTP::Response;
use Encode qw/ encode_utf8 /;
use utf8;

my $umlaut = 'ä';
my $umlaut_octets = encode_utf8($umlaut);

my $res1 = HTTP::Response->new(
  200,
  '',
  ['content-type' => 'text/csv; charset=utf-8'],
  $umlaut_octets,
);
my $res2 = HTTP::Response->new(
  200,
  '',
  ['content-type' => 'application/json; charset=utf-8'],
  $umlaut_octets,
);

die   $res1->content_charset . '/'
    . $res1->content . '/'
    . $res1->decoded_content
    . "\n"
    . $res2->content_charset . '/'
    . $res2->content . '/'
    . $res2->decoded_content . "\n";

david@nbdt:~/dev/zid/EPPlication$ perl foo.pl 
UTF-8/ä/�
UTF-8/ä/ä

content_charset returns 'UTF-8' for both response objects.

Could [1] in combination with [2] be the problem?

[1]
CAVEAT: the input $octets might be modified in-place depending on what is set in CHECK. See "LEAVE_SRC" if you want your inputs to be left unchanged. (https://metacpan.org/pod/Encode#encode_utf8)

[2]
https://metacpan.org/dist/HTTP-Message/source/lib/HTTP/Message.pm#L269
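Regarding [1]: the caveat concerns the CHECK argument. In foo.pl, encode_utf8 is called without CHECK and works on a copy of its argument, so $umlaut itself is not modified. The minimal sketch below uses only the core Encode module. It checks that behaviour and shows that adding LEAVE_SRC keeps the source intact even when CHECK is set. The variable $copy is purely illustrative.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode);

my $umlaut        = "\x{e4}";              # the character "ä"
my $umlaut_octets = encode_utf8($umlaut);  # two UTF-8 octets

# encode_utf8 encodes a copy, so its argument is left untouched.
print length($umlaut), "\n";               # still one character (U+00E4)

# The in-place caveat only matters when CHECK is set; adding LEAVE_SRC
# keeps the source buffer intact even with FB_CROAK.
my $copy  = $umlaut_octets;                # illustrative scratch copy
my $chars = decode('UTF-8', $copy, Encode::FB_CROAK() | Encode::LEAVE_SRC());
print length($copy), "\n";                 # both octets are still there
```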

It shouldn't get to line 269, because line 209 should already have returned (you confirmed that content_type_charset was set and correct). The difference must come from somewhere else in the decoded_content method.

You are correct.

UTF-8 at /home/david/perl5/perlbrew/perls/perl-5.22.2/lib/site_perl/5.22.2/HTTP/Message.pm line 209.

12:43 < davewood> could this explain my issue: "Returns the content with any Content-Encoding undone and for textual content the raw content encoded to Perl's Unicode strings."

https://metacpan.org/pod/HTTP::Message#$mess-%3Edecoded_content(-%options-)
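The quoted documentation explains the output. The minimal sketch below uses only the core Encode module and assumes a UTF-8 body. For textual (text/*) content, decoded_content returns a Perl character string. Judging by the output above, the application/json body was not treated as textual by this HTTP::Message version and came back as the raw octets. When a character string is printed through a handle with no encoding layer (STDERR via die, here), U+00E4 is written as the single byte 0xE4. That byte is not valid UTF-8, so the terminal shows a replacement character. The decode_utf8 call only approximates what decoded_content does internally.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);

# The raw body as stored in the message: "ä" as two UTF-8 octets.
my $octets = encode_utf8("\x{e4}");

# Roughly what decoded_content does for text/* content:
# decode the octets into a Perl character string.
my $chars = decode_utf8($octets);

printf "octets: length %d\n", length $octets;   # 2 (0xC3 0xA4)
printf "chars:  length %d\n", length $chars;    # 1 (U+00E4)

# Printing $chars without an encoding layer would emit the lone byte 0xE4.
# An :encoding(UTF-8) layer (or encode_utf8 on output) avoids that.
binmode STDOUT, ':encoding(UTF-8)';
print $chars, "\n";
```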

This works for me:

decoded_content(charset => 'none')
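The minimal sketch below shows the workaround on the CSV response from foo.pl. It assumes HTTP::Message is installed. According to the HTTP::Message documentation, charset => 'none' suppresses charset decoding of textual content, so the UTF-8 octets come back unchanged. Any Content-Encoding would still be undone. The other option is to keep the decoded character string and encode it on output.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use HTTP::Response;
use Encode qw(encode_utf8);

my $umlaut_octets = encode_utf8("\x{e4}");   # "ä" as two UTF-8 octets

my $res = HTTP::Response->new(
    200, '',
    ['content-type' => 'text/csv; charset=utf-8'],
    $umlaut_octets,
);

# Default behaviour: text/* content is decoded into a Perl character string.
my $chars = $res->decoded_content;

# charset => 'none' skips the charset decoding step and returns the octets.
my $raw = $res->decoded_content(charset => 'none');

printf "decoded: %d char(s), raw: %d octet(s)\n", length $chars, length $raw;
```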

cheers