Message affected by 'use utf8', breaks binary POSTs [rt.cpan.org #77403]
oalders opened this issue · comments
Migrated from rt.cpan.org#77403 (status was 'open')
From henrik.pauli@gmail.com on 2012-05-24 13:30:09:
It appeared to us that POSTing binary data with LWP corrupted the data
when (and only when) we had "use utf8" enabled in the script using LWP.
This bug was present in LWP 5.833 as well as the newest HTTP::Message 6.03.
"use utf8" doesn't do anything but turn the strings in the source code
into strings of characters, rather than octets -- it seems that
HTTP::Request::Common is completely encoding (and u-string) agnostic,
which is VERY dangerous in a place where you manipulate octet streams.
The source of the problem is that you have strings in the source code
(eg. where you add the Content-Disposition header[1]), and *also* read
bytes from the file into the same buffer later on[2]. One is easily a
character string, the other is definitely an octet stream.
Not sure what the right solution is, but the module should safeguard
itself against these kinds of things.
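The mix of character strings and octets described above can be reproduced with core Perl alone. The following is a minimal sketch, not code taken from HTTP::Request::Common: the header text, the "\x{151}" character (standing in for a non-ASCII literal in a source file that has "use utf8"), and the four payload octets are assumptions chosen for illustration. It also assumes the combined buffer ends up serialized as UTF-8, which is what Perl does when a string containing characters above 255 is printed to a handle without an encoding layer.

```perl
use strict;
use warnings;
use Encode qw(encode);

# A header fragment held as a character string. "\x{151}" stands in for a
# non-ASCII literal in a "use utf8" source file (illustrative value).
my $header = qq{Content-Disposition: form-data; name="file"; filename="\x{151}.bin"\r\n\r\n};

# Four octets, as they would arrive from reading a binary file.
my $payload = "\xFF\xD8\xFF\xE0";

# Mixed buffer: concatenation yields one character string, so serializing
# it as UTF-8 re-encodes every payload octet >= 0x80 as two octets.
my $mixed_wire = encode('UTF-8', $header . $payload);

# Safeguard: encode the textual part first, then append the raw octets.
my $safe_wire = encode('UTF-8', $header) . $payload;

printf "mixed tail: %s\n", unpack('H*', substr($mixed_wire, -8));
printf "safe tail:  %s\n", unpack('H*', substr($safe_wire, -4));
```

In the mixed case the four payload octets grow to eight on the wire, while encoding the text part before concatenation leaves them untouched.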
[1]
https://metacpan.org/source/GAAS/HTTP-Message-6.03/lib/HTTP/Request/Common.pm#L135
[2]
https://metacpan.org/source/GAAS/HTTP-Message-6.03/lib/HTTP/Request/Common.pm#L243
P.S. This might be a related issue: we also recently noticed that the
combination of https and "use utf8" breaks an HTTP request, while the
request works if either or both of them are absent.
PPS. Perl 5.10.1, Linux 3.1 x86.
From gaas@cpan.org on 2012-05-27 11:48:49:
It would be helpful if you can provide a small test script that demonstrates
the problem.
From gortan@cpan.org on 2015-05-13 15:51:32:
On Sun May 27 07:48:49 2012, GAAS wrote:
> It would be helpful if you can provide a small test script that demonstrates the problem.
I think I just ran into the same issue, and tried to come up with two minimal scripts: Both have a constant value 'öööö' in their source code, which they both pass on to HTTP::Request::Common::POST to print them as application/x-www-form-urlencoded. One of the scripts is saved as latin-1, the other is saved as utf-8 and has "use utf8" set.
I would expect the output of both scripts to be identical. However, while the latin1 script produces the expected:
text=%F6%F6%F6%F6%F6%F6%F6%F6%F6%F6%F6
the utf8 script (imho incorrectly) produces:
text=%C3%B6%C3%B6%C3%B6%C3%B6%C3%B6%C3%B6%C3%B6%C3%B6%C3%B6%C3%B6%C3%B6
$HTTP::Request::Common::VERSION is 6.04, perl v5.20.2 built for x86_64-linux.
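The two outputs above correspond to two different byte encodings of the same characters. The sketch below does not call HTTP::Request::Common; it only uses the core Encode module and a hypothetical helper, percent_encode_octets, to show which encoding yields which escaped form. Choosing the encoding explicitly before handing a value to POST is one possible way to make the result independent of how the source file was saved, though that is a suggestion rather than something confirmed in this ticket.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (not part of any module): percent-encode every
# octet of a byte string.
sub percent_encode_octets {
    my ($octets) = @_;
    return join '', map { sprintf('%%%02X', ord($_)) } split(//, $octets);
}

# Two "o with diaeresis" characters (U+00F6), written with escapes so the
# example does not depend on the encoding of this source file.
my $text = "\x{F6}\x{F6}";

my $as_latin1 = percent_encode_octets(encode('ISO-8859-1', $text));
my $as_utf8   = percent_encode_octets(encode('UTF-8', $text));

print "latin-1: text=$as_latin1\n";
print "utf-8:   text=$as_utf8\n";
```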
Hi,
I guess this issue is similar to the failures shown at http://matrix.cpantesters.org/?dist=WebService-KoreanSpeller+0.014
The tests fail only on perl 5.10.1 and below.
I tested it myself:
I changed the URL at https://metacpan.org/source/AERO/WebService-KoreanSpeller-0.014/lib/WebService/KoreanSpeller.pm#L25 to localhost
and captured the raw request with nc.
Why does an LWP POST send a different request on different perl versions when the LWP-related modules are the same version?
Case A: Perl 5.10.1 , LWP 6.26, HTTP::Request 6.11
Case B: Perl 5.20.3 , LWP 6.26, HTTP::Request 6.11
Case A
POST / HTTP/1.1
TE: deflate,gzip;q=0.3
Connection: TE, close
Host: localhost:88888
User-Agent: libwww-perl/6.26
Content-Length: 96
Content-Type: application/x-www-form-urlencoded
text1=%C3%AC%C2%95%C2%88%C3%AB%C2%87%C2%BD%C3%AD%C2%95%C2%98%C3%AC%C2%84%C2%B8%C3%AC%C2%9A%C2%94
Case B
POST / HTTP/1.1
TE: deflate,gzip;q=0.3
Connection: TE, close
Host: localhost:88888
User-Agent: libwww-perl/6.26
Content-Length: 51
Content-Type: application/x-www-form-urlencoded
text1=%EC%95%88%EB%87%BD%ED%95%98%EC%84%B8%EC%9A%94
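The Case A body looks like the Case B body with its UTF-8 octets encoded as UTF-8 a second time, which also accounts for the Content-Length of 96 versus 51. The sketch below reproduces that pattern for the first character only. It rests on an assumption that this thread does not establish: that somewhere on perl 5.10.1 the already-encoded octets get upgraded internally and are then escaped by code that branches on the internal UTF8 flag. The helper flag_sensitive_escape is hypothetical and written only to illustrate that behaviour.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical escaper that branches on the internal UTF8 flag: flagged
# strings are UTF-8 encoded first, then every octet is percent-encoded.
sub flag_sensitive_escape {
    my ($s) = @_;
    utf8::encode($s) if utf8::is_utf8($s);
    return join '', map { sprintf('%%%02X', ord($_)) } split(//, $s);
}

# U+C548, the first character of the posted text, as three UTF-8 octets.
my $octets = encode('UTF-8', "\x{C548}");

# Same string value, but stored with the UTF8 flag set.
my $upgraded = $octets;
utf8::upgrade($upgraded);

my $case_b = flag_sensitive_escape($octets);
my $case_a = flag_sensitive_escape($upgraded);

print "strings compare equal: ", ($octets eq $upgraded ? "yes" : "no"), "\n";
print "not upgraded: text1=$case_b\n";
print "upgraded:     text1=$case_a\n";
```

The two strings compare equal in Perl, yet the flag-sensitive escaper emits the Case B form for one and the Case A form for the other.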