libwww-perl / HTTP-Message

The HTTP-Message distribution contains classes useful for representing the messages passed in HTTP style communication.

Home Page: https://metacpan.org/pod/HTTP::Message

Message affected by 'use utf8', breaks binary POSTs [rt.cpan.org #77403]

oalders opened this issue

Migrated from rt.cpan.org#77403 (status was 'open')

From henrik.pauli@gmail.com on 2012-05-24 13:30:09:

It appeared to us that POSTing binary data with LWP corrupted the data
when (and only when) we had "use utf8" enabled in the script using LWP.

This bug was present in LWP 5.833 as well as the newest HTTP::Message 6.03.

"use utf8" doesn't do anything but turn the strings in the source code
into strings of characters, rather than octets -- it seems that
HTTP::Request::Common is completely encoding (and u-string) agnostic,
which is VERY dangerous in a place where you manipulate octet streams.

The source of the problem is that you have strings in the source code
(eg. where you add the Content-Disposition header[1]), and *also* read
bytes from the file into the same buffer later on[2].  One is easily a
character string, the other is definitely an octet stream.

Not sure what the right solution is, but the module should safeguard
itself against these kinds of things.
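
The mix described above can be sketched in isolation. This is a minimal illustration using only core Perl and Encode, not the code in HTTP::Request::Common; the header text, the filename and the four payload octets are made-up values, and utf8::upgrade stands in for the internal form a non-ASCII literal typically has under "use utf8". It shows that once octets are appended to a character string, serializing the result as UTF-8 changes every high octet, and that encoding the textual part first (one possible safeguard, assumed here rather than taken from the module) leaves the payload intact.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical multipart header with a non-ASCII filename (o-umlaut).
# utf8::upgrade stores it in Perl's internal character form.
my $header = "Content-Disposition: form-data; name=\"file\"; "
           . "filename=\"\x{F6}.bin\"\r\n\r\n";
utf8::upgrade($header);

# Four octets standing in for binary file content.
my $payload = "\xFF\xD8\xFF\xE0";

# Mixing the two: the octets are upgraded to characters U+00FF, U+00D8, ...
my $mixed = $header . $payload;

# If the combined string is later serialized as UTF-8, each high octet
# of the payload turns into two octets.
my $wire_mixed = encode('UTF-8', $mixed);

# One possible safeguard: encode the textual part to octets first.
my $wire_safe = encode('UTF-8', $header) . $payload;

# Hypothetical helper: hex dump of the last $n octets.
sub hex_tail {
    my ($octets, $n) = @_;
    return uc(unpack('H*', substr($octets, -$n)));
}

printf "mixed is a character string: %s\n", utf8::is_utf8($mixed) ? 'yes' : 'no';
printf "payload after mixing:   %s\n", hex_tail($wire_mixed, 8);
printf "payload with safeguard: %s\n", hex_tail($wire_safe, 4);
```

With the values above, the payload FFD8FFE0 comes out as C3BFC398C3BFC3A0 after mixing and unchanged with the safeguard. Whether this is the exact path taken inside LWP is not established in this report.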

[1]
https://metacpan.org/source/GAAS/HTTP-Message-6.03/lib/HTTP/Request/Common.pm#L135
[2]
https://metacpan.org/source/GAAS/HTTP-Message-6.03/lib/HTTP/Request/Common.pm#L243

P.S. Possibly a related issue: we also recently noticed that combining https
with use utf8 breaks an HTTP request, while leaving out either or both does not.

PPS. Perl 5.10.1, Linux 3.1 x86.

From gaas@cpan.org on 2012-05-27 11:48:49:

It would be helpful if you can provide a small test script that demonstrates
the problem.

From gortan@cpan.org on 2015-05-13 15:51:32:

On Sun May 27 07:48:49 2012, GAAS wrote:
> It would be helpful if you can provide a small test script that demonstrates the problem.

I think I just ran into the same issue and tried to come up with two minimal scripts. Both have a constant value 'öööö' in their source code, which they pass to HTTP::Request::Common::POST and print as application/x-www-form-urlencoded. One of the scripts is saved as latin-1; the other is saved as utf-8 and has "use utf8" set.
I would expect the output of both scripts to be identical. However, while the latin1 script produces the expected:
text=%F6%F6%F6%F6%F6%F6%F6%F6%F6%F6%F6
the utf8 script (imho incorrectly) produces:
text=%C3%B6%C3%B6%C3%B6%C3%B6%C3%B6%C3%B6%C3%B6%C3%B6%C3%B6%C3%B6%C3%B6

$HTTP::Request::Common::VERSION is 6.04, perl v5.20.2 built for x86_64-linux.
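
The two outputs above differ only in which octets stand for each 'ö' before percent-encoding. The sketch below is not the reporter's script and does not call HTTP::Request::Common; it uses Encode plus a hypothetical helper, percent_encode_octets, that escapes every octet it is given. It shows that U+00F6 is one octet (F6) in Latin-1 and two octets (C3 B6) in UTF-8.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: percent-encode every octet of a byte string.
# (A real form encoder would leave unreserved ASCII characters alone.)
sub percent_encode_octets {
    my ($octets) = @_;
    return join '', map { sprintf '%%%02X', $_ } unpack 'C*', $octets;
}

# Four o-umlaut characters (U+00F6), written as escapes so that this
# file itself stays pure ASCII.
my $text = "\x{F6}" x 4;

my $as_latin1 = 'text=' . percent_encode_octets(encode('ISO-8859-1', $text));
my $as_utf8   = 'text=' . percent_encode_octets(encode('UTF-8', $text));

print "$as_latin1\n";   # text=%F6%F6%F6%F6
print "$as_utf8\n";     # text=%C3%B6%C3%B6%C3%B6%C3%B6
```

Which of the two is "correct" depends on the charset the receiving server expects; the sketch only shows where the difference comes from.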
commented

Hi,
I guess this issue is similar to the test failures at http://matrix.cpantesters.org/?dist=WebService-KoreanSpeller+0.014
The tests fail only on Perl 5.10.1 and below.

I tested it myself.
I changed the URL at https://metacpan.org/source/AERO/WebService-KoreanSpeller-0.014/lib/WebService/KoreanSpeller.pm#L25 to localhost
and captured the raw request with nc.

Why does an LWP POST send a different request when the versions of the LWP-related modules are the same?

Case A: Perl 5.10.1 , LWP 6.26, HTTP::Request 6.11
Case B: Perl 5.20.3 , LWP 6.26, HTTP::Request 6.11

Case A

POST / HTTP/1.1
TE: deflate,gzip;q=0.3
Connection: TE, close
Host: localhost:88888
User-Agent: libwww-perl/6.26
Content-Length: 96
Content-Type: application/x-www-form-urlencoded

text1=%C3%AC%C2%95%C2%88%C3%AB%C2%87%C2%BD%C3%AD%C2%95%C2%98%C3%AC%C2%84%C2%B8%C3%AC%C2%9A%C2%94

Case B

POST / HTTP/1.1
TE: deflate,gzip;q=0.3
Connection: TE, close
Host: localhost:88888
User-Agent: libwww-perl/6.26
Content-Length: 51
Content-Type: application/x-www-form-urlencoded

text1=%EC%95%88%EB%87%BD%ED%95%98%EC%84%B8%EC%9A%94
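
The Case A body is consistent with the Case B octets having been UTF-8 encoded a second time, that is, the UTF-8 octets were treated as Latin-1 characters and encoded again. The sketch below reproduces both bodies and both Content-Length values from the Case B octets alone. It uses only Encode and a hypothetical helper, percent_encode_octets; it does not show where in Perl 5.10.1 or LWP the second encoding happens.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: percent-encode every octet of a byte string.
sub percent_encode_octets {
    my ($octets) = @_;
    return join '', map { sprintf '%%%02X', $_ } unpack 'C*', $octets;
}

# The 15 octets of the Case B form value (percent-decoded).
my $case_b_octets =
    "\xEC\x95\x88\xEB\x87\xBD\xED\x95\x98\xEC\x84\xB8\xEC\x9A\x94";

# Misread the UTF-8 octets as Latin-1 characters, then encode as UTF-8 again.
my $double_encoded = encode('UTF-8', decode('ISO-8859-1', $case_b_octets));

my $body_b = 'text1=' . percent_encode_octets($case_b_octets);
my $body_a = 'text1=' . percent_encode_octets($double_encoded);

printf "Case B: %s (Content-Length %d)\n", $body_b, length($body_b);
printf "Case A: %s (Content-Length %d)\n", $body_a, length($body_a);
```

The lengths come out as 51 and 96, matching the Content-Length headers in the two captures.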