Error on parsing binary (png) replies

Question

bigmonkeyboy opened this issue 10 years ago · comments

Hi, if I use cgi to call a mapserver (mapserv),
it should return the tiles as png files, but this seems to cause the parser to error out.
Here is the "line" returned (well, just the first part):

Content-type: image/png

�PNG ...
LINE �
IHDR��h��PLTEu�[�77>8(�ؘ�Ȉɼf��֪w�G��~��G��!u��u��V��[��ۏ�噙�kk�tt�}}҆��{Z��y��G��P��Y��b��k��t��}��`��rȽ{�Ƅ��ևe��|�B�:=K�>��tRNS@��f IDATx��]i�#��p��

and the error from.... /node_modules/cgi/node_modules/header-stack/parser.js:110:33)
ParseError: Malformed header line, no delimiter (:) found: "��W��
�Vq�pn��"

For completeness - here is the start of the "chunk" -
<Buffer 43 6f 6e 74 65 6e 74 2d 74 79 70 65 3a 20 69 6d 61 67 65 2f 70 6e 67 0a 0a 89 50 4e 47 0d 0a 1a 0a 00 00 00 0d 49 48 44 52 00 00 01 00 00 00 01 00 08 03 ...>
so you can see the 0a 0a end of header...

Nathan Rajlich · Answer 1 · Sun Jul 20 2014 09:04:15 GMT+0800 (China Standard Time)

You must write some HTTP headers to the response first, including a Status header indicating the HTTP status code to return. Something like this should work:

#!/bin/sh

# HTTP headers
echo "Status: 200"
echo "Content-Type: image/png"
echo

# HTTP response body (do your mapserv invocation here)
mapserv

bigmonkeyboy · Answer 2 · Tue Jul 22 2014 22:19:42 GMT+0800 (China Standard Time)

hmmm - It already sends the content-type and blank line... so there is A header (admittedly it doesn't include the status... is that mandatory?)

If I poke in the code it seems to parse the header "ok" - but then tries to parse the message as if it was also a header and that is when it borks as it is a binary png file.

I tried like you suggest - but mapserv is a bit picky and doesn't like to be called via a script...
"
This script can only be used to decode form results and should be initiated as a CGI process via a httpd server.
"
so I guess there is something that is not being passed in or out.
(However I do think this is a distraction and the header parsing of the payload is the main problem)

Nathan Rajlich · Answer 3 · Tue Jul 22 2014 22:42:04 GMT+0800 (China Standard Time)

Oh I think I see. mapserv sends back the entire HTTP response. In that case, try passing in { nph: true } to let the script be in charge of the HTTP response headers.

bigmonkeyboy · Answer 4 · Tue Jul 22 2014 22:54:16 GMT+0800 (China Standard Time)

Hi - tried that... still not happy - but digging some more... in the header-stack parser.js, lines 57-58:
var eol = buf.indexOf(Parser.CRLF);
var delimLength = Parser.CRLF.length;

The response uses just \n to separate the headers from the body... - but because the binary png happens to contain \r\n the lines above "succeed" - (incorrectly as they now contain the header plus some of the image) - and then the parse fails etc etc... If I set Parser.CRLF to be "\n" (in this instance) it works just fine... maybe you can set that from the environment ?
line 42 - Parser.CRLF = require('os').EOL;
seems to work for me on Linux...

Nathan Rajlich · Answer 5 · Wed Jul 23 2014 01:13:25 GMT+0800 (China Standard Time)

If you could give me a raw dump of the output from your program (at least, the header and the first few bytes of the response body), then that would help write a test case for node-header-stack.

bigmonkeyboy · Answer 6 · Wed Jul 23 2014 04:01:17 GMT+0800 (China Standard Time)

Hi - the first chunk you are parsing (in Header parser.js) is

Buffer 43 6f 6e 74 65 6e 74 2d 74 79 70 65 3a 20 69 6d 61 67 65 2f 70 6e 67 0a 0a 89 50 4e 47 0d 0a 1a 0a 00 00 00 0d 49 48 44 52 00 00 01 00 00 00 01 00 08 03 ...

the header should end here

Buffer 43 6f 6e 74 65 6e 74 2d 74 79 70 65 3a 20 69 6d 61 67 65 2f 70 6e 67 0a 0a

ie two line feeds in my case

but you are parsing it to break here

Buffer 43 6f 6e 74 65 6e 74 2d 74 79 70 65 3a 20 69 6d 61 67 65 2f 70 6e 67 0a 0a 89 50 4e 47 0d 0a

as it finds a 0d0a - but 89 50 4e 47 0d 0a is part of the PNG image - see file spec here https://en.wikipedia.org/wiki/Portable_Network_Graphics - you then try to parse that and of course the final bit doesn't conform to an http header and it all barfs...

Do you really need to look for 0D0A? Why not just look for 0A and then strip the 0D if you want?
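
To make the byte offsets concrete, here is a minimal Node.js sketch (not part of header-stack; the variable names are made up for illustration) that loads the first bytes of the chunk quoted above and compares where the first LF and the first CRLF occur. It assumes a modern Node.js where Buffer.from and Buffer#indexOf are available.

```javascript
// First 33 bytes of the chunk quoted above: the header, the blank line,
// then the start of the PNG signature (89 50 4e 47 0d 0a 1a 0a).
const chunk = Buffer.from(
  '436f6e74656e742d747970653a20696d6167652f706e67' + // "Content-type: image/png"
  '0a0a' +                                           // LF LF (end of headers)
  '89504e470d0a1a0a',                                // PNG signature
  'hex'
);

const firstLF = chunk.indexOf(Buffer.from('\n'));     // end of the first header line
const firstCRLF = chunk.indexOf(Buffer.from('\r\n')); // lands inside the PNG signature

console.log(firstLF);   // 23
console.log(firstCRLF); // 29

// Slicing at the first LF gives exactly the header line.
console.log(chunk.slice(0, firstLF).toString('latin1')); // Content-type: image/png
```

Since 29 is greater than 23, a search that prefers CRLF over LF would cut the "line" after the bytes 89 50 4e 47, i.e. six bytes past the real end of the first header line.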

Nathan Rajlich · Answer 7 · Fri Jan 23 2015 10:49:49 GMT+0800 (China Standard Time)

Do you really need to look for 0D0A? Why not just look for 0A and then strip the 0D if you want?

The thing is, I'm using the strictCRLF: false option, which should make the parser be lenient on that note, so I'm just a little confused at this point.

Nathan Rajlich · Answer 8 · Fri Jan 23 2015 14:00:32 GMT+0800 (China Standard Time)

What is your server code and what is the HTTP request that you're sending to the server?

bigmonkeyboy · Answer 9 · Fri Jan 23 2015 17:58:50 GMT+0800 (China Standard Time)

Hi - the actual server I am using is the cgi-mapserver on Ubuntu ( sudo apt-get install cgi-mapserver )
I am requesting a map page and the browser side breaks this into multiple requests for .png files (tiles).

The problem is around lines 56... in parser.js

  var eol = buf.indexOf(Parser.CRLF);
  var delimLength = Parser.CRLF.length;
  if (eol === -1) {
    eol = buf.indexOf(Parser.LF);

You do the search for CRLF (0d0a) first, and a .PNG file DOES have that in it (as part of its own internal header), so eol is not -1 and you proceed incorrectly, because you are now taking the wrong delimLength: the actual end of header was the 0a several bytes earlier (see examples above). If you want to do a less strict search then you need to always do both searches and then determine which is most correct, but in this case just searching for 0a (to be less strict) and then adding a check for the 0d to be more strict would (imho) be easier.
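
The "search for 0a, then check for 0d" idea can be sketched as a small standalone function. This is only an illustration under assumptions: splitHeaderLine and its return shape are hypothetical names, not header-stack's API, and it works on a plain Node.js Buffer rather than the parser's internal buffer list.

```javascript
const LF_BYTE = 0x0a;
const CR_BYTE = 0x0d;

// Hypothetical helper: split off one header line, looking for LF first.
// Returns null when no complete line is buffered yet.
function splitHeaderLine(buf, strictCRLF = false) {
  const eol = buf.indexOf(LF_BYTE);
  if (eol === -1) return null; // wait for more data

  // A CR directly before the LF means the line ended with CRLF.
  const hasCR = eol > 0 && buf[eol - 1] === CR_BYTE;
  if (strictCRLF && !hasCR) {
    throw new Error("ParseError: Found a lone '\\n' char, and `strictCRLF` is true");
  }

  const lineEnd = hasCR ? eol - 1 : eol; // drop the CR if present
  return {
    line: buf.slice(0, lineEnd).toString('latin1'),
    rest: buf.slice(eol + 1), // everything after the LF
  };
}

// Header ended by bare LFs, followed by the start of a PNG body.
const sample = Buffer.from('Content-type: image/png\n\n\x89PNG\r\n\x1a\n', 'latin1');
const headerLine = splitHeaderLine(sample);
const blankLine = splitHeaderLine(headerLine.rest);

console.log(JSON.stringify(headerLine.line)); // "Content-type: image/png"
console.log(JSON.stringify(blankLine.line));  // "" (empty line: end of headers)
console.log(blankLine.rest[0].toString(16));  // 89 (body starts at the PNG signature)
```

Because the first LF always comes at or before the first CRLF-terminated line end, a CRLF inside the body can no longer be mistaken for a header delimiter; a caller would stop calling the function once it sees the empty line.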

Nathan Rajlich · Answer 10 · Sat Jan 24 2015 00:42:39 GMT+0800 (China Standard Time)

Well as I pointed out before, the parser is already setup to be less strict about CRLF. i.e. just a LF should be fine.

In fact, the test/cgi-bin/printenv.cgi script simply is a bash script that uses echo to output the headers, so even that sends only 0a0a to end the header, and that example works fine. So I think you're pointing out a red herring.

If you could give me the node server code you're using and tell me how you are hitting that server (e.g. curl or a web browser, and what URL), that would help. I can't really help if I can't reproduce the issue myself.

bigmonkeyboy · Answer 11 · Sat Jan 24 2015 01:23:12 GMT+0800 (China Standard Time)

OK - it is easier to modify your examples...
hello.cgi

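#!/usr/bin/perl -w
use strict;

my $file = "logo.png";
my $length = -s $file;   # size of the image in bytes

# CGI headers, terminated by bare LF (\n), not CRLF
print "Content-type: image/png\n";
print "Content-length: $length\n\n";

# Stream the image to STDOUT as raw bytes
binmode STDOUT;
open(FH, '<', $file) || die "Could not open $file: $!";
binmode FH;
my $buffer = "";
while (read(FH, $buffer, 10240)) {
    print $buffer;
}
close(FH);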

and drop the attached png file (or any other) into that directory and rename it logo.png..

then dies horribly

node hello.js 
server listening
http.createClient is deprecated. Use `http.request` instead.

events.js:72
        throw er; // Unhandled 'error' event
              ^
Error: ParseError: Malformed header line, no delimiter (:) found: "�"
    at Parser.parseHeaderLine [as _parseHeaderLine] (/home/pc/node/node_modules/cgi/node_modules/header-stack/parser.js:124:33)

Nathan Rajlich · Answer 12 · Sat Jan 24 2015 01:30:03 GMT+0800 (China Standard Time)

It seems to work for me. What am I missing in your setup? https://gist.github.com/TooTallNate/3558a59c608fab26bac0

bigmonkeyboy · Answer 13 · Sat Jan 24 2015 02:15:27 GMT+0800 (China Standard Time)

I have no idea... I'm serving from 32Bit Linux (Ubuntu 12.04) - but...
all I know is that changing the search to be for just 0A (to match the OS expectation) works...

Nathan Rajlich · Answer 14 · Sat Jan 24 2015 02:16:59 GMT+0800 (China Standard Time)

Well it's possible that there's a bug in header-parser, but I just can't reproduce it...

bigmonkeyboy · Answer 15 · Sat Jan 24 2015 03:11:18 GMT+0800 (China Standard Time)

Well, the bug is as I said... it looks for crlf first... and finds one (in the png), so sets eol wrong, so does the slice wrong... but why it is working for you is even more confusing... especially when your default examples work, i.e. returning text that also has crlf in...
I'm using node 0.10.33. I know they are busy changing the way they handle buffers/binary data etc...

bigmonkeyboy · Answer 16 · Sat Jan 24 2015 20:30:09 GMT+0800 (China Standard Time)

I'm going to propose this for the parser - add defined CR char. Then do the check in the other order - i.e. check single first - then recheck for the CR if strict is true

Parser.CR = new Buffer('\r');
Parser.prototype._onData = function onData(chunk) {
  if (chunk) this._buffers.push(chunk);
  var buf = this._buffers.take();
  var eol = buf.indexOf(Parser.LF);
  var delimLength = Parser.LF.length;

  if (eol !== -1) {
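    // buf[eol-1] is a byte value (a number), so compare it with the CR byte, not with the Parser.CR Buffer itself
    if ((this.options.strictCRLF) && (buf[eol-1] !== Parser.CR[0])) {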
        return this.emit('error', new Error('ParseError: Found a lone \'\\n\' char, and `strictCRLF` is true'));
    } else {
        var slice = buf.slice(0, eol);
        this._buffers.advance(eol+delimLength);
        this._parseHeaderLine(slice.toString().trim()); // trim any trailing white-space
        if (this._buffers.length > 0) {
          this._onData();
        }
    }
  } else {
    //console.error("waiting for the next 'data' event");
  }
}

Terry Riegel · Answer 17 · Thu May 05 2016 03:51:11 GMT+0800 (China Standard Time)

Hello,

I am having the exact same issue with .gif images. It appears to be a race condition.

I can download the images one at a time with a pause between each request and it works correctly.

If my browser makes several simultaneous requests then it dies with the exact same error reported by bigmonkeyboy

I do not have enough skill to determine if the problem is with the cgi library or some other dependency in the chain.

I would be willing to grant ssh to my test machine so you could verify the problem.

Thanks for any help.

Terry Riegel · Answer 18 · Thu May 05 2016 20:56:49 GMT+0800 (China Standard Time)

I implemented the following change recommended by bigmonkeyboy

Before:

var eol = buf.indexOf(Parser.CRLF);
var delimLength = Parser.CRLF.length;

After:

var eol = buf.indexOf(Parser.LF);
var delimLength = Parser.LF.length;

And now the problem goes away. This is less than satisfying because the cgi script is returning properly formatted headers with CRLF after each line. So it works now. But, I am concerned that the actual bug hasn't been found.

Marco Giana · Answer 19 · Fri Jun 16 2017 09:04:05 GMT+0800 (China Standard Time)

Hi,

I have created:

var script = path.resolve(SDK_ROOT + "/bin/ms/apps", "mapserv.exe");
var cgiObj = cgi(script);
var server = http.createServer(cgiObj).listen(8000);

which works great
but I need to set 'Access-Control-Allow-Origin'
how would I go about setting that?
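
One possible approach, sketched here under assumptions and not tested against the cgi module, is to wrap the handler and set the header on the response before handing the request over. withCors is a hypothetical helper name. Whether the cgi handler keeps headers that were already set on the response, and what happens if mapserv emits the same header itself, is not confirmed anywhere in this thread.

```javascript
// Hypothetical helper: wraps any (req, res) handler and sets a CORS header
// on the response before the wrapped handler runs.
function withCors(handler, origin) {
  return function (req, res) {
    res.setHeader('Access-Control-Allow-Origin', origin);
    return handler(req, res);
  };
}

// Usage sketch (assumes `http` and `cgiObj` from the snippet above):
//   var server = http.createServer(withCors(cgiObj, '*')).listen(8000);
```

Using '*' allows any origin; a specific origin string can be passed instead if the tiles should only be usable from one site.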

Alba Mendez · Answer 20 · Fri Dec 27 2019 00:06:52 GMT+0800 (China Standard Time)

I'm surprised this bug hasn't been solved yet... Yes, as @bigmonkeyboy says, the header parser will first look for CRLF... if it finds it, it'll use this as a delimiter. This is horrible.

Simple CGI script that triggers the crash...

#!/usr/bin/env python3
print('Header1: Value1')
print('Header2: Value2')
print('')
print('line1')
print('line2', end='\r\n')
print('line3')

Error: ParseError: Malformed header line, no delimiter (:) found: "line3"
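
The failure can be reproduced without the library using a simplified model of the search order quoted earlier in the thread (CRLF first, LF only as a fallback). This is a sketch, not header-stack's actual code: nextLineCrlfFirst is a made-up name, and the input is the byte stream the Python script above would be expected to print on Linux.

```javascript
// Simplified model of the quoted logic: look for CRLF first and fall back
// to LF only when the buffer contains no CRLF at all.
function nextLineCrlfFirst(buf) {
  let eol = buf.indexOf('\r\n');
  let delimLength = 2;
  if (eol === -1) {
    eol = buf.indexOf('\n');
    delimLength = 1;
  }
  if (eol === -1) return null; // no complete line yet
  return {
    line: buf.slice(0, eol).toString(),
    rest: buf.slice(eol + delimLength),
  };
}

// Expected output of the CGI script: LF-terminated headers, then a body
// in which only "line2" ends with CRLF.
const output = Buffer.from(
  'Header1: Value1\nHeader2: Value2\n\nline1\nline2\r\nline3\n'
);

const first = nextLineCrlfFirst(output);
const second = nextLineCrlfFirst(first.rest);

console.log(JSON.stringify(first.line));
// "Header1: Value1\nHeader2: Value2\n\nline1\nline2"
console.log(JSON.stringify(second.line));
// "line3"
```

In this model the first "header line" swallows both headers, the blank line and part of the body, and the next line is "line3", which has no colon. That matches the string in the error message above.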