TooTallNate / node-cgi

An http/stack/connect layer to invoke and serve CGI executables.

Error on parsing binary (png) replies

bigmonkeyboy opened this issue · comments

Hi, if I use cgi to call MapServer (mapserv), it should return the tiles as PNG files, but this seems to cause the parser to error out.
Here is the "line" returned (well, just the first part):

Content-type: image/png

�PNG ...
LINE �
IHDR�����h��PLTEu�[�77>8(�ؘ�Ȉɼf�������֪w�G���~��G��!u�����u���V���[�����ۏ�噙�kk�tt�}}҆��{Z����y��G��P��Y��b��k��t��}��`��rȽ{�Ƅ����ևe����������|�B�:=K�>��tRNS@��f IDATx��]i�#����p�����

and the error from.... /node_modules/cgi/node_modules/header-stack/parser.js:110:33)
ParseError: Malformed header line, no delimiter (:) found: "����W������
�Vq�pn����"

For completeness - here is the start of the "chunk" -
<Buffer 43 6f 6e 74 65 6e 74 2d 74 79 70 65 3a 20 69 6d 61 67 65 2f 70 6e 67 0a 0a 89 50 4e 47 0d 0a 1a 0a 00 00 00 0d 49 48 44 52 00 00 01 00 00 00 01 00 08 03 ...>
so you can see the 0a 0a end of header...

You must write some HTTP headers to the response first, including a Status header indicating the HTTP status code to return. Something like this should work:

#!/bin/sh

# HTTP headers
echo "Status: 200"
echo "Content-Type: image/png"
echo

# HTTP response body (do your mapserv invocation here)
mapserv

hmmm - It already sends the content-type and blank line... so there is A header (admittedly it doesn't include the status... is that mandatory?)

If I poke in the code it seems to parse the header "ok" - but then tries to parse the message as if it was also a header and that is when it borks as it is a binary png file.

I tried like you suggest - but mapserv is a bit picky and doesn't like to be called via a script...
"
This script can only be used to decode form results and should be initiated as a CGI process via a httpd server.
"
so I guess there is something that is not being passed in or out.
(However I do think this is a distraction and the header parsing of the payload is the main problem)

Oh I think I see. mapserv sends back the entire HTTP response. In that case, try passing in { nph: true } to let the script be in charge of the HTTP response headers.

Hi - tried that ... still not happy - but digging some more... in the header-stack parser.js, lines 57 and 58:
var eol = buf.indexOf(Parser.CRLF);
var delimLength = Parser.CRLF.length;

The response uses just \n to separate the headers from the body, but because the binary PNG happens to contain \r\n, the lines above "succeed" (incorrectly, as the slice now contains the header plus some of the image), and then the parse fails. If I set Parser.CRLF to "\n" (in this instance) it works just fine... maybe you can set that from the environment?
line 42 - Parser.CRLF = require('os').EOL;
seems to work for me on Linux...

If you could give me a raw dump of the output from your program (at least, the header and the first few bytes of the response body), then that would help write a test case for node-header-stack.

Hi - the first chunk you are parsing (in Header parser.js) is

Buffer 43 6f 6e 74 65 6e 74 2d 74 79 70 65 3a 20 69 6d 61 67 65 2f 70 6e 67 0a 0a 89 50 4e 47 0d 0a 1a 0a 00 00 00 0d 49 48 44 52 00 00 01 00 00 00 01 00 08 03 ...

the header should end here

Buffer 43 6f 6e 74 65 6e 74 2d 74 79 70 65 3a 20 69 6d 61 67 65 2f 70 6e 67 0a 0a 

ie two line feeds in my case

but you are parsing it to break here

Buffer 43 6f 6e 74 65 6e 74 2d 74 79 70 65 3a 20 69 6d 61 67 65 2f 70 6e 67 0a 0a 89 50 4e 47 0d 0a 

as it finds a 0d0a - but 89 50 4e 47 0d 0a is part of the PNG image - see file spec here https://en.wikipedia.org/wiki/Portable_Network_Graphics - you then try to parse that and of course the final bit doesn't conform to an http header and it all barfs...

Do you really need to look for 0D0A? Why not just look for 0A and then strip the 0D if you want?
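The offsets described in this comment can be checked directly. The following is a minimal sketch, assuming a current Node.js where `Buffer.from` and `Buffer.prototype.indexOf` are available (the thread itself used node 0.10.33, where header-stack relied on its own search); the bytes are copied from the start of the dump, truncated after the PNG signature.

```javascript
// Bytes copied from the dump above: header line, blank line, then the PNG signature.
const chunk = Buffer.from(
  '436f6e74656e742d747970653a20696d6167652f706e67' + // "Content-type: image/png"
  '0a0a' +                                           // LF LF ends the header block
  '89504e470d0a1a0a',                                // PNG signature (contains CR LF)
  'hex'
);

const firstCRLF = chunk.indexOf('\r\n'); // what a CRLF-first search finds
const firstLF = chunk.indexOf('\n');     // the real end of the first header line

console.log(firstCRLF, firstLF); // 29 23
// Slicing at the LF gives the clean header line.
console.log(chunk.slice(0, firstLF).toString('latin1')); // Content-type: image/png
```

The CRLF match at offset 29 lies inside the image data, six bytes past the first LF at offset 23, so a slice taken up to that match contains the header, the blank line, and the first bytes of the PNG.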

The thing is, I'm using the strictCRLF: false option, which should make the parser be lenient on that note, so I'm just a little confused at this point.

What is your server code and what is the HTTP request that you're sending to the server?

Hi - the actual server I am using is the cgi-mapserver on Ubuntu ( sudo apt-get install cgi-mapserver )
I am requesting a map page and the browser side breaks this into multiple .png files (tiles) requests.

The problem is around lines 56... in parser.js

  var eol = buf.indexOf(Parser.CRLF);
  var delimLength = Parser.CRLF.length;
  if (eol === -1) {
    eol = buf.indexOf(Parser.LF);

You do the search for CRLF (0d0a) first, and a .PNG file DOES have that in it (as part of its own internal header), so eol is not -1 and you proceed incorrectly, as you are now taking the wrong delimLength: the actual end of the header was the 0a several bytes earlier (see examples above). If you want to do a less strict search then you need to always do both searches and then determine which is most correct. In this case, though, just searching for 0a (to be less strict) and then adding a check for the 0d (to be more strict) would (imho) be easier.
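The "search for 0a, then check for 0d" idea could look roughly like the sketch below. `findEol` is a hypothetical helper written for illustration, not part of header-stack, and its `strictCRLF` argument only mirrors the option name mentioned in this thread.

```javascript
// Hypothetical helper, not part of header-stack: locate the end of the next line.
function findEol(buf, strictCRLF) {
  const lf = buf.indexOf(0x0a);              // always search for the LF first
  if (lf === -1) return null;                // no complete line in the buffer yet
  const hasCR = lf > 0 && buf[lf - 1] === 0x0d;
  if (strictCRLF && !hasCR) {
    throw new Error('lone LF found while strictCRLF is set');
  }
  // The line content stops before the CR when one is present.
  return { end: hasCR ? lf - 1 : lf, next: lf + 1 };
}

// LF-terminated header followed by bytes that contain CR LF, as in the PNG case.
const sample = Buffer.from('Content-type: image/png\n\n\x89PNG\r\n', 'latin1');
const eol = findEol(sample, false);
console.log(sample.slice(0, eol.end).toString()); // Content-type: image/png
```

Because the first LF always wins, a CR LF pair that occurs later in the body can no longer be mistaken for the end of a header line.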

Well as I pointed out before, the parser is already setup to be less strict about CRLF. i.e. just a LF should be fine.

In fact, the test/cgi-bin/printenv.cgi script simply is a bash script that uses echo to output the headers, so even that sends only 0a0a to end the header, and that example works fine. So I think you're pointing out a red herring.

If you could give me the node server code you're using, and how you are hitting that server (i.e. curl, web browser, what URL?)? I can't really help if I can't reproduce the issue myself.

Ok - it is easier to modify your examples...
hello.cgi

#!/usr/bin/perl -w
my $file = "logo.png";
my $length = -s $file;
print "Content-type: image/png\n";
print "Content-length: $length \n\n";
binmode STDOUT;
open (FH,'<', $file) || die "Could not open $file: $!";
my $buffer = "";
while (read(FH, $buffer, 10240)) {
    print $buffer;
}

and drop the attached png file (or any other) into that directory and rename it logo.png..
then dies horribly

node hello.js 
server listening
http.createClient is deprecated. Use `http.request` instead.
events.js:72
        throw er; // Unhandled 'error' event
              ^
Error: ParseError: Malformed header line, no delimiter (:) found: "�"
    at Parser.parseHeaderLine [as _parseHeaderLine] (/home/pc/node/node_modules/cgi/node_modules/header-stack/parser.js:124:33)

It seems to work for me. What am I missing in your setup? https://gist.github.com/TooTallNate/3558a59c608fab26bac0

I have no idea... I'm serving from 32Bit Linux (Ubuntu 12.04) - but...
all I know is that changing the search to be for just 0A (to match the OS expectation) works...

Well it's possible that there's a bug in header-parser, but I just can't reproduce it...

Well, the bug is as I said: it looks for CRLF first and finds one (in the PNG), so it sets eol wrong and does the slice wrong. But why it is working for you is even more confusing, especially when your default examples work, i.e. returning text that also has CRLF in it.
I'm using node 0.10.33. I know they are busy changing the way they handle buffers/binary data etc...

I'm going to propose this for the parser - add defined CR char. Then do the check in the other order - i.e. check single first - then recheck for the CR if strict is true

Parser.CR = new Buffer('\r');
Parser.prototype._onData = function onData(chunk) {
  if (chunk) this._buffers.push(chunk);
  var buf = this._buffers.take();
  var eol = buf.indexOf(Parser.LF);
  var delimLength = Parser.LF.length;

  if (eol !== -1) {
    if ((this.options.strictCRLF) && (buf[eol-1] !== Parser.CR[0])) { // compare the byte value, not the Buffer object
        return this.emit('error', new Error('ParseError: Found a lone \'\\n\' char, and `strictCRLF` is true'));
    } else {
        var slice = buf.slice(0, eol);
        this._buffers.advance(eol+delimLength);
        this._parseHeaderLine(slice.toString().trim()); // trim any trailing white-space
        if (this._buffers.length > 0) {
          this._onData();
        }
    }
  } else {
    //console.error("waiting for the next 'data' event");
  }
}

Hello,

I am having the exact same issue with .gif images. It appears to be a race condition.

I can download the images one at a time with a pause between each request and it works correctly.

If my browser makes several simultaneous requests then it dies with the exact same error reported by bigmonkeyboy

I do not have enough skill to determine if the problem is with the cgi library or some other dependency in the chain.

I would be willing to grant ssh to my test machine so you could verify the problem.

Thanks for any help.

I implemented the following change recommended by bigmonkeyboy

Before:

var eol = buf.indexOf(Parser.CRLF);
var delimLength = Parser.CRLF.length;

After:

var eol = buf.indexOf(Parser.LF);
var delimLength = Parser.LF.length;

And now the problem goes away. This is less than satisfying because the cgi script is returning properly formatted headers with CRLF after each line. So it works now. But, I am concerned that the actual bug hasn't been found.
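One possible reason the LF-only search still copes with CRLF-terminated headers: if each line is trimmed before it is parsed, as in the proposed `_onData` above, the trailing CR is dropped along with other whitespace. The sketch below illustrates that on a complete buffer. `splitCgiOutput` is a hypothetical helper, not the header-stack implementation, and it does not address whether a separate race condition exists.

```javascript
// Hypothetical helper: split raw CGI output into parsed headers and a binary body.
function splitCgiOutput(buf) {
  const headers = {};
  let pos = 0;
  while (pos < buf.length) {
    const lf = buf.indexOf(0x0a, pos);
    if (lf === -1) break; // incomplete line; a streaming parser would wait for more data
    // trim() removes a trailing CR, so CRLF and LF line endings are both accepted.
    const line = buf.slice(pos, lf).toString('latin1').trim();
    pos = lf + 1;
    if (line === '') return { headers, body: buf.slice(pos) }; // blank line ends the headers
    const colon = line.indexOf(':');
    if (colon === -1) throw new Error('Malformed header line: ' + line);
    headers[line.slice(0, colon).toLowerCase()] = line.slice(colon + 1).trim();
  }
  return null; // header block not complete yet
}

// CRLF-terminated headers with a body that itself contains CR LF.
const out = splitCgiOutput(Buffer.from('Content-Type: image/gif\r\n\r\nGIF89a\r\n', 'latin1'));
console.log(out.headers['content-type']); // image/gif
console.log(out.body.length);             // 8
```

The body is returned as an untouched Buffer, so CR LF sequences inside image data never reach the header parsing.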

Hi,

I have created:

var script = path.resolve(SDK_ROOT + "/bin/ms/apps", "mapserv.exe");
var cgiObj = cgi(script);
var server = http.createServer(cgiObj).listen(8000);

which works great
but I need to set 'Access-Control-Allow-Origin'
how would I go about setting that?
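One approach that might work is to wrap the handler and set the header on the response before handing the request to cgi. This is only a sketch: `withCors` is a hypothetical helper name, `'*'` is an example value, and it has not been verified here that node-cgi keeps headers that were set on the response before it runs.

```javascript
// Wrap any (req, res) handler so the CORS header is set before the handler runs.
// `withCors` is a hypothetical helper; the default origin '*' is only an example.
function withCors(handler, origin) {
  return function (req, res) {
    res.setHeader('Access-Control-Allow-Origin', origin || '*');
    handler(req, res);
  };
}

// Usage with the snippet above (assumes `cgiObj` is the handler returned by cgi(script)):
// var server = http.createServer(withCors(cgiObj)).listen(8000);
```

In Node's http module, headers set with `res.setHeader` are merged with those passed to a later `res.writeHead` call, so the header should survive unless the wrapped handler sets the same header itself.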

I'm surprised this bug hasn't been solved yet... Yes, as @bigmonkeyboy says, the header parser will first look for CRLF... if it finds it, it'll use this as a delimiter. This is horrible.

Simple CGI script that triggers the crash...

#!/usr/bin/env python
print('Header1: Value1')
print('Header2: Value2')
print('')
print('line1')
print('line2', end='\r\n')
print('line3')

Error: ParseError: Malformed header line, no delimiter (:) found: "line3"