Inconsistency and `undefined` return in `inflate` method between versions v1 and v2

Question


microshine opened this issue 4 months ago

Hello pako team,

I have encountered a couple of issues with the `inflate` method when using different major versions of the pako library.

Environment:

Node.js version: v20.10.0
Pako version 1 (alias pako_v1): 1.0.11
Pako version 2 (alias pako_v2): 2.1.0

Steps to Reproduce:

Install two versions of pako using the following commands:

```sh
npm install pako_v1@npm:pako@^1.0.0
npm install pako_v2@npm:pako@^2.0.0
```

Run the following script:

```javascript
// Load both versions, installed side by side under npm aliases.
const pako1 = require('pako_v1');
const pako2 = require('pako_v2');

function main() {
  // zlib data taken from a PDF stream object; the final byte (0x0a) is an extra EOL.
  const data = Buffer.from("789c3d4e4b0ac2400cdd07728739c1f465da693b2005c522ba2bcc4e5c886029584aab0b8f6f3a8a0492bcbc0fe19969261fbc7500c43903adba105b56210457257c1b293b8e30fb893a22ee9824dde5c732216d4bcf94b5efd701a67f32eda2c2ed325c1faa8c7775390bd5fddb17fbb258472d623d4cd4b4f3ca6e805c9fcad15c4c3c31b59af601d2ca22900a", "hex");

  const result1 = pako1.inflate(data);
  const result2 = pako2.inflate(data);

  // pako 1.0.11 decodes the data despite the trailing byte.
  console.log(Buffer.from(result1).toString());
  // pako 2.1.0 returns undefined here instead of a Uint8Array or throwing.
  if (result2 === undefined) {
    console.log("pako2.inflate returns undefined");
  } else {
    console.log(Buffer.from(result2).toString());
  }
}

main();
```

Expected Behavior:

The `inflate` method should either return a `Uint8Array` or throw an error if inflation fails.
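That contract can be enforced on the caller side with a small guard. This is a minimal sketch, assuming only that the inflate function takes the compressed bytes and returns its result; `strictInflate` is a hypothetical helper name, not part of pako's API. It turns a non-`Uint8Array` result such as `undefined` into an exception, so a bad stream cannot pass through silently.

```javascript
'use strict';

// Hypothetical caller-side guard (not part of pako): accept a result only if
// it is a Uint8Array (Node's Buffer is a subclass, so it passes too) and
// throw otherwise, e.g. for an `undefined` return.
function strictInflate(inflateFn, data) {
  const result = inflateFn(data);
  if (!(result instanceof Uint8Array)) {
    throw new TypeError(`inflate returned ${String(result)} instead of a Uint8Array`);
  }
  return result;
}
```

With the script above, `strictInflate(pako2.inflate, data)` would throw rather than hand back `undefined`.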

Observed Behavior:

In pako version 2 (`pako_v2`), the `inflate` method returns `undefined` instead of returning a `Uint8Array` or throwing an error.
There seems to be a discrepancy between the versions in how extra bytes after the end of the compressed data are handled. Version 1 (`pako_v1`) correctly discards the extra trailing byte `0x0A`, while version 2 (`pako_v2`) does not.
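To check how many bytes actually follow the end of the zlib stream, one option is to compare the Adler-32 checksum of the decompressed output (for example, `result1` from `pako_v1`) with the 4-byte big-endian checksum that ends every zlib stream (RFC 1950). The sketch below is a hypothetical diagnostic, not a pako feature; the function names are illustrative, and a coincidental checksum match is possible but very unlikely.

```javascript
'use strict';

// Adler-32 as defined in RFC 1950: two running sums modulo 65521.
function adler32(bytes) {
  let a = 1;
  let b = 0;
  for (const byte of bytes) {
    a = (a + byte) % 65521;
    b = (b + a) % 65521;
  }
  return ((b << 16) | a) >>> 0; // force an unsigned 32-bit result
}

// Hypothetical diagnostic (not a pako API): find how many bytes follow the
// zlib trailer by looking for the big-endian Adler-32 of `output` near the
// end of `compressed`. Returns -1 if no match within `maxExtra` bytes.
function bytesAfterZlibTrailer(compressed, output, maxExtra = 8) {
  const expected = adler32(output);
  for (let extra = 0; extra <= maxExtra; extra++) {
    const end = compressed.length - extra;
    if (end < 6) break; // too short for a 2-byte header plus 4-byte trailer
    const stored =
      ((compressed[end - 4] << 24) |
        (compressed[end - 3] << 16) |
        (compressed[end - 2] << 8) |
        compressed[end - 1]) >>> 0;
    if (stored === expected) return extra;
  }
  return -1;
}
```

With the script above, `bytesAfterZlibTrailer(data, result1)` would report how many bytes trail the checksum in the extracted PDF stream.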

Additional Context:

The input data buffer is extracted from a stream object in a PDF document. Some PDF documents specify an incorrect `/Length` for stream objects and may also use an incorrect EOL marker before `endstream`, so the extracted binary data ends with extra bytes (such as `0x0A` or `0x0D`), as in this case.

This behavior is problematic because it prevents processing PDF streams that have an incorrectly reported length or an extra EOL character left behind by a faulty PDF generator.
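Until this is resolved, a caller-side workaround along these lines may help. It is a sketch that assumes the only damage is a stray PDF EOL marker (CR, LF, or CR LF) appended after the zlib stream; `inflateWithEolFallback` is a hypothetical helper, and `inflateFn` can be any function shaped like `pako.inflate`. Stripping every trailing CR/LF up front would be unsafe, because the Adler-32 checksum at the end of a valid zlib stream can itself end in `0x0D` or `0x0A`. The helper therefore tries the untouched input first and then drops at most two bytes, one at a time.

```javascript
'use strict';

// PDF end-of-line bytes: carriage return (0x0D) and line feed (0x0A).
const isEolByte = (b) => b === 0x0d || b === 0x0a;

// Hypothetical workaround (not part of pako). `inflateFn` is any function
// shaped like pako.inflate. The input is tried as-is first, then with one and
// then two trailing EOL bytes dropped (a PDF EOL is CR, LF, or CR LF). Bytes
// are dropped one at a time because the Adler-32 trailer of a valid zlib
// stream may itself end in 0x0D or 0x0A.
function inflateWithEolFallback(inflateFn, data) {
  let lastError;
  for (let trim = 0; trim <= 2 && trim <= data.length; trim++) {
    // Only drop a byte if it really is an EOL byte.
    if (trim > 0 && !isEolByte(data[data.length - trim])) break;
    try {
      const result = inflateFn(data.subarray(0, data.length - trim));
      if (result instanceof Uint8Array) return result;
      // A non-Uint8Array result (such as undefined) counts as a failure.
      lastError = new TypeError(`inflate returned ${String(result)} instead of a Uint8Array`);
    } catch (err) {
      lastError = err; // keep whatever was thrown, even a plain string
    }
  }
  throw lastError;
}
```

For instance, `inflateWithEolFallback(pako2.inflate, data)` retries without the final `0x0A` when the first call returns `undefined`, which should succeed if the rest of the stream is intact.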

Could you please look into these issues? Tolerating trailing bytes is crucial for my use case, since I process PDF files that are not always well-formed.

Thank you for your assistance!