Inconsistency and `undefined` return in `inflate` method between versions v1 and v2
microshine opened this issue
Hello pako team,
I have encountered a couple of issues with the `inflate` method when working with different versions of the `pako` library.
Environment:
- Node.js version: v20.10.0
- Pako version 1 (alias `pako_v1`): 1.0.11
- Pako version 2 (alias `pako_v2`): 2.1.0
Steps to Reproduce:
- Install two versions of `pako` using the following commands:

    npm install pako_v1@npm:pako@^1.0.0
    npm install pako_v2@npm:pako@^2.0.0

- Run the following script:

    const pako1 = require('pako_v1');
    const pako2 = require('pako_v2');

    function main() {
      // zlib stream taken from a PDF stream object, with one extra trailing byte (0x0A)
      const data = Buffer.from("789c3d4e4b0ac2400cdd07728739c1f465da693b2005c522ba2bcc4e5c886029584aab0b8f6f3a8a0492bcbc0fe19969261fbc7500c43903adba105b56210457257c1b293b8e30fb893a22ee9824dde5c732216d4bcf94b5efd701a67f32eda2c2ed325c1faa8c7775390bd5fddb17fbb258472d623d4cd4b4f3ca6e805c9fcad15c4c3c31b59af601d2ca22900a", "hex");

      const result1 = pako1.inflate(data);
      const result2 = pako2.inflate(data);

      console.log(Buffer.from(result1).toString());
      if (result2 === undefined) {
        console.log("pako2.inflate returns undefined");
      } else {
        console.log(Buffer.from(result2).toString());
      }
    }

    main();

Expected Behavior:
The `inflate` method should return a `Uint8Array` or throw an error if the inflation is unsuccessful.
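
To make the expected contract concrete, here is a minimal sketch of a caller-side guard. `inflateOrThrow` is a hypothetical helper, not part of pako; it takes the inflate function as a parameter (for example `pako2.inflate`) and turns a non-`Uint8Array` result into an error.

```javascript
// Hypothetical helper (not part of pako): enforce "Uint8Array or throw".
// `inflateFn` is whichever inflate implementation is in use, e.g. pako2.inflate.
function inflateOrThrow(inflateFn, data) {
  const result = inflateFn(data);
  // Buffer is a Uint8Array subclass, so it passes this check as well.
  if (!(result instanceof Uint8Array)) {
    throw new Error('inflate did not return a Uint8Array (got ' + typeof result + ')');
  }
  return result;
}
```

With the script from the reproduction steps, a call such as `inflateOrThrow(pako2.inflate, data)` would surface the `undefined` result as an exception instead of letting it propagate silently.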
Observed Behavior:
- In `pako` version 2 (`pako_v2`), the `inflate` method returns `undefined` instead of a `Uint8Array` or an error.
- There seems to be a discrepancy in how extra bytes at the end of the input data are handled between versions. Version 1 (`pako_v1`) correctly discards the extra byte `0x0A` at the end of the data, while version 2 (`pako_v2`) does not.
Additional Context:
The input data buffer is extracted from a PDF document's stream object. Some PDF documents may incorrectly specify the `Length` for stream objects and may also set an incorrect EOL character before `endstream`, resulting in binary data with extra bytes (such as `0x0A` or `0x0D`), as in this case.
This behavior is problematic because it affects the ability to process certain PDF streams, which might have incorrectly reported lengths or have an additional EOL character due to incorrect PDF generation.
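
As a possible caller-side workaround, the sketch below retries inflation after dropping trailing EOL bytes one at a time. `inflateWithEolTrim` and its `maxTrim` parameter are hypothetical names, and the sketch assumes the failure is caused only by the extra `0x0A`/`0x0D` bytes; I have not confirmed that it restores the version 1 behavior for every input. Bytes are trimmed only after a failed attempt, so a valid stream whose checksum happens to end in `0x0A` or `0x0D` is left untouched.

```javascript
// Hypothetical workaround (not part of pako): retry after trimming trailing EOL bytes.
// `inflateFn` is the inflate implementation in use, e.g. pako2.inflate.
function inflateWithEolTrim(inflateFn, data, maxTrim = 2) {
  let end = data.length;
  for (let trimmed = 0; ; trimmed++) {
    let result;
    try {
      // subarray creates a view, so no bytes are copied on each retry
      result = inflateFn(data.subarray(0, end));
    } catch (err) {
      result = undefined; // treat a thrown error like a failed attempt
    }
    if (result !== undefined) {
      return result;
    }
    const last = data[end - 1];
    const isEol = last === 0x0a || last === 0x0d;
    if (trimmed >= maxTrim || !isEol) {
      throw new Error('inflate failed after trimming ' + trimmed + ' trailing EOL byte(s)');
    }
    end--; // drop one trailing EOL byte and try again
  }
}
```

The default of two trimmed bytes covers a stray `\r\n` pair; anything beyond that is treated as a real decoding failure.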
Could you please look into these issues? The handling of the end bytes is crucial for my use case, where I process PDF files that may not always be correctly formed.
Thank you for your assistance!