nodeca / pako

High-speed zlib port to JavaScript; works in the browser and Node.js.

Home Page: http://nodeca.github.io/pako/

Performance issues converting Uint8Array to UTF-16 string when inflating with to: "string"

jxu opened this issue

The same issue has been reported here: https://stackoverflow.com/questions/38145228/convertation-from-uint8array-to-utf-16-string-freezes-crashes-browser

To reproduce: try inflating a 100 MB gzip file with the option to: "string".

The solution is probably to do the conversion in chunks.
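One way to read "conversion in chunks" is to feed slices of the inflated Uint8Array through a single streaming TextDecoder. This is a minimal sketch, assuming the bytes are UTF-8 (pako's to: "string" option treats the output as UTF-8) and a runtime that provides TextDecoder; decodeUtf8InChunks and its 1 MiB default slice size are illustrative choices, not part of pako.

```javascript
// Hypothetical helper: decode a large UTF-8 Uint8Array to a string in slices.
// TextDecoder's { stream: true } option buffers any multi-byte sequence that
// is cut at a slice boundary, so slicing does not corrupt characters.
function decodeUtf8InChunks(bytes, chunkSize = 1 << 20) {
  const decoder = new TextDecoder("utf-8");
  const parts = [];
  for (let offset = 0; offset < bytes.length; offset += chunkSize) {
    // subarray() returns a view, so no bytes are copied here
    const slice = bytes.subarray(offset, offset + chunkSize);
    parts.push(decoder.decode(slice, { stream: true }));
  }
  // A final call without { stream: true } flushes any buffered bytes
  parts.push(decoder.decode());
  return parts.join("");
}
```

Note that this still runs as one synchronous call and still builds the full string at the end; keeping a page responsive would additionally require yielding between slices (for example by awaiting a timer), and it does nothing for the memory cost of the final string.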

Please provide a minimal code sample showing how to reproduce this.

Also, 100 MB of gzipped text is roughly a 1 GB string after unpacking, so you are probably running out of memory (the JIT has some limits). You could use chunking and save the result to a Blob; that approach is intended to work with data this large.
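The Blob suggestion can be sketched as follows: keep every inflated chunk as raw bytes and hand the list to a Blob, so no giant UTF-16 string is ever built. chunksToBlob and its produceChunks callback are hypothetical names. The commented pako wiring assumes pako's Inflate class with its onData hook and push(data, true) for the final input chunk; check it against the pako version in use.

```javascript
// Hypothetical helper: gather decompressed byte chunks into a Blob instead of
// one JS string. `produceChunks` calls `emit` once per Uint8Array chunk.
function chunksToBlob(produceChunks, type = "text/plain;charset=utf-8") {
  const chunks = [];
  produceChunks((chunk) => {
    // Copy in case the producer reuses its output buffer between calls
    chunks.push(chunk.slice());
  });
  // The Blob holds raw bytes; nothing is decoded to UTF-16 here
  return new Blob(chunks, { type });
}

// Example wiring with pako (assumption, not executed here):
// const blob = chunksToBlob((emit) => {
//   const inflator = new pako.Inflate();
//   inflator.onData = emit;          // called once per output chunk
//   inflator.push(vecsBuf, true);    // true marks the final input chunk
//   if (inflator.err) throw new Error(inflator.msg);
// });
```

The resulting Blob can be offered as a download via URL.createObjectURL(blob) or read incrementally via blob.stream(); calling blob.text() would rebuild the whole string and bring the original problem back.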

I should clarify: the file is 100 MB uncompressed and 35 MB compressed. I don't know whether 100 MB will fit within the JIT's limits.

Here is the file I was working with: https://github.com/jxu/Word2VecDemo/raw/6ee9741a5fd556b3aa6d4598d4881061588f3c9e/wordvecs50k.vec.gz

Here is some example code. It runs inside an async function only because of fetch (replace it with a synchronous file load if you'd like). Nothing special is happening here:

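    // Runs inside an async function because fetch() returns promises
    const vecsResponse = await fetch("wordvecs50k.vec.gz");
    const vecsBlob = await vecsResponse.blob();
    const vecsBuf = await vecsBlob.arrayBuffer();
    // to: "string" makes pako decode the inflated UTF-8 bytes into a JS string
    const vecsText = pako.inflate(vecsBuf, {to: "string"});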

My workaround was to use TextDecoder. Is there any reason the library doesn't use TextDecoder? The library already assumes modern browser support.

Could you narrow down your example? Is it specific to that file, or can you reproduce it by generating a long string, deflating it, and inflating it back?

It applies to any large file. Here is an example that does not depend on a specific file:

    // Generate 100M characters of fake floating-point data
    const s = [...Array(10**7)].map(() => Math.random().toString().substring(0, 10)).join('');
    const d = pako.deflate(s);                  // ~45M; takes a while, but I don't use this in my code
    pako.inflate(d);                            // runs quickly (returns a Uint8Array)
    pako.inflate(d, {to: "string"});            // freezes the browser for 30 s
    new TextDecoder().decode(pako.inflate(d));  // much faster

Thanks for the simplified example.

> // freezes browser for 30s

So the problem is only the slower decoding speed? I could add a TextDecoder call in buf2string() when it's available. This code was written before TextDecoder became stable.
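The "use TextDecoder when available" idea can be sketched as a feature check with a fallback. This is not pako's actual buf2string(); bytesToString and utf8DecodeFallback are hypothetical names, and the fallback is a simplified decoder that assumes well-formed UTF-8 and does not reject invalid sequences.

```javascript
// Hypothetical sketch: prefer the native TextDecoder and fall back to a
// hand-written UTF-8 decoder only when the runtime lacks it.
function bytesToString(bytes) {
  if (typeof TextDecoder === "function") {
    return new TextDecoder("utf-8").decode(bytes);
  }
  return utf8DecodeFallback(bytes);
}

// Minimal UTF-8 -> UTF-16 decoder; assumes well-formed input.
function utf8DecodeFallback(bytes) {
  const units = []; // pending UTF-16 code units
  let out = "";
  let i = 0;
  while (i < bytes.length) {
    const b = bytes[i];
    let cp;
    if (b < 0x80) {
      cp = b; // 1 byte: ASCII
      i += 1;
    } else if (b < 0xe0) {
      cp = ((b & 0x1f) << 6) | (bytes[i + 1] & 0x3f); // 2 bytes
      i += 2;
    } else if (b < 0xf0) {
      cp = ((b & 0x0f) << 12) | ((bytes[i + 1] & 0x3f) << 6) |
           (bytes[i + 2] & 0x3f); // 3 bytes
      i += 3;
    } else {
      cp = ((b & 0x07) << 18) | ((bytes[i + 1] & 0x3f) << 12) |
           ((bytes[i + 2] & 0x3f) << 6) | (bytes[i + 3] & 0x3f); // 4 bytes
      i += 4;
    }
    if (cp > 0xffff) {
      // Code points above the BMP become a surrogate pair
      cp -= 0x10000;
      units.push(0xd800 | (cp >> 10), 0xdc00 | (cp & 0x3ff));
    } else {
      units.push(cp);
    }
    // Flush in batches: engines cap how many arguments apply() can pass
    if (units.length >= 0x8000) {
      out += String.fromCharCode.apply(null, units);
      units.length = 0;
    }
  }
  return out + String.fromCharCode.apply(null, units);
}
```

In runtimes that ship TextDecoder (current browsers and Node.js 11+), only the first branch of bytesToString runs.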

I've changed code to use TextDecoder whenever possible, thanks for reporting.