101arrowz / fflate

High performance (de)compression in an 8kB package

Home Page: https://101arrowz.github.io/fflate

Deflate stream produces way bigger outputs than Pako

BenoitZugmeyer opened this issue

How to reproduce

import {Deflate as PakoDeflate, inflateRaw} from "https://unpkg.com/pako@2.1.0/dist/pako.esm.mjs"
import {Deflate as FflateDeflate} from "https://unpkg.com/fflate@0.7.4/esm/browser.js"

const data = new TextEncoder().encode(Array.from({ length: 1000 }, (_, i) => i).join(","))

const fflateChunks = []
const fflateDeflate = new FflateDeflate(chunk => {
  fflateChunks.push(chunk)
})
fflateDeflate.push(data)
fflateDeflate.push(data)
fflateDeflate.push(data, true)
const fflateResult = new Uint8Array(fflateChunks.reduce((total, chunk) => total + chunk.byteLength, 0))
{
  let offset = 0
  for (const chunk of fflateChunks) {
    fflateResult.set(chunk, offset)
    offset += chunk.byteLength
  }
}


const pakoDeflate = new PakoDeflate({ raw: true })
pakoDeflate.push(data)
pakoDeflate.push(data)
pakoDeflate.push(data, true)
const pakoResult = pakoDeflate.result


// sanity check
if (inflateRaw(pakoResult).byteLength !== inflateRaw(fflateResult).byteLength) {
  throw new Error("Results don't match")
}


console.log("Pako result length:  ", pakoResult.byteLength)
console.log("Fflate result length:", fflateResult.byteLength)

// Pako result length:   1916
// Fflate result length: 5354

The problem

fflate produces a much bigger output than pako. Unlike pako, fflate does not share the deflate state across the whole stream, so previously pushed chunks aren't taken into account when compressing a new chunk.
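
As a workaround on fflate 0.7.x, the extra cost can be avoided by concatenating the input and pushing it as a single chunk, so the compressor sees all the repeated data at once. A minimal sketch, reusing data and FflateDeflate from the reproduction above:

// Workaround sketch for fflate 0.7.x: a single push lets the compressor
// find matches across the repeated data, since no state is carried between pushes.
const whole = new Uint8Array(data.byteLength * 3)
whole.set(data, 0)
whole.set(data, data.byteLength)
whole.set(data, data.byteLength * 2)

const chunks = []
const deflate = new FflateDeflate(chunk => {
  chunks.push(chunk)
})
deflate.push(whole, true)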

This design was chosen so that every pushed chunk corresponds to one or more chunks in the deflate stream. But as you mentioned, it's inefficient when many small chunks are pushed. It's still relatively effective for chunks of around 1MB in size (e.g. the ones returned from File.prototype.stream), but upon reconsideration, the small-chunk use case is probably important to support as well.
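
For the large-chunk case, a hedged sketch of what that usage looks like (deflateFile is an illustrative helper, not part of fflate's API; it assumes pushing an empty final chunk terminates the stream):

import {Deflate as FflateDeflate} from "https://unpkg.com/fflate@0.7.4/esm/browser.js"

// Sketch: stream a File through Deflate in the browser. File.prototype.stream()
// yields large Uint8Array chunks, so losing cross-chunk state costs relatively little.
async function deflateFile(file, ondata) {
  const deflate = new FflateDeflate(ondata)
  const reader = file.stream().getReader()
  for (;;) {
    const { done, value } = await reader.read()
    if (done) break
    deflate.push(value)
  }
  deflate.push(new Uint8Array(0), true) // empty final chunk ends the stream
}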

This might be possible to resolve by preserving a 32kB lookback buffer - I'll see how difficult it is to implement.
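
Conceptually, that means carrying the trailing 32kB of raw input between pushes as match history for the next chunk. A rough sketch of the bookkeeping only, not fflate's actual implementation (pushWithLookback and WINDOW_SIZE are illustrative names):

// Conceptual sketch only: retain the trailing 32kB of input across pushes
// so the next chunk can back-reference bytes from earlier pushes.
const WINDOW_SIZE = 32768
let lookback = new Uint8Array(0)

function pushWithLookback(chunk) {
  const combined = new Uint8Array(lookback.length + chunk.length)
  combined.set(lookback, 0)
  combined.set(chunk, lookback.length)
  // The compressor would emit output only for the `chunk` portion, while
  // allowing back-references into the `lookback` portion as match history.
  lookback = combined.slice(Math.max(0, combined.length - WINDOW_SIZE))
}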

I've successfully implemented this and will push it out in a release sometime soon.

Fixed in v0.8.0.
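
With that release, the reproduction above should produce output close to pako's once the import points at the fixed version (unpkg path assumed unchanged from 0.7.4):

import {Deflate as FflateDeflate} from "https://unpkg.com/fflate@0.8.0/esm/browser.js"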