101arrowz / fflate

High performance (de)compression in an 8kB package

Home Page: https://101arrowz.github.io/fflate

Deflate stream produces way bigger outputs than Pako

BenoitZugmeyer opened this issue

How to reproduce

import {Deflate as PakoDeflate, inflateRaw} from "https://unpkg.com/pako@2.1.0/dist/pako.esm.mjs"
import {Deflate as FflateDeflate} from "https://unpkg.com/fflate@0.7.4/esm/browser.js"

const data = new TextEncoder().encode(Array.from({ length: 1000 }, (_, i) => i).join(","))

const fflateChunks = []
const fflateDeflate = new FflateDeflate(chunk => {
  fflateChunks.push(chunk)
})
fflateDeflate.push(data)
fflateDeflate.push(data)
fflateDeflate.push(data, true)
const fflateResult = new Uint8Array(fflateChunks.reduce((total, chunk) => total + chunk.byteLength, 0))
{
  let offset = 0
  for (const chunk of fflateChunks) {
    fflateResult.set(chunk, offset)
    offset += chunk.byteLength
  }
}


const pakoDeflate = new PakoDeflate({ raw: true })
pakoDeflate.push(data)
pakoDeflate.push(data)
pakoDeflate.push(data, true)
const pakoResult = pakoDeflate.result


// sanity check
if (inflateRaw(pakoResult).byteLength !== inflateRaw(fflateResult).byteLength) {
  throw new Error("Results don't match")
}


console.log("Pako result length:  ", pakoResult.byteLength)
console.log("Fflate result length:", fflateResult.byteLength)

// Pako result length:   1916
// Fflate result length: 5354

The problem

fflate produces a much bigger output than pako. Unlike pako, fflate does not share the deflate state across the whole stream, so previously pushed chunks aren't taken into account when compressing a new chunk.
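
As a workaround on fflate 0.7.x, the extra cost can be avoided by concatenating the input and pushing it as a single chunk, so the compressor sees all the repeated data at once. A minimal sketch, reusing data and FflateDeflate from the reproduction above:

// Workaround sketch for fflate 0.7.x: a single push lets the compressor
// find matches across the repeated data, since no state is carried between pushes.
const whole = new Uint8Array(data.byteLength * 3)
whole.set(data, 0)
whole.set(data, data.byteLength)
whole.set(data, data.byteLength * 2)

const chunks = []
const deflate = new FflateDeflate(chunk => {
  chunks.push(chunk)
})
deflate.push(whole, true)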

This design was chosen so that every pushed chunk corresponds to one or more chunks in the deflate stream. But as you mentioned, it's inefficient when many small chunks are pushed. It's still relatively effective for chunks of around 1MB in size (e.g. the ones returned from File.prototype.stream), but upon reconsideration, the small-chunk use case is probably important to support as well.
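
For the large-chunk case, a hedged sketch of what that usage looks like (deflateFile is an illustrative helper, not part of fflate's API; it assumes pushing an empty final chunk terminates the stream):

import {Deflate as FflateDeflate} from "https://unpkg.com/fflate@0.7.4/esm/browser.js"

// Sketch: stream a File through Deflate in the browser. File.prototype.stream()
// yields large Uint8Array chunks, so losing cross-chunk state costs relatively little.
async function deflateFile(file, ondata) {
  const deflate = new FflateDeflate(ondata)
  const reader = file.stream().getReader()
  for (;;) {
    const { done, value } = await reader.read()
    if (done) break
    deflate.push(value)
  }
  deflate.push(new Uint8Array(0), true) // empty final chunk ends the stream
}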

This might be possible to resolve by preserving a 32kB lookback buffer - I'll see how difficult it is to implement.
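
Conceptually, that means carrying the trailing 32kB of raw input between pushes as match history for the next chunk. A rough sketch of the bookkeeping only, not fflate's actual implementation (pushWithLookback and WINDOW_SIZE are illustrative names):

// Conceptual sketch only: retain the trailing 32kB of input across pushes
// so the next chunk can back-reference bytes from earlier pushes.
const WINDOW_SIZE = 32768
let lookback = new Uint8Array(0)

function pushWithLookback(chunk) {
  const combined = new Uint8Array(lookback.length + chunk.length)
  combined.set(lookback, 0)
  combined.set(chunk, lookback.length)
  // The compressor would emit output only for the `chunk` portion, while
  // allowing back-references into the `lookback` portion as match history.
  lookback = combined.slice(Math.max(0, combined.length - WINDOW_SIZE))
}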

I've successfully implemented this and will push it out in a release sometime soon.

Fixed in v0.8.0.
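
With that release, the reproduction above should produce output close to pako's once the import points at the fixed version (unpkg path assumed unchanged from 0.7.4):

import {Deflate as FflateDeflate} from "https://unpkg.com/fflate@0.8.0/esm/browser.js"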