pieroxy / lz-string

LZ-based compression algorithm for JavaScript

New Compression Streams W3C draft

Rycochet opened this issue · comments

https://wicg.github.io/compression/

Compression Streams

Draft Community Group Report, 16 February 2020

The APIs specified in this specification are used to compress and decompress streams of data. They support "deflate" and "gzip" as compression algorithms. They are widely used by web developers.

Looks like we might be getting native support for some things - how can we leverage and/or make use of this? (A question for later, as it's just a draft.) :-)

It looks more like a replacement for lz-string if it gets widespread adoption.

Looks like we now have widespread support - is there an advantage to lz-string over the built-in gzip + btoa now?

Compatibility and choice - at some point this may change to being a wrapper for the new APIs - but that would still give a known name for compatibility with other platforms :-)

So has anyone tested it (the new CompressionStream API)? It can use gzip or deflate to compress a string into an ArrayBuffer, but how should I then convert the ArrayBuffer to the smallest string possible?
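For reference, a minimal sketch of that compression step, assuming a browser that ships the Streams API (gzipString is just an illustrative name, not part of any library):

async function gzipString(text: string): Promise<Uint8Array> {
  // Encode the string as UTF-8 via Blob, pipe it through the native gzip
  // compressor, and collect the output into a single buffer.
  const compressed = new Blob([text])
    .stream()
    .pipeThrough(new CompressionStream("gzip"));
  return new Uint8Array(await new Response(compressed).arrayBuffer());
}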

Direct translation would be difficult due to the sanitization process used for things like TextDecoder, so a custom function would be needed. This seems like something that would be interesting to explore once the final v2 is ready.

TextDecoder wouldn't work easily, because in UTF-8 the first 128 code points are 1 byte each but code points 128 to 255 are 2 bytes, and UTF-8 is the only choice if you also want to use TextEncoder to reverse it. In my tests it can't reliably convert bytes to a string and then back.

example code:

function convertToStringThenBack(input: Uint8Array) {
  // Decode as UTF-8. Any byte sequence that is not valid UTF-8 (e.g. a lone
  // byte >= 128) is replaced with U+FFFD, which makes the round trip lossy.
  // Pass the view itself, not input.buffer, so a subarray decodes correctly.
  const string = new TextDecoder().decode(input);
  // Re-encode and compare byte-for-byte with the original.
  const back = new TextEncoder().encode(string);
  const isEqual =
    back.length === input.length &&
    back.every((value, index) => value === input[index]);
  if (isEqual) console.log("good");
  else console.log("bad");
}

convertToStringThenBack(new Uint8Array([1, 10, 100, 127])); // good
convertToStringThenBack(new Uint8Array([1, 10, 100, 128])); // bad

Yeah, I have run into the same. I will dig around for the function I use to maintain a reliable round trip.
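For what it's worth, a common approach (and probably what such a function looks like) is to skip UTF-8 entirely and map each byte to a code point in 0-255; the resulting string is plain UTF-16 in memory, so the reverse mapping is lossless. A minimal sketch, with bytesToBinaryString / binaryStringToBytes as illustrative names:

function bytesToBinaryString(bytes: Uint8Array): string {
  // Map each byte to the code point with the same value (0-255).
  let out = "";
  for (const byte of bytes) out += String.fromCharCode(byte);
  return out;
}

function binaryStringToBytes(str: string): Uint8Array {
  // Reverse mapping; assumes every code point is <= 255, which is
  // guaranteed for strings produced by bytesToBinaryString.
  const bytes = new Uint8Array(str.length);
  for (let i = 0; i < str.length; i++) bytes[i] = str.charCodeAt(i);
  return bytes;
}

// Unlike the UTF-8 round trip above, this survives bytes >= 128:
// binaryStringToBytes(bytesToBinaryString(new Uint8Array([1, 10, 100, 128])))
// yields [1, 10, 100, 128] again.

Note that, per the earlier point, code points 128 to 255 still cost 2 bytes each once such a string is serialized as UTF-8.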

One thing I've found is that base64 is actually quite good if you store text in the filesystem, because both the browser and Node will write UTF-8 when you ask them to save a text file, and since ASCII characters are only 1 byte each in UTF-8, it's quite efficient (75% efficient, to be precise: a 1 MB base64 string can store 0.75 MB of binary data).

Performant solutions already exist for conversion between array buffer and base64.
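In the browser, a sketch of that conversion using only the built-in btoa / atob might look like the following (the chunk size is an arbitrary choice to stay under argument-count limits for String.fromCharCode):

function bytesToBase64(bytes: Uint8Array): string {
  // Build a Latin-1 "binary string" in chunks, then base64-encode it.
  let binary = "";
  const chunkSize = 0x8000;
  for (let i = 0; i < bytes.length; i += chunkSize) {
    binary += String.fromCharCode(...bytes.subarray(i, i + chunkSize));
  }
  return btoa(binary);
}

function base64ToBytes(base64: string): Uint8Array {
  // atob yields one character per byte; copy the code points back into a buffer.
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return bytes;
}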

I have implemented string <-> gzip <-> base64 in my project and it's very fast.
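A sketch of what that pipeline can look like, reusing the base64 helpers above (compressToBase64 / decompressFromBase64 are illustrative names, not this project's API):

async function compressToBase64(text: string): Promise<string> {
  // UTF-8-encode the string, gzip it with the native stream, base64 the bytes.
  const stream = new Blob([text])
    .stream()
    .pipeThrough(new CompressionStream("gzip"));
  const bytes = new Uint8Array(await new Response(stream).arrayBuffer());
  return bytesToBase64(bytes);
}

async function decompressFromBase64(base64: string): Promise<string> {
  // Reverse: base64 -> bytes -> gunzip -> UTF-8 string.
  const bytes = base64ToBytes(base64);
  const stream = new Blob([bytes])
    .stream()
    .pipeThrough(new DecompressionStream("gzip"));
  return new Response(stream).text();
}

// Round trip: await decompressFromBase64(await compressToBase64("hello"))
// should give back "hello".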

However, if the string is to be stored in localStorage or a database, then base64 may not be efficient, depending on whether it's stored as UTF-8 or UTF-16, which I don't know.

It is important to note that this library was originally written for browsers, and those are what it officially supports. Node is a whole different beast, and from my research does not support UTF-16 strings in any way. I believe there is a Node.js port of v1.4 out there already, and I would prefer to keep them as separate implementations due to the excessive complexity of trying to support both.