pieroxy / lz-string

LZ-based compression algorithm for JavaScript

New Compression Streams W3C draft

Rycochet opened this issue · comments

https://wicg.github.io/compression/

Compression Streams

Draft Community Group Report, 16 February 2020

The APIs specified in this specification are used to compress and decompress streams of data. They support "deflate" and "gzip" as compression algorithms. They are widely used by web developers.

Looks like we might be getting native support for some things - how can we leverage and/or make use of this? (A question for later, as it's just a draft.) :-)

It looks more like a replacement for lz-string if it gets widespread adoption.

Looks like we now have widespread support - is there an advantage to lz-string over the built-in gzip + btoa now?

Compatibility and choice - at some point this may change to being a wrapper for the new APIs - but that would still give a known name for compatibility with other platforms :-)

So has anyone tested it (the new CompressionStream API)? It can use gzip or deflate to compress a string into an ArrayBuffer, but how should I then convert the ArrayBuffer to the smallest string possible?
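For reference, a minimal sketch of that compression step, assuming a browser that ships the Streams API (gzipString is just an illustrative name, not part of any library):

async function gzipString(text: string): Promise<Uint8Array> {
  // Encode the string as UTF-8 via Blob, pipe it through the native gzip
  // compressor, and collect the output into a single buffer.
  const compressed = new Blob([text])
    .stream()
    .pipeThrough(new CompressionStream("gzip"));
  return new Uint8Array(await new Response(compressed).arrayBuffer());
}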

Direct translation would be difficult due to the sanitization process used for things like TextDecoder, so a custom function would be needed. This seems like something that would be interesting to explore once the final v2 is ready.

TextDecoder wouldn't work easily, because in UTF-8 the first 128 code points are 1 byte each but code points 128 to 255 are 2 bytes, and UTF-8 is the only choice if you also want to use TextEncoder to reverse it. In my tests it can't reliably convert bytes to a string and then back.

example code:

function convertToStringThenBack(input: Uint8Array) {
  // Decode as UTF-8. Any byte sequence that is not valid UTF-8 (e.g. a lone
  // byte >= 128) is replaced with U+FFFD, which makes the round trip lossy.
  // Pass the view itself, not input.buffer, so a subarray decodes correctly.
  const string = new TextDecoder().decode(input);
  // Re-encode and compare byte-for-byte with the original.
  const back = new TextEncoder().encode(string);
  const isEqual =
    back.length === input.length &&
    back.every((value, index) => value === input[index]);
  if (isEqual) console.log("good");
  else console.log("bad");
}

convertToStringThenBack(new Uint8Array([1, 10, 100, 127])); // good
convertToStringThenBack(new Uint8Array([1, 10, 100, 128])); // bad

Yeah, I have run into the same. I will dig around for the function I use to maintain a reliable round trip.
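For what it's worth, a common approach (and probably what such a function looks like) is to skip UTF-8 entirely and map each byte to a code point in 0-255; the resulting string is plain UTF-16 in memory, so the reverse mapping is lossless. A minimal sketch, with bytesToBinaryString / binaryStringToBytes as illustrative names:

function bytesToBinaryString(bytes: Uint8Array): string {
  // Map each byte to the code point with the same value (0-255).
  let out = "";
  for (const byte of bytes) out += String.fromCharCode(byte);
  return out;
}

function binaryStringToBytes(str: string): Uint8Array {
  // Reverse mapping; assumes every code point is <= 255, which is
  // guaranteed for strings produced by bytesToBinaryString.
  const bytes = new Uint8Array(str.length);
  for (let i = 0; i < str.length; i++) bytes[i] = str.charCodeAt(i);
  return bytes;
}

// Unlike the UTF-8 round trip above, this survives bytes >= 128:
// binaryStringToBytes(bytesToBinaryString(new Uint8Array([1, 10, 100, 128])))
// yields [1, 10, 100, 128] again.

Note that, per the earlier point, code points 128 to 255 still cost 2 bytes each once such a string is serialized as UTF-8.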

One thing I've found is that base64 is actually quite good if you store text in the filesystem, because both the browser and Node will write UTF-8 when you ask them to save a text file, and since ASCII characters are only 1 byte each in UTF-8, it's quite efficient (75% efficient, to be precise: a 1 MB base64 string can store 0.75 MB of binary data).

Performant solutions already exist for conversion between array buffer and base64.
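In the browser, a sketch of that conversion using only the built-in btoa / atob might look like the following (the chunk size is an arbitrary choice to stay under argument-count limits for String.fromCharCode):

function bytesToBase64(bytes: Uint8Array): string {
  // Build a Latin-1 "binary string" in chunks, then base64-encode it.
  let binary = "";
  const chunkSize = 0x8000;
  for (let i = 0; i < bytes.length; i += chunkSize) {
    binary += String.fromCharCode(...bytes.subarray(i, i + chunkSize));
  }
  return btoa(binary);
}

function base64ToBytes(base64: string): Uint8Array {
  // atob yields one character per byte; copy the code points back into a buffer.
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return bytes;
}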

I have implemented string <-> gzip <-> base64 in my project and it's very fast.
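A sketch of what that pipeline can look like, reusing the base64 helpers above (compressToBase64 / decompressFromBase64 are illustrative names, not this project's API):

async function compressToBase64(text: string): Promise<string> {
  // UTF-8-encode the string, gzip it with the native stream, base64 the bytes.
  const stream = new Blob([text])
    .stream()
    .pipeThrough(new CompressionStream("gzip"));
  const bytes = new Uint8Array(await new Response(stream).arrayBuffer());
  return bytesToBase64(bytes);
}

async function decompressFromBase64(base64: string): Promise<string> {
  // Reverse: base64 -> bytes -> gunzip -> UTF-8 string.
  const bytes = base64ToBytes(base64);
  const stream = new Blob([bytes])
    .stream()
    .pipeThrough(new DecompressionStream("gzip"));
  return new Response(stream).text();
}

// Round trip: await decompressFromBase64(await compressToBase64("hello"))
// should give back "hello".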

However, if the string is to be stored in localStorage or a database, then base64 may not be efficient, depending on whether it's stored as UTF-8 or UTF-16, which I don't know.

It is important to note that this library was originally written for browsers, and those are what it officially supports. Node is a whole different beast, and from my research does not support UTF-16 strings in any way. I believe there is a Node.js port of v1.4 out there already, and I would prefer to keep them as separate implementations due to the excessive complexity of trying to support both.