ricea / compressstream-explainer

Compression Streams Explained

Forward compatible

annevk opened this issue

If we want to be forward compatible with non-gzip algorithms it seems v1 will have to take a dictionary argument with something that defaults to gzip so that if you pass something else it'll throw.

Otherwise the default in legacy implementations will be to ignore the passed argument and simply use gzip, which is probably not desirable? Although I suppose if the type of compression is exposed in v2 there'll be feature detection possible using that as well. So maybe this is all okay. Leaving this here for your consideration.

My vision for the future of the API is new CompressStream(algorithm, { options }), since the interpretation of the options probably depends on the algorithm in use. But I'd rather not set the future shape of the API in stone at the moment.

My feeling is that it's okay not to handle any arguments yet, since WebIDL will ensure it throws if any arguments are passed. Considering a few cases:

If future person writes

  • new CompressStream(). This should work (using "gzip" with the default options), and it will.
  • new CompressStream('gzip'). It would be nice if this worked, but if future person wants to support older implementations, they can just leave off the 'gzip' argument and have it work, so probably okay if this throws.
  • new CompressStream('deflate'). Should throw as 'deflate' is not supported. Will throw since a non-empty argument list is not supported.
  • new CompressStream('gzip', { level: 'high' }). Should throw as the level option is not supported. Will throw since a non-empty argument list is not supported.

WebIDL will ensure it throws if any arguments are passed

That's not true, they'll be ignored.

That's not true, they'll be ignored.

Sorry, I should have checked. That ruins my forward-compatibility story.

Is there anything we can do that doesn't force us to make final decisions about the API shape now?

As I mentioned, if a future version would also expose algorithm as a prototype property it's probably fine, since anyone wanting to do feature testing could branch on that.
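
For illustration, that feature test might look something like the following sketch, assuming a future version exposes the algorithm as a prototype property (the property name here is hypothetical):

    let stream;
    if ('algorithm' in CompressStream.prototype) {
      // Newer implementation: the algorithm in use is exposed, so passing a
      // non-default algorithm can be attempted safely.
      stream = new CompressStream('deflate');
    } else {
      // Legacy v1 implementation: only the no-argument gzip form is available.
      stream = new CompressStream();
    }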

@annevk Thanks! Sorry I didn't properly understand your point the first time.

I'm personally not very satisfied with new CompressStream() defaulting to gzip. From an API design standpoint, it seems difficult to justify "why gzip" versus (for example) zlib. Frankly, it feels more reasonable to me to mandate that the first argument always be passed, and for a v1 to only support "gzip" as that argument. I'd love to hear the rationale for the current choice.

(I understand there is precedent for defaulting to a format, like HTMLCanvasElement.toBlob() defaulting to PNG. But I'm not necessarily convinced that is a "good" precedent.)

@TimothyGu My impression was that "gzip" was more widely used than "deflate", making it a natural choice for the default. "gzip" also has the benefit of a ubiquitous command-line tool for ad-hoc decompression when web developers are checking that their code works properly.

However, as you say, maybe there is no natural default.

Arguments in favour of defaulting to gzip:

  • Sane defaults make writing correct code easier.
  • Avoids having to bake the API shape into the first version.
  • Makes the first implementation ever so slightly simpler.
  • Familiar from HTTP gzip compression.

Arguments in favour of defaulting to deflate:

  • Most of the arguments for having a default apply to deflate too.
  • The output is slightly smaller.
  • Can be used to implement the zip file format.

Arguments in favour of having no default:

  • It doesn't really make sense to say "I don't care what format you compress this in", as you need to know the format to decompress it afterwards.
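
To make that last point concrete, here is a hedged round-trip sketch; the DecompressStream name is assumed from the "Decompress" discussion later in the thread, and readable stands for any ReadableStream of bytes:

    // Whatever format the compressor used, the consumer has to name the same
    // format to get the original bytes back.
    const compressed = readable.pipeThrough(new CompressStream('gzip'));
    const roundTripped = compressed.pipeThrough(new DecompressStream('gzip'));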

As it stands, this API is about as unsuitable for replacing real compression code in the wild as it could possibly be.

The most popular JS libraries are all structured the same way: "raw deflate" (RFC 1951) at the core, with zlib, gzip, and optionally streaming and zip layered on top.
Of these, raw deflate (followed by raw inflate) unavoidably takes up the vast majority of the final code size.

  • gzip is not the lowest level primitive.

For Compress this will require rather clumsily waiting until after the final chunk, then stripping off the gzip framing: stream.pipeThrough(new CompressStream()).pipeThrough(new StripGzipFraming()); (see the sketch at the end of this comment).

For Decompress it's deadly, because without the uncompressed size and CRC-32 of the uncompressed data, there's no way of tricking the API into decompressing it. (Presumably the stream will throw an error.)
PNG uses Adler-32, not CRC-32, so low-level manipulation of PNG is right out.

  • Libraries can't just ignore the compression level passed by the API consumer.

Compression level is minimum viable product, not future work.
Levels of 1, 9, or Zopfli's 11 have wildly different use cases.

  • gzip files are customizable beyond just compression level: original file name, time stamp, and comment.

As a fixed header structure, this metadata is trivial for libraries to prefix to the stream (pako.js already does).
It is not trivial to bake into a web API, and maybe never will be.

  • The gzip header/footer, while tiny, makes it less attractive for per-message WebSocket compression and similar uses.

I don't think this is the powerful transform primitive end users or library maintainers were waiting for. It might be solving the 20% of use cases, not the 80%.
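
As an illustration only (not part of the proposal), here is a minimal sketch of the StripGzipFraming transform referred to above, assuming the 10-byte gzip header carries no optional fields and that the 8-byte trailer (CRC-32 and ISIZE) can simply be discarded:

    class StripGzipFraming extends TransformStream {
      constructor() {
        let headerBytesLeft = 10;     // fixed gzip header size when no optional fields are set
        let tail = new Uint8Array(0); // always hold back 8 bytes: they may be the trailer
        super({
          transform(chunk, controller) {
            let bytes = chunk;
            // Skip the fixed-size gzip header at the start of the stream.
            if (headerBytesLeft > 0) {
              const skip = Math.min(headerBytesLeft, bytes.length);
              headerBytesLeft -= skip;
              bytes = bytes.subarray(skip);
            }
            // Concatenate with the held-back bytes, then emit everything except
            // the final 8 bytes seen so far, which may turn out to be the trailer.
            const combined = new Uint8Array(tail.length + bytes.length);
            combined.set(tail);
            combined.set(bytes, tail.length);
            if (combined.length > 8) {
              controller.enqueue(combined.slice(0, combined.length - 8));
              tail = combined.slice(combined.length - 8);
            } else {
              tail = combined;
            }
          },
          flush() {
            // `tail` now holds the 8-byte CRC-32 + ISIZE trailer; discard it.
          },
        });
      }
    }

    // Usage, matching the chain sketched above:
    // readable.pipeThrough(new CompressStream()).pipeThrough(new StripGzipFraming());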

@hugo306 Thank you for your detailed feedback.

gzip is not the lowest level primitive.

Your argument is compelling. I have discussed it with @CanonMukai and our tentative plan is to support both "deflate" and "gzip" in the first version. We will update the explainer in due course.

Libraries can't just ignore the compression level passed by the API consumer.

I don't want to restrict implementations to only using zlib. The compression levels 1 to 9 are not standardised anywhere AFAIK but are just part of the zlib library. So we need to come up with some abstraction which works for different implementations. Figuring out what that abstraction should be will take time. Probably we will need multiple implementations before we know what the correct abstraction should be. I don't think this should block the availability of CompressStream altogether.
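
Purely as a hypothetical illustration of what such an abstraction could look like (none of these option names appear in the explainer), it might map implementation-defined levels onto a few named trade-offs:

    // Hypothetical option values, for illustration only.
    new CompressStream('deflate', { quality: 'fastest' });  // prioritise speed
    new CompressStream('deflate', { quality: 'smallest' }); // prioritise output size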

gzip files are customizable beyond just compression level. File name, original file name, time stamp, and comment.

I think we can support these as options in a future version of the API. I haven't examined these fields in a gzip file myself in over a decade, so anecdotally I think they are rarely used and shouldn't block availability of the API.

The gzip header/footer while tiny, make it less attractive for per-message Websocket compression, and similar.

As far as I know, all WebSocket-supporting browsers support permessage-deflate. It's even in the Fetch standard: https://fetch.spec.whatwg.org/#example-permessage-deflate. However, there are plenty of other use cases where the low overhead of deflate would be beneficial.

While gzip has been around for a while, a design where that expectation isn't baked in would be more future-proof and extensible, and would encourage good development hygiene.

In other words, can we design this API in a way that would guide developers toward "feature detection => use" instead of "Meh, gzip is here to stay => use"?

I'm not sure how prescriptive we can or want to be, e.g. a design that throws if you create a compression stream without first checking the supported algorithms 😅 ... Are there best-practice examples on this topic?

The deflate algorithm is used in PNG, WebSockets, HTTP content encoding, PDF, SVGZ, and more. I can't imagine it ever not being built into the browser. I think it will always be fine to use gzip or deflate as a fallback.

new CompressStream('non-existent-codec') will throw an exception; is that sufficient?
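
For example, code that relies on that behaviour could feature-detect by construction and fall back; the format names below are just illustrative:

    // Try the preferred format and fall back if the constructor throws for an
    // unrecognised format name.
    function makeCompressStream(preferred, fallback = 'gzip') {
      try {
        return new CompressStream(preferred);
      } catch {
        return new CompressStream(fallback);
      }
    }

    const cs = makeCompressStream('brotli'); // uses gzip where brotli is unsupported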

OK, looks like there are plenty of bakers in the default bakery :)
So, we can put the issue to rest for the gzip default.

The situation I'm worried about is something like doing new CompressStream('brotli'); without proper detection / failure handling, because "brotli is widely supported in 'all' browsers and naturally exposed to this API". Unlike gzip, it seems possible* that brotli could be superseded by something else. But perhaps this is the best we can do?

*: only used for HTTP content encoding and WebFonts at the moment.

Here's a sketch of how brotli might work:

    import brotli from "std::brotli";
    CompressStream.registerCodec(brotli);
    const cs = new CompressStream('brotli');

Looks great.

I'm going to consider this fixed now that we always require a format to be specified.