ricea / compressstream-explainer

Compression Streams Explained

Forward compatible

annevk opened this issue

If we want to be forward compatible with non-gzip algorithms it seems v1 will have to take a dictionary argument with something that defaults to gzip so that if you pass something else it'll throw.

Otherwise the default in legacy implementations will be to ignore the passed argument and simply use gzip, which is probably not desirable? Although I suppose if the type of compression is exposed in v2 there'll be feature detection possible using that as well. So maybe this is all okay. Leaving this here for your consideration.

My vision for the future of the API is new CompressStream(algorithm, { options }), since the interpretation of the options probably depends on the algorithm in use. But I'd rather not set the future shape of the API in stone at the moment.

My feeling is that it's okay not to handle any arguments yet, since WebIDL will ensure it throws if any arguments are passed. Considering a few cases:

If future person writes

  • new CompressStream(). This should work (using "gzip" with the default options), and it will.
  • new CompressStream('gzip'). It would be nice if this worked, but if future person wants to support older implementations, they can just leave off the 'gzip' argument and have it work, so probably okay if this throws.
  • new CompressStream('deflate'). Should throw as 'deflate' is not supported. Will throw since a non-empty argument list is not supported.
  • new CompressStream('gzip', { level: 'high' }). Should throw as the level option is not supported. Will throw since a non-empty argument list is not supported.

WebIDL will ensure it throws if any arguments are passed

That's not true, they'll be ignored.

That's not true, they'll be ignored.

Sorry, I should have checked. That ruins my forward-compatibility story.

Is there anything we can do that doesn't force us to make final decisions about the API shape now?

As I mentioned, if a future version would also expose algorithm as a prototype property it's probably fine, since anyone wanting to do feature testing could branch on that.
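
For illustration, that feature test might look something like the following sketch, assuming a future version exposes the algorithm as a prototype property (the property name here is hypothetical):

    let stream;
    if ('algorithm' in CompressStream.prototype) {
      // Newer implementation: the algorithm in use is exposed, so passing a
      // non-default algorithm can be attempted safely.
      stream = new CompressStream('deflate');
    } else {
      // Legacy v1 implementation: only the no-argument gzip form is available.
      stream = new CompressStream();
    }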

@annevk Thanks! Sorry I didn't properly understand your point the first time.

I'm personally not very satisfied with new CompressStream() defaulting to gzip. From an API design standpoint, it seems difficult to justify "why gzip" versus (for example) zlib. Frankly, it feels more reasonable to me to mandate that the first argument always be passed, and for a v1 to only support "gzip" as that argument. I'd love to hear the rationale for the current choice.

(I understand there is precedent for defaulting to a format, like HTMLCanvasElement.toBlob() defaulting to PNG. But I'm not necessarily convinced that is a "good" precedent.)

@TimothyGu My impression was that "gzip" was more widely used than "deflate", making it a natural choice for the default. "gzip" also has the benefit of a ubiquitous command-line tool for ad-hoc decompression when web developers are checking that their code works properly.

However, as you say, maybe there is no natural default.

Arguments in favour of defaulting to gzip:

  • Sane defaults make writing correct code easier.
  • Avoids having to bake the API shape into the first version.
  • Makes the first implementation ever so slightly simpler.
  • Familiar from HTTP gzip compression.

Arguments in favour of defaulting to deflate:

  • Most of the arguments for having a default apply to deflate too.
  • The output is slightly smaller.
  • Can be used to implement the zip file format.

Arguments in favour of having no default:

  • It doesn't really make sense to say "I don't care what format you compress this in", as you need to know the format to decompress it afterwards.
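
To make that last point concrete, here is a hedged round-trip sketch; the DecompressStream name is assumed from the "Decompress" discussion later in the thread, and readable stands for any ReadableStream of bytes:

    // Whatever format the compressor used, the consumer has to name the same
    // format to get the original bytes back.
    const compressed = readable.pipeThrough(new CompressStream('gzip'));
    const roundTripped = compressed.pipeThrough(new DecompressStream('gzip'));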

As it stands, this API is about as unsuitable for replacing real compression code in the wild as it could possibly be.

The most popular JS libraries are all structured the same way: "raw deflate" (RFC 1951) at the core, with zlib, gzip, and optionally streaming and zip layered on top.
Of these, raw deflate (followed by raw inflate) unavoidably takes up the vast majority of the final code size.

  • gzip is not the lowest level primitive.

For Compress this will require rather clumsily waiting until after the final chunk, then stripping off the gzip framing: stream.pipeThrough(new CompressStream()).pipeThrough(new StripGzipFraming()); (see the sketch at the end of this comment).

For Decompress it's deadly, because without the uncompressed size and CRC-32 of the uncompressed data, there's no way of tricking the API into decompressing it. (Presumably the stream will throw an error.)
PNG uses Adler-32, not CRC-32, so low-level manipulation of PNG is right out.

  • Libraries can't just ignore the compression level passed by the API consumer.

Compression level is minimum viable product, not future work.
Levels of 1, 9, or Zopfli's 11 have wildly different use cases.

  • gzip files are customizable beyond just compression level: original file name, time stamp, and comment.

As a fixed header structure, this metadata is trivial for libraries to prefix to the stream (pako.js already does).
It is not trivial to bake into a web API, and maybe never will be.

  • The gzip header/footer, while tiny, makes it less attractive for per-message WebSocket compression and similar uses.

I don't think this is the powerful transform primitive end users or library maintainers were waiting for. It might be solving the 20% of use cases, not the 80%.
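
As an illustration only (not part of the proposal), here is a minimal sketch of the StripGzipFraming transform referred to above, assuming the 10-byte gzip header carries no optional fields and that the 8-byte trailer (CRC-32 and ISIZE) can simply be discarded:

    class StripGzipFraming extends TransformStream {
      constructor() {
        let headerBytesLeft = 10;     // fixed gzip header size when no optional fields are set
        let tail = new Uint8Array(0); // always hold back 8 bytes: they may be the trailer
        super({
          transform(chunk, controller) {
            let bytes = chunk;
            // Skip the fixed-size gzip header at the start of the stream.
            if (headerBytesLeft > 0) {
              const skip = Math.min(headerBytesLeft, bytes.length);
              headerBytesLeft -= skip;
              bytes = bytes.subarray(skip);
            }
            // Concatenate with the held-back bytes, then emit everything except
            // the final 8 bytes seen so far, which may turn out to be the trailer.
            const combined = new Uint8Array(tail.length + bytes.length);
            combined.set(tail);
            combined.set(bytes, tail.length);
            if (combined.length > 8) {
              controller.enqueue(combined.slice(0, combined.length - 8));
              tail = combined.slice(combined.length - 8);
            } else {
              tail = combined;
            }
          },
          flush() {
            // `tail` now holds the 8-byte CRC-32 + ISIZE trailer; discard it.
          },
        });
      }
    }

    // Usage, matching the chain sketched above:
    // readable.pipeThrough(new CompressStream()).pipeThrough(new StripGzipFraming());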

@hugo306 Thank you for your detailed feedback.

gzip is not the lowest level primitive.

Your argument is compelling. I have discussed it with @CanonMukai and our tentative plan is to support both "deflate" and "gzip" in the first version. We will update the explainer in due course.

Libraries can't just ignore the compression level passed by the API consumer.

I don't want to restrict implementations to only using zlib. The compression levels 1 to 9 are not standardised anywhere AFAIK but are just part of the zlib library. So we need to come up with some abstraction which works for different implementations. Figuring out what that abstraction should be will take time. Probably we will need multiple implementations before we know what the correct abstraction should be. I don't think this should block the availability of CompressStream altogether.
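
Purely as a hypothetical illustration of what such an abstraction could look like (none of these option names appear in the explainer), it might map implementation-defined levels onto a few named trade-offs:

    // Hypothetical option values, for illustration only.
    new CompressStream('deflate', { quality: 'fastest' });  // prioritise speed
    new CompressStream('deflate', { quality: 'smallest' }); // prioritise output size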

gzip files are customizable beyond just compression level. File name, original file name, time stamp, and comment.

I think we can support these as options in a future version of the API. I haven't examined these fields in a gzip file myself in over a decade, so anecdotally I think they are rarely used and shouldn't block availability of the API.

The gzip header/footer while tiny, make it less attractive for per-message Websocket compression, and similar.

As far as I know, all WebSocket-supporting browsers support permessage-deflate. It's even in the Fetch standard: https://fetch.spec.whatwg.org/#example-permessage-deflate. However, there are plenty of other use cases where the low overhead of deflate would be beneficial.

While gzip has been around for a while, a design where that expectation isn't baked in would be more future-proof and extensible, and would encourage good development hygiene.

In other words, can we design this API in a way that would guide developers toward "feature detection => use" instead of "Meh, gzip is here to stay => use"?

I'm not sure how prescriptive we can or want to be, e.g. a design that throws if you create a compression stream without first checking the supported algorithms 😅 ... Are there best-practice examples on this topic?

The deflate algorithm is used in PNG, WebSockets, HTTP content encoding, PDF, SVGZ, and more. I can't imagine it ever not being built into the browser. I think it will always be fine to use gzip or deflate as a fallback.

new CompressStream('non-existent-codec') will throw an exception; is that sufficient?
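
For example, code that relies on that behaviour could feature-detect by construction and fall back; the format names below are just illustrative:

    // Try the preferred format and fall back if the constructor throws for an
    // unrecognised format name.
    function makeCompressStream(preferred, fallback = 'gzip') {
      try {
        return new CompressStream(preferred);
      } catch {
        return new CompressStream(fallback);
      }
    }

    const cs = makeCompressStream('brotli'); // uses gzip where brotli is unsupported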

OK, looks like there are plenty of bakers in the default bakery :)
So, we can put the issue to rest for the gzip default.

The situation I'm worried about is something like doing new CompressStream('brotli'); without proper detection / failure handling, because "brotli is widely supported in 'all' browsers and naturally exposed to this API". Unlike gzip, it seems possible* that brotli could be superseded by something else. But perhaps this is the best we can do?

*: only used for HTTP content encoding and WebFonts at the moment.

Here's a sketch of how brotli might work:

    import brotli from "std::brotli";
    CompressStream.registerCodec(brotli);
    const cs = new CompressStream('brotli');

Looks great.

I'm going to consider this fixed now that we always require a format to be specified.