zopfli-rs / zopfli

A Rust implementation of the Zopfli compression algorithm.

Early stopping

Pr0methean opened this issue:

I suspect that, without too much difficulty, theorems along the following lines could be proven mathematically for the algorithm and checked with debug_assert! in the implementation:

  • When an iteration of Zopfli hasn't reduced the file size, subsequent iterations won't do so either.
  • If a file's uncompressed size is N bytes, the minimum compressed size will be found within cN + d iterations for some small constants c and d (probably c < 10 and d < 10).

It would be helpful to apply these to limit the number of iterations for small blocks. That would help with fuzz testing, where a very large iteration count combined with a very small file can be a corner case that needs to be tested, even if seeing that combination in production would indicate a wrong assumption. This matters all the more given cargo fuzz's bias toward very small Vec<u8>s.
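To make the proposal concrete, here is a minimal sketch of what such a limit could look like. It is not the crate's actual code: the names iteration_cap and run_with_early_stop are hypothetical, the step closure stands in for one Zopfli iteration that returns the resulting compressed size, and the constants c and d are the conjectured ones from this issue, not proven bounds.

```rust
/// Hypothetical cap on the iteration count for an input of `n` bytes: c * n + d.
/// `c` and `d` are the conjectured constants from this issue, not proven values.
fn iteration_cap(n: usize, c: u64, d: u64) -> u64 {
    (n as u64).saturating_mul(c).saturating_add(d)
}

/// Runs `step` at most `min(requested, cap)` times and stops as soon as an
/// iteration fails to improve on the best size seen so far.
/// `step` is a stand-in for one compression iteration; it receives the
/// iteration index and returns the compressed size that iteration achieved.
/// Returns (best size seen, number of iterations actually run).
fn run_with_early_stop<F>(requested: u64, cap: u64, mut step: F) -> (Option<usize>, u64)
where
    F: FnMut(u64) -> usize,
{
    let limit = requested.min(cap);
    let mut best: Option<usize> = None;
    let mut runs = 0;
    for i in 0..limit {
        let size = step(i);
        runs += 1;
        match best {
            // No improvement: under the first conjecture, later iterations
            // would not improve either, so stop here.
            Some(b) if size >= b => break,
            // First iteration, or a strict improvement: record it and continue.
            _ => best = Some(size),
        }
    }
    (best, runs)
}
```

For a 4-byte input with c = 10 and d = 10, iteration_cap(4, 10, 10) gives a cap of 50, so a fuzzer-supplied request for millions of iterations would run at most 50. Note that the early break is only sound if the first conjecture actually holds; if a later iteration could still improve after a stalled one, this loop would miss it.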

Your hypotheses sound sensible and useful to me, although I'm not sure right now how to go about formally proving them. I'd need to understand the applied mathematics behind the algorithm better than I currently do to make a definitive statement.

Out of curiosity, do you happen to know of any good resources to learn how exactly Zopfli works? The Zopfli whitepaper can be summarized as "we made this compressor, tested it, and it turned out to work well in practice", which is not very helpful. On the other hand, the books and papers I've found on LZ77 and compression algorithms in general tend to be somewhat old and disconnected from the considerations and refinements made by state-of-the-art implementations like Zopfli.

While I don't know of any resources myself, I have noticed that libdeflate's code is very well documented, so it could be helpful to read through it. I think it borrows a lot of concepts from zstd, but its near-optimal algorithm likely has some similarities with Zopfli.