ebiggers / libdeflate

Heavily optimized library for DEFLATE/zlib/gzip compression and decompression

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Windows Explorer cannot unpack some files generated with libdeflate (reward offered)

chrisgch opened this issue · comments

First, let me explain what I'm doing: I'm using libdeflate to compress files on Windows, wrap them with my own ZIP headers and store them as ZIP files. Libdeflate is amazingly fast compared with other deflate engines.

Recently a user sent me a ZIP archive created with libdeflate which the Windows 7/10/11 Explorer cannot unpack. It fails with error 0x80004005 (unknown error). The file is named FreeImage.dll version 3.17 from a sourceforge project. Strangely, all other unpackers I tried can unpack the file just fine, including the original pkzip 2.04g, 7zip, Winzip, and WinRAR.

Therefore the libdeflate compression isn't faulty, but incompatible with Windows 7/10/11 Explorer.

I have managed to extract the smallest possible block with which the error occurs (10kBytes). Interestingly, the compressed data only uses literals, no offsets, because apparently no duplicate data is found. Also the file is the same for compression method 1 to 9. The problem must therefore be located in the Huffman encoder.

Attached are 4 archives:
Explorer_Error_libdeflate.zip: This file was created with libdeflate and cannot be unpacked with Explorer
Explorer_OK_7zip.zip: This file was created with 7zip and can be unpacked with Explorer
Explorer_OK.zip: This file was created by my program with an different deflate library and can be unpacked with Explorer
Freeimage.zip: The entire dll packed with libdeflate which cannot be unpacked with Explorer

Btw, I checked the difference of the files in the libdeflate decoder:
libdeflate:
num_litlen_syms=257
num_offset_syms=1
num_explicit_precode_lens=10

regular deflate:
num_litlen_syms=257
num_offset_syms=2
num_explicit_precode_lens=18

You will probably say that it's Microsoft's fault and I should contact them about the bug, but it's impossible to contact anyone in charge at Microsoft about such a bug. And it's annoying for my users when they create ZIP files which then can't be used by Windows Explorer users. Also Windows 7 is out of support and will never get a fix for this.

Therefore I'm willing to offer you a $2000 reward if you can make libdeflate compatible with Windows Explorer.

Explorer_Error_libdeflate.zip
Explorer_OK_7zip.zip
Explorer_OK.zip
FreeImage.zip

It sounds like Windows Explorer is violating the DEFLATE specification (RFC 1951), which permits a DEFLATE block to contain one distance (offset) codeword:

A code length of 0 indicates that the corresponding symbol in the literal/length or distance alphabet will not occur in the block, and should not participate in the Huffman code construction algorithm given earlier. If only one distance code is used, it is encoded using one bit, not zero bits; in this case there is a single code length of one, with one unused code. One distance code of zero bits means that there are no distance codes used at all (the data is all literals).

This may be a case where practice does not entirely match the specification, so I will consider changing it... It would reduce compression ratio slightly, but it sounds like other DEFLATE implementations already take this penalty.

Even if I "fix" this, please file a bug report with Microsoft too. They could fix their violation of the DEFLATE specification.

I don't accept compensation for work on libdeflate.

Your analysis seems to be 100% on point! When I change the code

	if (unlikely(num_used_syms == 0)) {
		return;
	}

to (taken from the following block if (unlikely(num_used_syms == 1)) {

	if (unlikely(num_used_syms == 0)) {
		/*
		 * Code is empty.  sort_symbols() already set all lengths to 0,
		 * so there is nothing more to do.
		 */
		unsigned nonzero_idx = 1;
		codewords[0] = 0;
		lens[0] = 1;
		codewords[nonzero_idx] = 1;
		lens[nonzero_idx] = 1;
		return;
	}

then the Explorer is able to unpack the archive just fine! I hope that this is the change you had in mind. The 6 dlls out of 3718 from system32 which Explorer failed to unpack can now all

I have reported the issue via Windows feedback hub now. Unfortunately it cannot be viewed via Web, only via Feedback Hub app. Here is the link for anyone interested:
https://aka.ms/AAmj513

commented

Windows 7/10/11 Explorer cannot unpack (it). Strangely, all other unpackers I tried can unpack the file just fine, including the original pkzip 2.04g, 7zip, Winzip, and WinRAR.

Maybe you should ask Microsoft what they did to it?

Neither unzip, 7zip, nor zipinfo have seen any problem with this file.

Fixed by #324.

I decided it's worth changing this for several reasons:

  • The edge case of incomplete Huffman codes is inherently somewhat ambiguous, and the decompressor in very old versions of zlib mishandles it too. So it's not just Windows Explorer. Perhaps Windows Explorer incorporates a very old version of zlib and that is why the issue exists there.
  • The other DEFLATE compressors I checked avoid this edge case too.
  • Avoiding incomplete codes only makes a difference in rare cases, and it only uses a few more bits.

I do still consider it to be a bug for decompressors to not handle this edge case in the specification.

@ebiggers - is there any way to disable, for instance if we are using with png?
I believe pngout provides a parameter for this (source):

/mincodes# : ensure Huffman tables have a minimum number of codes: {0(default),1,2}. May add a few extra bytes to the file size, but provides compatibility with buggy PNG decoders; applications using Zlib 1.2.1 require 1, while some mobile phones require 2.

... or is this something else?

I don't plan to add a parameter like that. More parameters == more complexity, and users will not know when to use it. zlib, 7-Zip, zopfli do not provide such a parameter either. The difference in compression ratio is very small. Also it usually only affects short inputs, but libdeflate v1.19 usually does better on short inputs than v1.18 due to other improvements.

@ebiggers - thanks for the reply. It is good to avoid unnecessary parameters +1.
I'm suggesting that png images do not need these minimum codes, (since the 2000s), so perhaps those streams could be treated as before?

Even if no PNG decoder has this issue (I wouldn't be so sure about that), libdeflate can't do something specifically for PNG without adding a parameter that tells libdeflate whether the DEFLATE stream is being created for a PNG or not.

I have reported the issue via Windows feedback hub now. Unfortunately it cannot be viewed via Web, only via Feedback Hub app. Here is the link for anyone interested:
https://aka.ms/AAmj513

I get Your account doesn't have access to this feedback
I this feedback still publicly accessible in Microsoft Feedback-hub?