Zip::InputStream reads only first file, no errors raised
al opened this issue
rubyzip v2.3.0
I'm encountering what appears to be a duplicate of #227, i.e. only the first file in an archive is extracted.
That issue, long since closed, suggests that an error should be raised (presumably from rubyzip/lib/zip/input_stream.rb, line 138 in 750c474).
The code is simply:

```ruby
require 'zip'

zip_path = "path/to/zip"

# Print the name of every entry, reading the archive as a stream.
Zip::InputStream.open(zip_path) do |io|
  while (entry = io.get_next_entry)
    puts entry.name
  end
end
```
Ideally I'd expect the names of all files in the archive to be displayed, or at least an error to be raised; instead, only one name is printed and no error is indicated.
The problem occurs when the Zip archive is created by the OSX Archive Utility. Archives created with the zip command-line tool are handled as expected, i.e. all names are printed.
Note: Zip::InputStream is being used to deal with potentially non-unique file names, as suggested in #342.
Any thoughts?
Hello, thanks for letting us know about this. I think OSX Archive Utility causes us quite a few issues - or we cause it quite a few issues.
Would it be at all possible for you to supply us with two zip files containing the same files (one created with OSX Archive Utility and one with the command line tool)? I don't have a Mac, but I'm keen to debug this.
Hi @hainesr. Sure:
Archive.zip created in OSX by selecting both files, right-clicking, and choosing "Compress 2 items" from the context menu.
commandline.zip created by issuing the following command in a console:

```
$ zip commandline.zip 1.txt 2.txt
```

FYI:

```
$ zip --help
Copyright (c) 1990-2008 Info-ZIP - Type 'zip "-L"' for software license.
Zip 3.0 (July 5th 2008). Usage:
...
```
And trying to stream the contents:
```
>> puts Gem.loaded_specs['rubyzip'].version
2.3.0
=> nil
>> puts RUBY_VERSION
2.7.1
=> nil
>>
>> require 'zip'
=> true
>>
>> zip_path = "#{dir}/Archive.zip"
>>
?> Zip::InputStream.open(zip_path) do |io|
?>   while (entry = io.get_next_entry)
?>     puts entry.name
?>   end
>> end
1.txt
=> nil
>>
>> zip_path = "#{dir}/commandline.zip"
>>
?> Zip::InputStream.open(zip_path) do |io|
?>   while (entry = io.get_next_entry)
?>     puts entry.name
?>   end
>> end
1.txt
2.txt
=> nil
```
Many thanks for this @al.
I think you are seeing this behaviour because we don't handle data descriptors properly yet (#460, #269, #295), and OSX Archive Utility builds Zip files like no other tool: there's no need for it to use data descriptors, but it does. It also kind of uses them wrong.
Anyway, this needs fixing, and I'll get on it ASAP.
In the meantime, if you can possibly use Zip::File.open, it should be able to extract these files as expected.
Great, thanks @hainesr
I've looked into this more deeply now. This appears to be a really specific bug thanks to the frankly weird and non-standard (I was going to say 'wrong') way OSX Archive Utility builds Zip files. Think Different indeed.
Archive appears to be effectively streaming to disk as it deflates files, so it doesn't know the compressed size of a file when it writes the local header, which comes before the actual data. So it uses a data descriptor to store that info after the data. All fine and standard. But Archive does know the uncompressed data size when it's writing the local header, and it does write that into the local header. That is not standard, and it's why we're seeing this specific bug.
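To make the layout concrete, here is a small stdlib-only sketch that decodes the fixed 30-byte local file header from the ZIP format specification (PKWARE's APPNOTE.TXT) with `String#unpack`. It is not rubyzip code: `parse_local_header` is a hypothetical name, and the sample header is built by hand to mimic the Archive case described above (bit 3 set, compressed size zero, uncompressed size filled in); the zero CRC in the sample is an assumption.

```ruby
# ZIP local file header, all fields little-endian:
#   signature(4) version(2) gp_flags(2) method(2) mod_time(2) mod_date(2)
#   crc32(4) compressed_size(4) uncompressed_size(4)
#   name_length(2) extra_length(2), then the name and extra field.
LOCAL_HEADER_SIGNATURE = 0x04034b50
LOCAL_HEADER_FORMAT = "VvvvvvVVVvv" # V = 32-bit LE, v = 16-bit LE
LOCAL_HEADER_SIZE = 30

# Hypothetical helper: decode the local header starting at +offset+.
def parse_local_header(bytes, offset = 0)
  fields = bytes.byteslice(offset, LOCAL_HEADER_SIZE).unpack(LOCAL_HEADER_FORMAT)
  signature, _version, gp_flags, compression_method, _time, _date,
    crc, compressed_size, uncompressed_size, name_len, _extra_len = fields
  raise ArgumentError, "not a local file header" unless signature == LOCAL_HEADER_SIGNATURE

  {
    gp_flags: gp_flags,
    data_descriptor: (gp_flags & 0x08) != 0, # bit 3: sizes/CRC follow the data
    compression_method: compression_method,
    crc: crc,
    compressed_size: compressed_size,
    uncompressed_size: uncompressed_size,
    name: bytes.byteslice(offset + LOCAL_HEADER_SIZE, name_len)
  }
end

# Hand-built, Archive-style header: bit 3 set, deflate (method 8),
# CRC and compressed size zero, uncompressed size 1234, name "1.txt".
sample = [LOCAL_HEADER_SIGNATURE, 20, 0x08, 8, 0, 0, 0, 0, 1234, 5, 0]
         .pack(LOCAL_HEADER_FORMAT) + "1.txt"
header = parse_local_header(sample)
puts header[:data_descriptor]   # prints "true"
puts header[:uncompressed_size] # prints "1234"
```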
Rubyzip checks the local header for the streaming flag (general purpose flag bit 3), and checks that the compressed size, uncompressed size and CRC are all zero, before deciding it can't extract an entry using Zip::InputStream. But the uncompressed size of an entry produced by Archive is not zero, so it tries and fails. Without knowing the size of the compressed data up front we can't extract the entry, so in this case we should use Zip::File.open instead, which reads the central directory for this information.
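As a rough illustration of why the central directory helps: the end-of-central-directory record at the tail of the file points to the central directory entries, which carry the real sizes. This stdlib-only sketch is not rubyzip's implementation; `central_directory_sizes` is a hypothetical name, it expects a binary string (e.g. from `File.binread`), and it ignores ZIP64 as well as archive comments that happen to contain the signature bytes.

```ruby
EOCD_SIGNATURE = [0x06054b50].pack("V") # "PK\x05\x06"
EOCD_FORMAT = "VvvvvVVv"                # 22 fixed bytes, then the comment
CDIR_SIGNATURE = 0x02014b50
CDIR_FORMAT = "VvvvvvvVVVvvvvvVV"       # 46 fixed bytes, then name/extra/comment
CDIR_SIZE = 46

# Hypothetical helper: [name, compressed_size, uncompressed_size] for each
# entry, taken from the central directory rather than the local headers.
def central_directory_sizes(bytes)
  eocd_pos = bytes.rindex(EOCD_SIGNATURE)
  raise ArgumentError, "no end of central directory record" if eocd_pos.nil?

  _sig, _disk, _cd_disk, _disk_entries, total_entries, _cd_size, cd_offset, _comment_len =
    bytes.byteslice(eocd_pos, 22).unpack(EOCD_FORMAT)

  pos = cd_offset
  Array.new(total_entries) do
    fields = bytes.byteslice(pos, CDIR_SIZE).unpack(CDIR_FORMAT)
    raise ArgumentError, "bad central directory entry" unless fields[0] == CDIR_SIGNATURE

    compressed, uncompressed = fields[8], fields[9]
    name_len, extra_len, comment_len = fields[10], fields[11], fields[12]
    name = bytes.byteslice(pos + CDIR_SIZE, name_len)
    pos += CDIR_SIZE + name_len + extra_len + comment_len # next entry
    [name, compressed, uncompressed]
  end
end
```

Usage would be along the lines of `central_directory_sizes(File.binread("Archive.zip"))`.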
All this is to say that the obvious fix for this issue is to raise the error about not being able to extract with Zip::InputStream whenever just the compressed data size is missing from the local header. So I'll look at that in the short term. Longer term, it might be possible to search forward in the archive to piece together the info we need to extract these sorts of entries, but we'd need to be careful.
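The difference between the current and the proposed check can be sketched as two hypothetical predicates over a decoded local header (a plain Hash here). This is not rubyzip's code; it only restates the conditions described above, and it assumes the proposed check keeps the bit 3 test alongside the compressed-size test.

```ruby
DATA_DESCRIPTOR_FLAG = 0x08 # general purpose flag bit 3

# Current behaviour as described: refuse to stream only if bit 3 is set AND
# the compressed size, uncompressed size and CRC are all zero.
def cannot_stream_current?(header)
  (header[:gp_flags] & DATA_DESCRIPTOR_FLAG) != 0 &&
    header[:compressed_size].zero? &&
    header[:uncompressed_size].zero? &&
    header[:crc].zero?
end

# Proposed fix: a missing compressed size alone is enough to refuse, since
# without it the entry can't be extracted from the stream.
def cannot_stream_proposed?(header)
  (header[:gp_flags] & DATA_DESCRIPTOR_FLAG) != 0 &&
    header[:compressed_size].zero?
end

# Archive-style header: bit 3 set, only the uncompressed size filled in.
archive_style = { gp_flags: 0x08, crc: 0, compressed_size: 0, uncompressed_size: 1234 }
puts cannot_stream_current?(archive_style)  # prints "false" (tries, then fails silently)
puts cannot_stream_proposed?(archive_style) # prints "true" (would raise instead)
```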
@al, may I use your Archive.zip as a fixture for testing?
Sure; the poem those verses are taken from is out of copyright, I believe. And thanks again for looking into this so promptly.
For anyone else running into this:
On Linux I was able to work around the issue with rubyzip 2.4.rc1 by removing the macOS metadata files from the archive before processing it with Zip::InputStream:
```
$ zip -d filename.zip \*/.DS_Store
$ zip -d filename.zip __MACOSX/\*
```
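If you want to check which entry names those two `zip -d` patterns would target before deleting anything, `File.fnmatch?` can approximate them. This is a sketch under an assumption: that zip's default wildcard matching behaves like `File.fnmatch?` without `File::FNM_PATHNAME` (so `*` also matches `/`). `macos_metadata?` is a hypothetical helper and the entry names below are made up for illustration.

```ruby
# Hypothetical helper approximating the two `zip -d` patterns above.
# Omitting FNM_PATHNAME lets * match "/" (assumed to mirror zip's default);
# FNM_DOTMATCH lets * match names that start with ".".
MACOS_METADATA_PATTERNS = ["*/.DS_Store", "__MACOSX/*"].freeze

def macos_metadata?(entry_name)
  MACOS_METADATA_PATTERNS.any? do |pattern|
    File.fnmatch?(pattern, entry_name, File::FNM_DOTMATCH)
  end
end

names = ["1.txt", "poems/.DS_Store", "__MACOSX/._1.txt", "poems/2.txt"]
puts names.select { |name| macos_metadata?(name) }.inspect
# prints ["poems/.DS_Store", "__MACOSX/._1.txt"]
```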