ckolivas / lrzip

Long Range Zip

Home Page: http://lrzip.kolivas.org

lrzip 0.651 test issue

chenrui333 opened this issue

I am trying to upgrade lrzip to the latest release (0.651), but ran into a regression test failure.

$ lrzip -d data.txt.lrz
Output filename is: data.txt
Decompressing...
Failed to pthread_mutex_lock
No such file or directory
Fatal error - exiting

Here is more information:

$ echo "1 2 3" > data.txt
$ lrzip -vv -o data.txt.lrz data.txt
The following options are in effect for this COMPRESSION.
Threading is ENABLED. Number of CPUs detected: 10
Detected 34359738368 bytes ram
Compression level 7
Nice Value: 19
Show Progress
Max Verbose
Output Filename Specified: data.txt.lrz
Temporary Directory set as: /var/folders/4w/t9h6qm850g395cdb8js574nc0000gn/T/
Compression mode is: LZMA. LZ4 Compressibility testing enabled
Heuristically Computed Compression Window: 218 = 21800MB
Storage time in seconds 1388011776
File size: 6
Succeeded in testing 6 sized mmap for rzip pre-processing
Will take 1 pass
Chunk size: 6
Byte width: 1
Warning, low memory for chosen compression settings
Succeeded in testing 2191720448 sized malloc for back end compression
Using up to 11 threads to compress up to 16384 bytes each.
Beginning rzip pre-processing phase
hashsize = 4194304.  bits = 22. 64MB
0 total hashes
Malloced 11453235200 for checksum ckbuf
Starting thread 0 to compress 10 bytes from stream 0
Starting thread 1 to compress 6 bytes from stream 1
Writing initial chunk bytes value 1 at 24
Writing EOF flag as 1
Writing initial header at 27
Compthread 0 seeking to 3 to store length 1
Compthread 0 seeking to 8 to write header
Thread 0 writing 10 compressed bytes from stream 0
Compthread 0 writing data at 12
Compthread 1 seeking to 7 to store length 1
Compthread 1 seeking to 22 to write header
Thread 1 writing 6 compressed bytes from stream 1
Compthread 1 writing data at 26
MD5: f2b33fb7b3d0eb95090a16060e6a24f9
matches=0 match_bytes=0
literals=2 literal_bytes=6
true_tag_positives=0 false_tag_positives=0
inserts=0 match 0.167
data.txt - Compression Ratio: 0.080. Average Compression Speed:  0.000MB/s.
Total time: 00:00:00.01

$ lrzip -vv -d data.txt.lrz
The following options are in effect for this DECOMPRESSION.
Threading is ENABLED. Number of CPUs detected: 10
Detected 34359738368 bytes ram
Compression level 7
Nice Value: 19
Show Progress
Max Verbose
Temporary Directory set as: /var/folders/4w/t9h6qm850g395cdb8js574nc0000gn/T/
Output filename is: data.txt
Detected lrzip version 0.6 file.
MD5 being used for integrity testing.
Decompressing...
Reading chunk_bytes at 24
Expected size: 6
Chunk byte width: 1
Reading eof flag at 25
EOF: 1
Reading expected chunksize at 26
Chunk size: 0
Reading stream 0 header at 28
Reading stream 1 header at 32
Reading ucomp header at 36
Fill_buffer stream 0 c_len 10 u_len 10 last_head 0
Starting thread 0 to decompress 10 bytes from stream 0
Thread 0 decompressed 10 bytes from stream 0
Taking decompressed data from thread 0
Reading ucomp header at 50
Fill_buffer stream 1 c_len 6 u_len 6 last_head 0
Starting thread 1 to decompress 6 bytes from stream 1
Thread 1 decompressed 6 bytes from stream 1
Taking decompressed data from thread 1
Closing stream at 59, want to seek to 59
Failed to pthread_mutex_lock
No such file or directory
Deleting broken file data.txt
Fatal error - exiting
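
The report above boils down to a compress, delete, decompress round trip. A minimal Python sketch of that check follows, assuming `lrzip` is on `PATH` and that `lrzip -d` writes its output next to the archive with the `.lrz` suffix removed, as the log suggests. The helper names `build_commands` and `round_trip` are hypothetical, and only the `-o` and `-d` options already shown in this report are used.

```python
import pathlib
import subprocess
import tempfile


def build_commands(path):
    """Return the compress and decompress argv lists used in the report."""
    archive = f"{path}.lrz"
    return ["lrzip", "-o", archive, str(path)], ["lrzip", "-d", archive]


def round_trip(payload: bytes) -> bool:
    """Compress payload with lrzip, remove the original, decompress, compare.

    Assumes lrzip is installed; raises CalledProcessError if either step fails.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = pathlib.Path(tmp) / "data.txt"
        path.write_bytes(payload)
        compress, decompress = build_commands(path)
        subprocess.run(compress, check=True)
        path.unlink()  # remove the original so decompression can recreate it
        subprocess.run(decompress, check=True)
        return path.read_bytes() == payload
```

Calling `round_trip(b"1 2 3\n")` corresponds to the 6-byte case from the report; larger payloads (for example 255, 511, 4095 and 4096 bytes) cover the sizes discussed later in the thread.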

WTF? Why on earth are you testing a 6 byte file? What do you think would happen? Please don't waste time here. Granted, lrzip should abandon any foolish attempt like this.

There are three bugs here.

  1. This is a Mac-related issue. It works fine on x86_64.
  2. When the chunk size is < 4096 bytes, the stored chunk size shows as 4096, the minimum.
  3. When chunk bytes is 1, i.e. the input is < 256 bytes, the chunk size shows as 0.
    This is because 4,096 is stored in the lrzip file as 00 10 (little-endian 0x1000). When chunk bytes is 1, the 10 is truncated and only 00 remains.
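
The truncation described in item 3 can be checked with a few lines of Python. This only illustrates the byte arithmetic and is not lrzip's code. It assumes the chunk size is written little-endian, as the hex dumps in this thread suggest, and the helper name `stored_chunk_size` is hypothetical.

```python
def stored_chunk_size(value: int, byte_width: int) -> bytes:
    """Little-endian encoding of value, keeping only the lowest byte_width bytes."""
    full = value.to_bytes(8, "little")
    return full[:byte_width]


two = stored_chunk_size(4096, 2)
one = stored_chunk_size(4096, 1)
print(two.hex(" "))                   # 00 10
print(one.hex(" "))                   # 00
print(int.from_bytes(one, "little"))  # 0
```

With a byte width of 2 the value survives intact. With a byte width of 1 only the low byte `00` is kept, so a reader sees a chunk size of 0.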

When compressing, lrzip seems to recognize the chunk size. The problem is that, for smaller files, what it computes and what it stores do not match.

See these examples:

File size: 4096
Succeeded in testing 4096 sized mmap for rzip pre-processing
Will take 1 pass
Chunk size: 4096
Byte width: 2

00000000 4c 52 5a 49 00 06 00 10 00 00 00 00 00 00 00 00 |LRZI............|
00000010 5d 00 00 00 01 01 00 00 02 01 00 10 03 00 00 00 |]...............|
02 = byte width
01 = EOF marker
00 10 = 0x1000 = 4096

File size: 4095
Succeeded in testing 4095 sized mmap for rzip pre-processing
Will take 1 pass
Chunk size: 4095
Byte width: 2

00000000 4c 52 5a 49 00 06 ff 0f 00 00 00 00 00 00 00 00 |LRZI............|
00000010 5d 00 00 00 01 01 00 00 02 01 00 10 03 00 00 00 |]...............|
02 = byte width
01 = EOF marker
00 10 = 0x1000 = 4096

File size: 511
Succeeded in testing 511 sized mmap for rzip pre-processing
Will take 1 pass
Chunk size: 511
Byte width: 2

00000000 4c 52 5a 49 00 06 ff 01 00 00 00 00 00 00 00 00 |LRZI............|
00000010 5d 00 00 00 01 01 00 00 02 01 00 10 03 00 00 00 |]...............|
02 = byte width
01 = EOF marker
00 10 = 0x1000 = 4096

File size: 255
Succeeded in testing 255 sized mmap for rzip pre-processing
Will take 1 pass
Chunk size: 255
Byte width: 1

00000000 4c 52 5a 49 00 06 ff 00 00 00 00 00 00 00 00 00 |LRZI............|
00000010 5d 00 00 00 01 01 00 00 01 01 00 03 00 00 08 03 |]...............|
01 = byte width
01 = EOF marker
00 = zero chunk size! Only one byte written to header
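
To make the dumps easier to compare, here is a small Python sketch that decodes the fields annotated above. The offsets (24 for the byte width, 25 for the EOF flag, 26 for the chunk size) are taken from the verbose decompression log. Reading bytes 6 to 13 as a little-endian expected size is an inference from the dumps rather than from the lrzip format documentation, and `parse_header` is a hypothetical helper.

```python
def parse_header(raw: bytes) -> dict:
    """Decode the fields discussed in this thread from the first 32 bytes."""
    if raw[:4] != b"LRZI":
        raise ValueError("not an lrzip file")
    byte_width = raw[24]
    return {
        "version": (raw[4], raw[5]),
        # Assumption: 8-byte little-endian size starting at offset 6.
        "expected_size": int.from_bytes(raw[6:14], "little"),
        "byte_width": byte_width,
        "eof": raw[25],
        "chunk_size": int.from_bytes(raw[26:26 + byte_width], "little"),
    }


# The 4095-byte and 255-byte dumps from this thread.
dump_4095 = bytes.fromhex(
    "4c 52 5a 49 00 06 ff 0f 00 00 00 00 00 00 00 00 "
    "5d 00 00 00 01 01 00 00 02 01 00 10 03 00 00 00"
)
dump_255 = bytes.fromhex(
    "4c 52 5a 49 00 06 ff 00 00 00 00 00 00 00 00 00 "
    "5d 00 00 00 01 01 00 00 01 01 00 03 00 00 08 03"
)

for dump in (dump_4095, dump_255):
    hdr = parse_header(dump)
    print(hdr["expected_size"], hdr["byte_width"], hdr["chunk_size"])
# 4095 2 4096
# 255 1 0
```

Under these assumptions the expected size is stored correctly in both files, while the chunk size field holds 4096 for the 4095-byte input and 0 for the 255-byte input.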

Interestingly, this does not seem to impact decompression:

Decompressing...
Reading chunk_bytes at 24
Expected size: 255
Chunk byte width: 1
Reading eof flag at 25
EOF: 1
Reading expected chunksize at 26
Chunk size: 0
...
Closing stream at 60, want to seek to 60

Average DeCompression Speed:  0.000MB/s
MD5: 6df9012b2b7cb3c55963499a26309bba
Output filename is: data.txt: [OK] - 255 bytes
Total time: 00:00:00.07

I reviewed the code, and the problem seems to occur in the compthread() function in stream.c for the initial thread. It's not clear right now where the 4,096 is hard-coded.