ckolivas / lrzip

Long Range Zip

Home Page: http://lrzip.kolivas.org

"Warning, low memory for chosen compression settings" for small target files

bleedchf opened this issue

I am using lrzip to compress database dumps in an automated way. Some of them are a few KB, some of them are multiple GB. lrzip has recently started giving me a strange notice, "Warning, low memory for chosen compression settings", but only if the target file is small. This happens even though I am using the -U flag for compression.

[root@test lrziptest]# fallocate -l 100m ./100m
[root@test lrziptest]# fallocate -l 100k ./100k
[root@test lrziptest]# ls -lh
total 101M
-rw-r--r-- 1 root root 100K Oct 26 18:11 100k
-rw-r--r-- 1 root root 100M Oct 26 18:10 100m
[root@test lrziptest]# lrzip -L 9 -U -q -z ./100m
Output filename is: ./100m.lrz
./100m - Compression Ratio: 21144.908. Average Compression Speed: 4.762MB/s.
Total time: 00:00:20.75
[root@test lrziptest]# lrzip -L 9 -U -q -z ./100k
Output filename is: ./100k.lrz
Warning, low memory for chosen compression settings
./100k - Compression Ratio: 311.246. Average Compression Speed: 0.000MB/s.
Total time: 00:00:00.17

I do not think this behavior is intended or logical. The machine has 64 GB of RAM, of which more than 48 GB are free. It also happens ONLY if the target file is small, which makes even less sense. Any ideas?

Do not use -q. Use -vv so the heuristic output can be seen. I'm also confused by the compression ratio of 21144.908. That is pretty extreme! This could be the result of a data type issue, 32-bit vs 64-bit. Does this happen with lrzip-next?
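For reference, if the reported ratio is simply input bytes divided by output bytes, 21144.908 on the 104857600-byte file implies an output of only about 4959 bytes, and since fallocate produces an all-zero file, a ratio that extreme is at least arithmetically possible without any overflow. A quick standalone check of that arithmetic, not lrzip code:

#include <stdio.h>

int main(void)
{
	double in_bytes = 104857600.0;	/* the 100 MiB fallocated file */
	double ratio = 21144.908;	/* ratio printed by lrzip */

	/* implied compressed size, assuming ratio = input / output */
	printf("%.0f bytes\n", in_bytes / ratio);	/* prints 4959 */
	return 0;
}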

Here we go:

[root@test lrziptest]# fallocate -l 100m ./100m
[root@test lrziptest]# fallocate -l 100k ./100k
[root@test lrziptest]# lrzip -L 9 -U -vv -z ./100m
The following options are in effect for this COMPRESSION.
Threading is ENABLED. Number of CPUs detected: 12
Detected 67353292800 bytes ram
Compression level 9
Nice Value: 19
Show Progress
Max Verbose
Temporary Directory set as: ./
Compression mode is: ZPAQ. LZ4 Compressibility testing enabled
Using Unlimited Window size
Storage time in seconds 1387435965
Output filename is: ./100m.lrz
File size: 104857600
Succeeded in testing 104857600 sized mmap for rzip pre-processing
Will take 1 pass
Chunk size: 104857600
Byte width: 4
Succeeded in testing 1631584256 sized malloc for back end compression
Using up to 13 threads to compress up to 10485760 bytes each.
Beginning rzip pre-processing phase
hashsize = 4194304. bits = 22. 64MB
0 total hashesunk: 99%
Starting thread 0 to compress 10485760 bytes from stream 1
Starting thread 1 to compress 10485760 bytes from stream 1
Starting thread 2 to compress 10485760 bytes from stream 1
Starting thread 3 to compress 10485760 bytes from stream 1
Starting thread 4 to compress 10485760 bytes from stream 1
Starting thread 5 to compress 10485760 bytes from stream 1
Starting thread 6 to compress 10485760 bytes from stream 1
Starting thread 7 to compress 10485760 bytes from stream 1
Starting thread 8 to compress 10485760 bytes from stream 1
lz4 testing OK for chunk 10485760. Compressed size = 0.42% of chunk, 1 Passes
lz4 testing OK for chunk 10485760. Compressed size = 0.42% of chunk, 1 Passes
lz4 testing OK for chunk 10485760. Compressed size = 0.42% of chunk, 1 Passes
lz4 testing OK for chunk 10485760. Compressed size = 0.42% of chunk, 1 Passes
lz4 testing OK for chunk 10485760. Compressed size = 0.42% of chunk, 1 Passes
lz4 testing OK for chunk 10485760. Compressed size = 0.42% of chunk, 1 Passes
lz4 testing OK for chunk 10485760. Compressed size = 0.42% of chunk, 1 Passes
Starting thread 9 to compress 10485760 bytes from stream 1
lz4 testing OK for chunk 10485760. Compressed size = 0.42% of chunk, 1 Passes
lz4 testing OK for chunk 10485760. Compressed size = 0.42% of chunk, 1 Passes
Malloced 22451097600 for checksum ckbuf
lz4 testing OK for chunk 10485760. Compressed size = 0.42% of chunk, 1 Passes
Starting thread 10 to compress 4810 bytes from stream 0 4:0% 6:0% 7:0% 9:0% 10:0% 5:0%
Starting thread 11 to compress 0 bytes from stream 1
lz4 testing OK for chunk 4810. Compressed size = 0.73% of chunk, 1 Passes
Writing initial chunk bytes value 4 at 24:100% 3:90% 4:100% 5:100% 6:100% 7:100% 8:90% 9:100% 10:100% 1:100%
Writing EOF flag as 1
Writing initial header at 30
Compthread 0 seeking to 22 to store length 4
Compthread 0 seeking to 26 to write header
Thread 0 writing 450 compressed bytes from stream 1
Compthread 0 writing data at 39
Compthread 1 seeking to 35 to store length 4
Compthread 1 seeking to 489 to write header
Thread 1 writing 450 compressed bytes from stream 1
Compthread 1 writing data at 502
Compthread 2 seeking to 498 to store length 4 3:100% 8:100%
Compthread 2 seeking to 952 to write header
Thread 2 writing 450 compressed bytes from stream 1
Compthread 2 writing data at 965
Compthread 3 seeking to 961 to store length 4
Compthread 3 seeking to 1415 to write header
Thread 3 writing 450 compressed bytes from stream 1
Compthread 3 writing data at 1428
Compthread 4 seeking to 1424 to store length 4
Compthread 4 seeking to 1878 to write header
Thread 4 writing 450 compressed bytes from stream 1
Compthread 4 writing data at 1891
Compthread 5 seeking to 1887 to store length 4
Compthread 5 seeking to 2341 to write header
Thread 5 writing 450 compressed bytes from stream 1
Compthread 5 writing data at 2354
Compthread 6 seeking to 2350 to store length 4
Compthread 6 seeking to 2804 to write header
Thread 6 writing 450 compressed bytes from stream 1
Compthread 6 writing data at 2817
Compthread 7 seeking to 2813 to store length 4
Compthread 7 seeking to 3267 to write header
Thread 7 writing 450 compressed bytes from stream 1
Compthread 7 writing data at 3280
Compthread 8 seeking to 3276 to store length 4
Compthread 8 seeking to 3730 to write header
Thread 8 writing 450 compressed bytes from stream 1
Compthread 8 writing data at 3743
Compthread 9 seeking to 3739 to store length 4
Compthread 9 seeking to 4193 to write header
Thread 9 writing 450 compressed bytes from stream 1
Compthread 9 writing data at 4206
Compthread 10 seeking to 9 to store length 4
Compthread 10 seeking to 4656 to write header
Thread 10 writing 231 compressed bytes from stream 0
Compthread 10 writing data at 4669
Compthread 11 seeking to 4202 to store length 4
Compthread 11 seeking to 4900 to write header
Thread 11 writing 0 compressed bytes from stream 1
Compthread 11 writing data at 4913
MD5: 2f282b84e7e608d5852449ed940bfc51
matches=0 match_bytes=0
literals=1602 literal_bytes=104857600
true_tag_positives=0 false_tag_positives=0
inserts=0 match 0.000
./100m - Compression Ratio: 21144.908. Average Compression Speed: 4.762MB/s.
Total time: 00:00:20.71
[root@test lrziptest]# lrzip -L 9 -U -vv -z ./100k
The following options are in effect for this COMPRESSION.
Threading is ENABLED. Number of CPUs detected: 12
Detected 67353292800 bytes ram
Compression level 9
Nice Value: 19
Show Progress
Max Verbose
Temporary Directory set as: ./
Compression mode is: ZPAQ. LZ4 Compressibility testing enabled
Using Unlimited Window size
Storage time in seconds 1387435973
Output filename is: ./100k.lrz
File size: 102400
Succeeded in testing 102400 sized mmap for rzip pre-processing
Will take 1 pass
Chunk size: 102400
Byte width: 3
Warning, low memory for chosen compression settings
Succeeded in testing 1526829056 sized malloc for back end compression
Using up to 13 threads to compress up to 102400 bytes each.
Beginning rzip pre-processing phase
hashsize = 4194304. bits = 22. 64MB
0 total hashesunk: 99%
Starting thread 0 to compress 102400 bytes from stream 1
Malloced 22451097600 for checksum ckbuf
Starting thread 1 to compress 13 bytes from stream 0
Starting thread 2 to compress 0 bytes from stream 1
lz4 testing OK for chunk 102400. Compressed size = 0.42% of chunk, 1 Passes
Writing initial chunk bytes value 3 at 24
Writing EOF flag as 1
Writing initial header at 29
Compthread 0 seeking to 17 to store length 3
Compthread 0 seeking to 20 to write header
Thread 0 writing 221 compressed bytes from stream 1
Compthread 0 writing data at 30
Compthread 1 seeking to 7 to store length 3
Compthread 1 seeking to 251 to write header
Thread 1 writing 13 compressed bytes from stream 0
Compthread 1 writing data at 261
Compthread 2 seeking to 27 to store length 3
Compthread 2 seeking to 274 to write header
Thread 2 writing 0 compressed bytes from stream 1
Compthread 2 writing data at 284
MD5: 4c6426ac7ef186464ecbb0d81cbfcb1e
matches=0 match_bytes=0
literals=3 literal_bytes=102400
true_tag_positives=0 false_tag_positives=0
inserts=0 match 0.000
./100k - Compression Ratio: 311.246. Average Compression Speed: 0.000MB/s.
Total time: 00:00:00.16

About lrzip-next, I can't answer. I only use what my package manager (pacman) offers, and that is lrzip only.

Accidentally & unintentionally closed - reopened

It's in the open_stream_out() function.

 996         /* Use a nominal minimum size should we fail all previous shrinking */
 997         if (limit < STREAM_BUFSIZE) {
 998                 limit = MAX(limit, STREAM_BUFSIZE);
 999                 print_output("Warning, low memory for chosen compression settings\n");
1000         }                    
1001         limit = MIN(limit, chunk_limit);

The variable limit is too low for an unknown reason. You have plenty of memory to compress the entire file. lrzip-next is not included in any package manager.
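A minimal standalone sketch of the quoted check makes the behavior easy to see. Here STREAM_BUFSIZE is assumed to be the 10 MB value from the lrzip sources, and the shrinking that happens earlier in open_stream_out() is simulated by passing limit in directly:

#include <stdio.h>

#define STREAM_BUFSIZE (10 * 1024 * 1024)	/* assumed 10 MB */
#define MAX(a, b) ((a) > (b) ? (a) : (b))
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Simplified stand-in for the tail of open_stream_out(): by the time the
 * quoted check runs, limit has already been capped elsewhere, so for a
 * tiny file it arrives below STREAM_BUFSIZE no matter how much RAM the
 * machine has. */
static long long clamp_limit(long long limit, long long chunk_limit)
{
	if (limit < STREAM_BUFSIZE) {
		limit = MAX(limit, STREAM_BUFSIZE);
		printf("Warning, low memory for chosen compression settings\n");
	}
	return MIN(limit, chunk_limit);
}

int main(void)
{
	/* 100 KB file: limit already capped near the file size -> warns */
	printf("final limit: %lld\n", clamp_limit(102400LL, 102400LL));
	/* 100 MB file: limit arrives above STREAM_BUFSIZE -> no warning */
	printf("final limit: %lld\n", clamp_limit(104857600LL, 104857600LL));
	return 0;
}

Note that after the warning, the MIN() on the next line caps limit right back down to chunk_limit, so for a small file the message changes nothing; it is effectively reporting "chunk smaller than STREAM_BUFSIZE", not low memory.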

There is a long-standing issue using Level 9 that I have documented. It will always default to a 10 MB block size for compression instead of maximizing what each thread can compress. I am guessing that if you use -L8 you will not see the error. In fact, in many cases you will get worse compression using L9 than L8. In lrzip-next, open_stream_out() was completely rewritten.

I tested it with all available levels (9 through 1) and the warning appears anyway. The weird thing is that this warning was added recently and the language does not even make sense. If it at least said "file size smaller than block size" it would make some sense, but as it stands now it is completely illogical and unnecessary. Regardless, thank you for looking into it.

-U sets max memory to the size of the file to be compressed. I am not sure this is wise; see lines 1013-1014 below. Honestly, -U offers little benefit other than slowing things down. If your memory exceeds the size of the file, it offers no benefit; it actually has the effect of reducing compression RAM. I think MAYBE what was intended was

control->max_chunk = control->max_mmap

not

control->max_chunk = control->st_size

But this is just one man's opinion. Frankly, I am considering removing -U altogether in lrzip-next.

[EDIT] To get max possible compression, use one thread, -p1.

rzip.c

1003         /* Optimal use of ram involves using no more than 2/3 of it, so we
1004          * allocate 1/3 of it to the main buffer and use a sliding mmap
1005          * buffer to work on 2/3 ram size, leaving enough ram for the
1006          * compression backends */
1007         control->max_mmap = control->maxram;
1008         round_to_page(&control->max_mmap);
1009 
1010         /* Set maximum chunk size to 2/3 of ram if not unlimited or specified
1011          * by a control window. When it's smaller than the file size, round it
1012          * to page size for efficiency. */
1013         if (UNLIMITED)
1014                 control->max_chunk = control->st_size;
1015         else if (control->window)
1016                 control->max_chunk = control->window * CHUNK_MULTIPLE;
1017         else
1018                 control->max_chunk = control->ramsize / 3 * 2;
1019         control->max_mmap = MIN(control->max_mmap, control->max_chunk);
1020         if (control->max_chunk < control->st_size)
1021                 round_to_page(&control->max_chunk);
1022 
1023         if (!STDIN)
1024                 st->chunk_size = MIN(control->max_chunk, len);
1025         else
1026                 st->chunk_size = control->max_mmap;
1027         if (st->chunk_size < len)
1028                 round_to_page(&st->chunk_size);
1029 
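Plugging the reporter's numbers into that excerpt shows the effect. A standalone sketch of just the arithmetic, treating maxram as one third of the detected RAM per the comment above and ignoring page rounding:

#include <stdio.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

int main(void)
{
	long long maxram = 67353292800LL / 3;	/* 1/3 of detected RAM */
	long long sizes[] = { 102400LL, 104857600LL };	/* 100k and 100m */

	for (int i = 0; i < 2; i++) {
		long long st_size = sizes[i];
		long long max_chunk = st_size;	/* the UNLIMITED branch */
		long long max_mmap = MIN(maxram, max_chunk);
		long long chunk_size = MIN(max_chunk, st_size);
		printf("file %9lld: max_mmap %9lld chunk_size %9lld\n",
		       st_size, max_mmap, chunk_size);
	}
	return 0;
}

With -U, the 100 KB file caps every downstream buffer at 102400 bytes even though over 22 GB is available, and a chunk that small is exactly what trips the nominal minimum check in open_stream_out().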

You may try this as a fix, although I still think -U adds no current benefit.

Peter Hyman (1):
      Review Unlimited ram computation.

 rzip.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/rzip.c b/rzip.c
index 2cf5707..f926d1e 100644
--- a/rzip.c
+++ b/rzip.c
@@ -1011,13 +1011,13 @@ void rzip_fd(rzip_control *control, int fd_in, int fd_out)
         * by a control window. When it's smaller than the file size, round it
         * to page size for efficiency. */
        if (UNLIMITED)
-               control->max_chunk = control->st_size;
+               control->max_chunk = len < control->maxram ? control->maxram : len;
        else if (control->window)
                control->max_chunk = control->window * CHUNK_MULTIPLE;
        else
                control->max_chunk = control->ramsize / 3 * 2;
        control->max_mmap = MIN(control->max_mmap, control->max_chunk);
-       if (control->max_chunk < control->st_size)
+       if (control->max_chunk < len)
                round_to_page(&control->max_chunk);
 
        if (!STDIN)

I've just used lrzip for the first time, and I'm also getting this warning when compressing very small files (under 8 MB, and even one as small as 262 bytes). Yet a file of 174 MB doesn't give the warning.

See the post above. The logic for -U is possibly flawed. The line control->max_chunk = control->st_size; makes the largest possible chunk size equal to the size of the input file. This is not logical when the file size is smaller than available RAM, and especially not for a ridiculously small file. The LR in lrzip and lrzip-next stands for LONG RANGE zip. -U only has value on small systems with very limited RAM; otherwise it adds nothing to compression. Nothing! Even your 174 MB file will get the same or better compression without using -U. In the old days, when system RAM was often 2 GB or less (well, I come from the days when 640 KB of RAM was the maximum), using a compression window greater than RAM MAY have been useful, but it was SLOW, because the incremental compression work was stored to disk, like swap. Anyway, my advice is to skip -U. See This Discussion on the topic.

26,269,832 Jul 7 10:12 180MBfile.lrz
26,269,832 Jul 7 10:12 180MBfile.noU.lrz
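As an aside, the "window larger than RAM" behavior described above boils down to sliding a fixed-size mmap window across the file instead of holding the whole thing in memory. A minimal illustration of the general technique, not lrzip's actual code:

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	if (argc < 2)
		return 1;
	int fd = open(argv[1], O_RDONLY);
	struct stat st;
	if (fd < 0 || fstat(fd, &st) < 0)
		return 1;

	/* 256-page window (1 MiB with 4 KiB pages); offsets stay page-aligned */
	off_t window = sysconf(_SC_PAGESIZE) * 256;
	long long zeros = 0;

	for (off_t off = 0; off < st.st_size; off += window) {
		size_t len = (size_t)(st.st_size - off < window ?
				      st.st_size - off : window);
		unsigned char *buf =
			mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, off);
		if (buf == MAP_FAILED)
			return 1;
		for (size_t i = 0; i < len; i++)	/* stand-in "work" */
			zeros += (buf[i] == 0);
		munmap(buf, len);
	}
	printf("%lld zero bytes\n", zeros);
	close(fd);
	return 0;
}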

@pete4abw — Sorry, I should have said. I didn't use -U. I used only -z.

Doesn't matter. The open_stream_out() function makes it happen.

 997         if (limit < STREAM_BUFSIZE) {
 998                 limit = MAX(limit, STREAM_BUFSIZE);
 999                 print_output("Warning, low memory for chosen compression settings\n");
1000         }                    
1001         limit = MIN(limit, chunk_limit);
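If STREAM_BUFSIZE is the 10 MB nominal minimum (its value in the lrzip sources, as far as I can tell), that is consistent with what you saw: files whose single chunk comes out below 10 MB warn, larger ones don't. A trivial check against the sizes reported in this thread:

#include <stdio.h>

#define STREAM_BUFSIZE (10 * 1024 * 1024)	/* assumed 10 MB minimum */

int main(void)
{
	long long sizes[] = { 262LL, 8LL << 20, 174LL << 20 };

	for (int i = 0; i < 3; i++)
		printf("%10lld bytes -> %s\n", sizes[i],
		       sizes[i] < STREAM_BUFSIZE ? "warns" : "no warning");
	return 0;
}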

See this wiki article