pete4abw / lrzip-next

Long Range Zip. Updated and Enhanced version of ckolivas' lrzip project. Lots of new features. Better compression. Actively maintained.

Home Page: https://github.com/pete4abw/lrzip-next

:bulb: [Get info about deduplication]

Merculous opened this issue

lrzip-next Version

N/A

Feature Suggestion

If I'm not mistaken, ZPAQ (by itself) shows the new size after deduplication. So maybe we could have a feature that reports how much data is "removed." It could be printed along with everything else, with -v for example, or run by itself with no compression. I'm not totally sure it's even a good thing to do, but it'd be nice to see whether deduplication gets rid of a significant amount. If my assumption is correct, when there's not much to deduplicate, the choice of method may not make a difference. I have some files, in the 1-5GB range for example, that probably don't need compression but could benefit a bit from deduplication. Hopefully you get my point? Anyway, I love this fork: much more info and better utilization of LZMA dictionaries :P

Steps to reproduce

No response

Relevant log output

No response

Please provide system details

No response

Additional Context

No response

Are you referring to this output at the end of a zpaq run? Or, are you referring to the compression stats at the end of each lrzip-next thread compression run? Please read below and if not what you expect, kindly provide more details as to how to make lrzip-next better. Thanks for the suggestion.

100.00% 0:00:00 + linux-5.13.tar 1429647360 -> 1206160031
100.00% 0:00:00 [17480..17517] 2862797 -method 14,179,1
1 +added, 0 -removed.
 
0.000000 + (1429.647360 -> 1206.160031 -> 231.769323) = 231.769323 MB
14.166 seconds (all OK)

If so, lrzip-next -vv will show that expressed as a compression ratio.

MD5: 134a2942867678fa1c3d4284c8b738b2
matches=110,210 match_bytes=256,918,841
literals=111,632 literal_bytes=1,172,728,519
true_tag_positives=230,217 false_tag_positives=177,446
inserts=667,906 match 0.219
linux-5.13.tar - Compression Ratio: 8.987. Average Compression Speed: 18.671MB/s.
Total time: 00:01:13.46

The first part (the match and literal statistics) shows the rzip preprocessing, and the second line from the bottom shows the overall compression.
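For anyone wanting a single "how much did deduplication remove" figure, here is a rough sketch that derives one from the numbers quoted above. It assumes that match_bytes counts the bytes rzip replaced with references to earlier data, and that the second number on zpaq's `+ linux-5.13.tar` line is the size after deduplication. The variable and function names are made up for illustration and are not part of lrzip-next or zpaq.

```python
# Figures copied from the zpaq and lrzip-next -vv output quoted in this thread.
ORIGINAL_BYTES = 1_429_647_360
ZPAQ_AFTER_DEDUP = 1_206_160_031   # assumed: size after zpaq deduplication
RZIP_MATCH_BYTES = 256_918_841     # assumed: bytes rzip replaced by matches
RZIP_LITERAL_BYTES = 1_172_728_519 # bytes rzip passed through unchanged


def share(part: int, total: int) -> float:
    """Return part/total as a fraction between 0 and 1."""
    return part / total


# Bytes removed by each tool's long-range matching stage.
zpaq_removed = ORIGINAL_BYTES - ZPAQ_AFTER_DEDUP
rzip_removed = RZIP_MATCH_BYTES

# The "match 0.219" figure on the inserts line is consistent with
# match_bytes divided by literal_bytes.
match_figure = RZIP_MATCH_BYTES / RZIP_LITERAL_BYTES

print(f"zpaq removed {zpaq_removed:,} bytes ({share(zpaq_removed, ORIGINAL_BYTES):.1%})")
print(f"rzip removed {rzip_removed:,} bytes ({share(rzip_removed, ORIGINAL_BYTES):.1%})")
print(f"match figure: {match_figure:.3f}")
```

In this run match_bytes and literal_bytes add up exactly to the original file size, so the rzip share is simply match_bytes over the input size.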

In addition, lrzip-next -vvi will show detailed info at the end.

Summary
=======
File: linux-5.13.tar.lrz
lrzip-next version: 0.8 file

  Stats         Percent       Compressed /   Uncompressed
  -------------------------------------------------------
  Rzip:          82.1%     1,173,834,889 /  1,429,647,360
  Back end:      13.6%       159,083,413 /  1,173,834,889
  Overall:       11.1%       159,083,413 /  1,429,647,360

  Compression Method: rzip + zpaq -- Compression Level = 3, Block Size = 4

  Decompressed file size:  1,429,647,360
  Compressed file size:      159,084,402
  Compression ratio:               8.987x

  MD5 Checksum: 134a2942867678fa1c3d4284c8b738b2
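The percentages in the summary can be reproduced from the byte counts, which may help when reading the table. This is only a sketch of the arithmetic, assuming each percentage is compressed size divided by uncompressed size for that stage; the dictionary and helper names are illustrative, not anything from the lrzip-next source.

```python
# Byte counts copied from the "Stats" table and the lines below it.
stages = {
    "Rzip": (1_173_834_889, 1_429_647_360),
    "Back end": (159_083_413, 1_173_834_889),
    "Overall": (159_083_413, 1_429_647_360),
}
DECOMPRESSED_SIZE = 1_429_647_360
COMPRESSED_FILE_SIZE = 159_084_402  # slightly larger than the stream total above


def percent(compressed: int, uncompressed: int) -> float:
    """Compressed size as a percentage of the uncompressed size."""
    return 100.0 * compressed / uncompressed


percents = {name: round(percent(c, u), 1) for name, (c, u) in stages.items()}
ratio = round(DECOMPRESSED_SIZE / COMPRESSED_FILE_SIZE, 3)

for name, value in percents.items():
    print(f"{name + ':':<10} {value:5.1f}%")
print(f"Compression ratio: {ratio}x")
```

The Rzip row is the one that answers the original question: the rzip stage alone brought the data down to about 82% of its size before the back end ran.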

As for the ZPAQ part, yes, I was referring to the end. I had no idea you could use -vi together to get that info, my bad. Right, so that's basically what I was looking for with "Rzip compression: ...". In any case, that resolves what I was looking for: I'll just run it with -n and have rzip do its thing. Other than that, you pointed out what I happened to miss. Anyway, thanks!

To be honest, you could add this to the program as a mode where it runs rzip with max compression and pipes the output to /dev/null or something (most likely losing some compression due to the piping, but maybe there's a better way of doing this). You can disregard this as a feature request, though, since you've pointed out what I was missing.

Feel free to close this if you believe that nothing needs to be added, which is probably the case. I think I've got it from here.

It's always possible to add a little more info at the end of a compression. There's so much - maybe too much - info already with lrzip-next -v|vv, but I'll keep it in mind. In the meantime, you can use lrzip-next -v|vv -i and it will give you the data and more. Thank you for the feedback.