:bulb: [Get info about deduplication]
Merculous opened this issue · comments
lrzip-next Version
N/A
Feature Suggestion
If I'm not mistaken, ZPAQ on its own shows the new size after deduplication. Maybe lrzip-next could have a feature that reports how much data deduplication "removed." It could be printed along with everything else (with -v, for example), or run by itself with no compression. I'm not totally sure it's even a good thing to do, but it would be nice to see whether deduplication gets rid of a significant amount. If my assumption is correct and there's not much to deduplicate, then maybe it won't matter which method is used. I have some files, 1-5GB for example, that probably don't need compression but could benefit a bit from deduplication. Hopefully you get my point? Anyway, I love this fork: much more info and better utilization of LZMA dictionaries :P
Are you referring to this output at the end of a zpaq run? Or are you referring to the compression stats at the end of each lrzip-next thread compression run? Please read below and, if this is not what you expect, kindly provide more details on how to make lrzip-next better. Thanks for the suggestion.
100.00% 0:00:00 + linux-5.13.tar 1429647360 -> 1206160031
100.00% 0:00:00 [17480..17517] 2862797 -method 14,179,1
1 +added, 0 -removed.
0.000000 + (1429.647360 -> 1206.160031 -> 231.769323) = 231.769323 MB
14.166 seconds (all OK)
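For anyone reading along, here is a minimal sketch of how the deduplication saving can be read off that zpaq summary line. It assumes the three numbers in parentheses are the input size, the size after deduplication, and the size after compression, all in MB; the function name `dedup_savings` is made up for illustration.

```python
# Sizes copied from the zpaq summary line above, in MB (bytes / 1e6).
# Assumption: the three values are input -> after dedup -> after compression.
original_mb = 1429.647360
deduplicated_mb = 1206.160031
compressed_mb = 231.769323


def dedup_savings(original, deduplicated):
    """Return (amount removed, fraction of the input removed)."""
    removed = original - deduplicated
    return removed, removed / original


removed_mb, removed_fraction = dedup_savings(original_mb, deduplicated_mb)
overall_ratio = original_mb / compressed_mb

print(f"removed by deduplication: {removed_mb:.3f} MB ({removed_fraction:.1%})")
print(f"overall compression ratio: {overall_ratio:.3f}x")
```

Under that reading, roughly 15.6% of the tarball was removed before the compressor saw it.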
If so, lrzip-next -vv will show that expressed as a compression ratio.
MD5: 134a2942867678fa1c3d4284c8b738b2
matches=110,210 match_bytes=256,918,841
literals=111,632 literal_bytes=1,172,728,519
true_tag_positives=230,217 false_tag_positives=177,446
inserts=667,906 match 0.219
linux-5.13.tar - Compression Ratio: 8.987. Average Compression Speed: 18.671MB/s.
Total time: 00:01:13.46
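As a quick sanity check on those rzip numbers, the sketch below redoes the arithmetic. The interpretation is inferred from the figures only, not from the lrzip-next source: match_bytes plus literal_bytes adds up to the input size, and the printed "match 0.219" lines up with match_bytes divided by literal_bytes.

```python
# Values copied from the -vv output above.
match_bytes = 256_918_841
literal_bytes = 1_172_728_519
file_size = 1_429_647_360  # size of linux-5.13.tar from the zpaq output

# Every input byte is accounted for as either a match or a literal.
accounted = match_bytes + literal_bytes

# Assumption: "match 0.219" is match bytes relative to literal bytes.
match_to_literal = match_bytes / literal_bytes

# Share of the input that rzip replaced with references to earlier data.
match_fraction = match_bytes / file_size

print(f"accounted bytes: {accounted:,}")
print(f"match / literal: {match_to_literal:.3f}")
print(f"matched share of input: {match_fraction:.1%}")
```

The matched share, about 18% here, is the closest thing in this output to "how much did deduplication remove".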
The first part shows the rzip preprocessing and the second line from the bottom shows the overall compression.
In addition, lrzip-next -vvi will show detailed info at the end.
Summary
=======
File: linux-5.13.tar.lrz
lrzip-next version: 0.8 file
Stats Percent Compressed / Uncompressed
-------------------------------------------------------
Rzip: 82.1% 1,173,834,889 / 1,429,647,360
Back end: 13.6% 159,083,413 / 1,173,834,889
Overall: 11.1% 159,083,413 / 1,429,647,360
Compression Method: rzip + zpaq -- Compression Level = 3, Block Size = 4
Decompressed file size: 1,429,647,360
Compressed file size: 159,084,402
Compression ratio: 8.987x
MD5 Checksum: 134a2942867678fa1c3d4284c8b738b2
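To make the summary table easier to read, here is a small sketch that reproduces its percentages from the byte counts. The variable names are my own; the numbers are taken from the summary above. The "Rzip" row is the one that answers the deduplication question.

```python
# Byte counts copied from the summary above.
original = 1_429_647_360        # decompressed file size
after_rzip = 1_173_834_889      # size after the rzip (deduplication) stage
after_backend = 159_083_413     # size after the zpaq back end
compressed_file = 159_084_402   # size of the .lrz file on disk

rzip_pct = 100 * after_rzip / original          # "Rzip" row
backend_pct = 100 * after_backend / after_rzip  # "Back end" row
overall_pct = 100 * after_backend / original    # "Overall" row
ratio = original / compressed_file              # "Compression ratio"
removed_by_rzip = original - after_rzip         # bytes rzip eliminated

print(f"Rzip:     {rzip_pct:.1f}%")
print(f"Back end: {backend_pct:.1f}%")
print(f"Overall:  {overall_pct:.1f}%")
print(f"Ratio:    {ratio:.3f}x")
print(f"Removed by rzip: {removed_by_rzip:,} bytes ({100 - rzip_pct:.1f}%)")
```

So in this run rzip alone removed about 17.9% of the input before zpaq did the rest.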
As for the ZPAQ part, yes, I was referring to the end of the run. I had no idea you could use -vi together to get that info, my bad. So I guess that's basically what I was looking for with "Rzip compression: ...". In any case, that resolves it: I'll just run it with -n and have rzip do its thing. Other than that, you pointed out what I happened to miss. Anyway, thanks!
To be honest, you could add this to the program as a mode that runs rzip with max compression and pipes the output to /dev/null or something (most likely losing some compression due to the piping, but maybe there's a better way of doing this). You can disregard this as a feature in itself, since you've pointed out what I was missing.
Feel free to close this if you believe nothing needs to be added, which is probably the case. I think I've got it from here.
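For a rough estimate without writing any archive at all, something like the sketch below could work. To be clear, this is not how rzip finds matches (rzip uses a rolling hash to find long-distance matches at arbitrary offsets); it is a much cruder fixed-size block hashing approach, so it will usually underestimate what rzip can remove. The function name `estimate_duplicate_bytes` and the 64 KiB block size are arbitrary choices for illustration.

```python
import hashlib


def estimate_duplicate_bytes(path, block_size=64 * 1024):
    """Return (total_bytes, duplicate_bytes) using fixed-size block hashing.

    A block counts as duplicate if an identical block was seen earlier in
    the file at a block-aligned offset. This is a crude lower bound, not
    rzip's algorithm.
    """
    seen = set()
    total = 0
    duplicate = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            total += len(block)
            digest = hashlib.sha256(block).digest()
            if digest in seen:
                duplicate += len(block)
            else:
                seen.add(digest)
    return total, duplicate
```

Calling `estimate_duplicate_bytes("some-file.tar")` (a placeholder path) returns the two byte counts; dividing the second by the first gives a rough "is deduplication worth it" figure.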
It's always possible to add a little more info at the end of a compression. There's so much info already (maybe too much) with lrzip-next -v|vv. But I'll keep it in mind. In the meantime, you can use lrzip-next -v|vv -i and it will give you the data and more. Thank you for the feedback.