ckolivas / lrzip

Long Range Zip

Home Page: http://lrzip.kolivas.org


Compressing 2.5TB of data, only using ~2% of ram and ~5% of CPU?

8465231 opened this issue

So I am in the process of compressing 2.5TB of source code and decided to give lrzip a try, as smaller tests showed significant savings. I tar'ed all the files manually before starting this, resulting in a 2.5TB tar file.

So I set up a temporary server to do the work, but it is going much slower than expected.

The server has 2x 8c/16t CPUs and 128GB of memory.

I used these options when starting the compression: lrzip -v -L 9 -U

Looking at the resource usage, I am surprised to see it using a mere ~2GB of RAM WITH the Ubuntu GUI running, and the CPU basically sitting idle most of the time with a few blips up to ~8%.

On my test machine I actually ran into a lot of issues with it using too much memory and crashing (the test system had 48GB of RAM but was testing on a much smaller chunk of data as well). Using the unlimited window option on my test machine netted me similar RAM usage to the normal default usage.

I would expect it to use a normal window-sized chunk of RAM and then swap out the rest? It seems to be wasting the other ~126GB of memory that is available.

Here are some screen shots of the compression process:

[Screenshots: resource usage during compression]

[Screenshot: output file size growth]

Is this the expected result? Any way to make it faster / use the CPU and memory that is available?

Thanks.

I canceled the compression by mistake and decided to start it without the -U option to see what it would do for a little while before restarting with the -U.

edit: After letting it run without the -U for a while, it was showing basically the same behavior: only a few GB of memory usage with a window size set to 90GB.

Odd, on my test server it acted much differently. My test server was running an Ubuntu CLI Docker container though.

It will use a smaller window with the -U option and basically continually page in new sections, so if you want to use the most RAM you are better off not using the -U option. The main thing is that creating the dictionary hash tables takes an extremely long time with an enormous file and lots of RAM like that, and there is no way to speed that part up (at present?). I highly recommend not using either the -U option or increasing the level to 9, as you could be waiting a very long time with such a large file. It will eventually start using more RAM if you keep waiting.
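
In other words (a minimal sketch; the filename is a placeholder), a plain invocation at the default level and without -U should behave better on a file this size:

lrzip -v bigfile.tar    # default level 7, no -U; lrzip picks its own window from available RAM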

Ok, there are somewhere around 10 million files in the tar file, but I didn't think that would affect it since it just sees it as a single file.

So this is expected and even though it said it was 12% complete it was still in the hashing stage I guess? Seeing as even without the -U it used about the same ram and CPU, I am guessing it is working as expected.

I don't mind waiting a while for the compression, as long as decompression will not take equally long and can be done on much more mundane hardware.

Am I looking at days or weeks to finish the compression, given what you can see in the above screenshots and that it said it was at 12% after 18 hours?

It's very non-linear so it's not necessarily proportional to the amount of time it's spent so far, but it would be days, not weeks. It will be hugely faster on decompression provided you haven't used the insanely slow -z option.
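
For reference, decompression on more mundane hardware later is just the reverse operation (a sketch; the filename is a placeholder):

lrzip -t bigfile.tar.lrz    # test that the archive decompresses cleanly
lrzip -d bigfile.tar.lrz    # decompress, recreating bigfile.tar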

Ok, thanks for the information, this is my first time stepping out of 7zip or rar. In the past rar has actually been my preferred compression, so this is very much an experiment. I'm thinking about compressing this file with 7z after this to see the difference.

Yeah, just using the default lzma; zpaq was just too stupid slow even for me lol.

I am curious why lzma2 has not been added as an option?

During testing, using only the lrzip pre-processing phase and then compressing with lzma2 afterwards netted me noticeable gains in compression.

Last I checked, lzma2 wasn't as threadable as what's included in lrzip, so it slowed things down.

Interesting, I would not have guessed that.

Is there a way to know when the hashing is finished? Or is it basically just a waiting game, where at some point resource usage should go up?

It was strange how it was using some CPU last night, but this morning it was using basically nothing, yet reading from the drive much faster.

It's a continuous process so there's no "hashing is finished" in unlimited mode, but secondary compression only happens when a chunk is large enough to work with. As I said, it's very non-linear, and with your unique workload the maximum usage of ram and CPU will only happen for bursts at a time.

AH, got you. lol

Ok, I will just settle in for the wait lol.

Thanks for the help.

I am curious why lzma2 has not been added as an option?

lzma2 is merely a container for an lzma archive. I've looked at this. lrzip already is a container for an lzma archive. The main differences are:

  1. A two-byte properties encoding instead of 5 bytes; the dictionary size is now 1 byte instead of 4.
  2. As you noted, supposedly better threading. However, lrzip already has multi-threading, and it's superior to what lzma has (as you noticed from the speed).

If I may suggest, the idea of spending so much time tarring a huge file and then compressing the huge file is a waste of disk space, RAM, and time. A much better way is the following:

tar -I 'lrzip [options]' -cf tarfile.tar.lrz tardirs...

This works very well, and the heuristically computed compression window is smaller, which will yield faster compression. I would also recommend breaking up this huge backup into different projects or subdirectories.
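
For example (a sketch only; the directory and archive names are placeholders), splitting the backup per project would look something like:

tar -I lrzip -cf project1.tar.lrz project1/
tar -I lrzip -cf project2.tar.lrz project2/

Each archive then gets its own, smaller, heuristically chosen compression window.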

Another possibility is to use the -m or -w lrzip options and decrease the RAM profile so that it will adjust its compression window.

lrzip will take either total ram/3 or total ram/6 for its compression window, depending on whether the program is operating directly on a file or getting piped input.
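
As a rough sketch (the filename is a placeholder, and I'm assuming both options take their size in hundreds of MB, which is how lrzip's verbose output expresses the window):

lrzip -w 300 bigfile.tar    # cap the compression window at roughly 30GB (300 x 100MB)
lrzip -m 600 bigfile.tar    # (as I understand it) limit the RAM lrzip budgets for itself to ~60GB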

Try using tar -I and I think you will get a better result. I strongly advise against -U in all cases. The amount of time lost versus compression gained is not a good trade, IMO.

Good luck.

PS: See this article which, despite its bias, is an interesting read on why not to use lzma2: xz is inadequate

Interesting info. I am still pretty new to Linux, though I have been working with Windows systems since the 90s.

I was not aware that lzma2 was just a container. I noticed that it netted a smaller file size, so I just assumed it was a different algorithm (I think it was around 8% smaller in my test).

Being new to Linux, I am still in the very early stages of figuring out how piping and the like works. Every time I think I have it figured out I am proven wrong lol. I will save that command for future reference though. I assume it is different from just using lrztar?

In this case I needed to move the data to another drive that I could put in the garage, so tar'ing it was basically the same as copying it.

A big issue I have is that "basic" Linux commands are considered so basic that no one ever talks about them, making them very hard to find or learn about as a beginner to Linux.

e.g., I just learned yesterday, while reading something completely unrelated, that ctrl+z actually pauses a process and it can be resumed with %. I always thought ctrl+z killed the process. Plenty of other examples like that as well lol, like figuring out last week that >> appends to a file and > replaces it.
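
In other words (a quick sketch; the filename is just an example):

sleep 600                # some long-running command
^Z                       # ctrl+z suspends it (stopped, not killed)
fg %1                    # resume job 1 in the foreground (bg %1 resumes it in the background)
echo one > notes.txt     # > truncates/replaces the file
echo two >> notes.txt    # >> appends to it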

In this particular case I am specifically testing different compression options for source code as I will be backing up a fair amount of it in the coming months. I have a spare server for the time being as I am in the process of upgrading so I am just letting it churn away in the garage until I sell it.

I was actually wanting to increase RAM usage, as it has 128GB but is hardly using any of it.

I tried restarting the compression without the -U option, but after an hour it was doing basically exactly the same thing as with the -U: hardly any RAM usage, for whatever reason.

I am planning on compressing the data again with another method or two to see what works best. I am thinking about using 7z directly, as it is more compatible, if the size is not that much larger.

Compressing again without -U is an interesting option as well, to see the real-world effects on RAM/CPU usage as well as the final size.

It was selecting a 90GB window size by default, which is much higher than total ram/3. Any idea why?

Think of it this way: lrzip operates on chunks of data. Depending on the RAM available, the chunks are divvied into blocks. Blocks are passed to the backend compressor based on the number of threads available. When the backend returns, the threads are locked until all have returned, and compressed blocks are written back one by one until the chunk is complete. The larger the blocks, the longer the backend will take. The advantage of larger blocks is that more data can be reviewed and hashed by the backend, yielding supposedly better compression.

I've done a lot of research on this (see my git), and it turns out that levels 4 and 6 work the best from a time and data perspective.

As for linux and its commands, well, that's a discussion to take offline.

Just out of curiosity, start and immediately cancel a compression with the lrzip -vv option, capture the output, and post it. I am interested in the compression windows derived and other memory stats.
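
For example, something like this would capture everything regardless of which stream lrzip writes to (a sketch; the file and log names are placeholders):

lrzip -vv bigfile.tar 2>&1 | tee lrzip-vv.log
# then ctrl+c once the window and memory lines have printed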

Yeah, the Linux commands thing was just a rant lol. Although -vv is another very nice option I was not aware of lol.

Interesting, I will try that tar command next time.

Here is the output from a new instance I started since I didn't want to stop the running one:

The following options are in effect for this COMPRESSION.
Threading is ENABLED. Number of CPUs detected: 32
Detected 135154196480 bytes ram
Compression level 7
Nice Value: 19
Show Progress
Max Verbose
Output Directory Specified: /mnt/4TB-SAS/test/
Temporary Directory set as: ./
Compression mode is: LZMA. LZO Compressibility testing enabled
Heuristically Computed Compression Window: 859 = 85900MB
Storage time in seconds 1374636788
Output filename is: /mnt/4TB-SAS/test/LineageOS.tar.lrz
File size: 2754531338240
Enabling sliding mmap mode and using mmap of 45051396096 bytes with window of 90102796288 bytes
Succeeded in testing 45051396096 sized mmap for rzip pre-processing
Will take 31 passes
Chunk size: 90102796288
Byte width: 5
Succeeded in testing 25778382165 sized malloc for back end compression
Using up to 33 threads to compress up to 584030808 bytes each.
Beginning rzip pre-processing phase
hashsize = 4194304. bits = 22. 64MB
Starting sweep for mask 1
Starting sweep for mask 3
Starting sweep for mask 7
Starting sweep for mask 15
Starting sweep for mask 31
etc

So, you see, the chunk size is 2/3 of the total RAM. This is as expected:
135154196480 * 2/3 ≈ 90102796288
90102796288 / 2 ≈ 45051396096, which is for the rzip pre-processor and sliding mmap.
The other half of the remaining RAM is for the backend, varying due to the lzma overhead computed in util.c.

Each block is not exactly 1/33 of 25778382165, but it is close to what is expected.

Like I said, you are welcome to contact me offline for more info.

I see that now; with only the -v option it only showed:

Heuristically Computed Compression Window: 859 = 85900MB

So I assumed that was the block size.

So what would be the bottleneck in my case that would have it only using ~2-3GB of RAM and 2-5% of the CPU?

Hard drive speed? It does seem to be pegged at 100% utilization, although the speeds are a bit slower than I would expect for that, so it must be doing some random reads.

Random reads would also explain why, at this speed, it should have been able to read almost the whole 2.5TB by now, but it says it is only 12% complete.

-v is not -vv. Compression window is in hundreds of MB, so 859 = ~85GB. As I said, I'm happy to discuss the internals, but this is not the place. See my git and contact me there. Thank you

I have to admit, I just signed up for a GitHub account a few weeks ago and have no idea how to contact someone directly lol. Sorry if this is not how GitHub is supposed to be used.

I will give contacting you directly a shot later today though.

Just click the name. It links to the git.