jonhoo / inferno

A Rust port of FlameGraph

Flamegraph generation memory usage is quite high for large inputs

itamarst opened this issue

Memory usage for inferno-flamegraph, and for the equivalent Rust API usage, is proportional to input file size. A 44MB file results in 60MB of memory usage for me; a 3KB input file results in 3MB (presumably the minimum).

I discovered this when processing a 440MB file, which resulted in hundreds of MB of RAM usage. That is embarrassing when one is implementing a memory profiler 😁 So now I'm pre-filtering out tiny, irrelevant frames, which is why the input is 44MB and not 440MB. Still, less memory usage would be nice.

Now, the output file is typically more like 1 megabyte or less, because all those repeating frames in the input file get combined into a graph in the output. So it ought to be possible to reduce memory usage quite a lot in the internal representation as well.

My completely unverified guess as to the problem: my input files have quite long strings for frame names, and multiple copies of each string are stored in memory as the data structures are built up. If that's the case, using a string interner in the right place might help quite a bit, and could even speed up runtime because more data would fit in the CPU caches.
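For concreteness, here is a minimal sketch of what interning could look like; this is not inferno's actual internals, and the Interner type and its layout are purely illustrative:

```rust
use std::collections::HashMap;

/// A toy interner: each distinct frame name is stored once and
/// referenced everywhere else by a small integer id. (A production
/// interner would avoid keeping two copies of each string; this
/// version keeps the sketch simple.)
struct Interner {
    map: HashMap<String, u32>,
    strings: Vec<String>,
}

impl Interner {
    fn new() -> Self {
        Interner { map: HashMap::new(), strings: Vec::new() }
    }

    /// Returns the id for `name`, inserting it on first sight.
    fn intern(&mut self, name: &str) -> u32 {
        if let Some(&id) = self.map.get(name) {
            return id;
        }
        let id = self.strings.len() as u32;
        self.map.insert(name.to_owned(), id);
        self.strings.push(name.to_owned());
        id
    }

    fn resolve(&self, id: u32) -> &str {
        &self.strings[id as usize]
    }
}

fn main() {
    let mut interner = Interner::new();
    // "A;B;C" and "A;B;D" share the frames A and B, which are
    // stored only once and referenced by id thereafter.
    let stack1: Vec<u32> = "A;B;C".split(';').map(|f| interner.intern(f)).collect();
    let stack2: Vec<u32> = "A;B;D".split(';').map(|f| interner.intern(f)).collect();
    assert_eq!(stack1[..2], stack2[..2]); // shared prefix, same ids
    assert_eq!(interner.resolve(stack2[2]), "D");
    println!("{} distinct frames stored", interner.strings.len()); // 4
}
```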

That is interesting indeed. Are you seeing this problem with inferno-flamegraph or inferno-stack-collapse? If it is indeed the former, my guess is that this comes from the need to sort the input before processing it. It's an unfortunate property of the algorithm used to merge stack frames that it requires sorted input, which means reading the whole file into memory and sorting the lines. If your input is already sorted, you could try the --no-sort flag, which assumes the input is already sorted and therefore should be able to avoid reading it all into memory.
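One way to take advantage of that is sketched below, under some assumptions (hypothetical file names; sort and inferno-flamegraph on $PATH; the binary reading collapsed stacks from stdin, as the CLI normally does): let an external sort do the ordering, since the Unix sort utility spills to disk for large inputs rather than holding everything in memory.

```rust
use std::fs::File;
use std::process::{Command, Stdio};

fn main() -> std::io::Result<()> {
    let svg = File::create("flame.svg")?;

    // External sort: spills to disk for large inputs instead of
    // holding everything in memory. LC_ALL=C forces byte-wise
    // ordering, matching a plain string sort.
    let sort = Command::new("sort")
        .env("LC_ALL", "C")
        .arg("stacks.folded")
        .stdout(Stdio::piped())
        .spawn()?;

    // --no-sort promises inferno-flamegraph that its input is
    // already sorted, so it should not need to buffer the whole
    // file before merging.
    let status = Command::new("inferno-flamegraph")
        .arg("--no-sort")
        .stdin(sort.stdout.expect("sort stdout was piped"))
        .stdout(svg)
        .status()?;

    assert!(status.success());
    Ok(())
}
```

The equivalent shell pipeline would be: sort stacks.folded | inferno-flamegraph --no-sort > flame.svg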

As far as the blow-up goes, a 1.4x blowup is unfortunate, though surprisingly good given how little attention has been paid to optimizing memory use, and given that the input file has to be held in memory to sort it! I'm a little strapped for time, but if you want to do some digging, this article has some good tips on profiling memory use in Rust!

Pre-sorting might help for me, yeah. But—

Consider an input that looks like this:

A;B;C 123
A;B;D 345

The strings A and B repeat. In fact, most of the memory used by the loaded lines will be repeats. That is what makes me think you can go not just from 1.4× to 1×, but plausibly to 0.1× or even better.

That's a neat idea. I wonder how well it'll turn out in practice, though. Currently we store a single string A;B;C and a single string A;B;D, but with the proposed change we'd store two vectors, each holding three interned string ids. Assuming an interned string is referenced as a u32, that's:

2x String (8b pointer, 8b length, 8b capacity = 24b each) + "A;B;C" (5b) + "A;B;D" (5b) = 58b

versus

2x Vec (8b pointer, 8b length, 8b capacity = 24b each) + 4x distinct interned String (24b each) + 6x u32 (4b each) + "A" + "B" + "C" + "D" (1b each) = 172b

Of course the benefits add up the more strings there are, but if the strings are generally short (like main), it probably doesn't end up buying that much. It'd be super interesting to see experiments on this with some real traces!
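For what it's worth, the per-object overheads in that estimate are easy to check on a 64-bit target, where String and Vec headers are each three 8-byte words:

```rust
use std::mem::size_of;

fn main() {
    // The 24-byte headers assumed above: pointer + length + capacity.
    assert_eq!(size_of::<String>(), 24);
    assert_eq!(size_of::<Vec<u32>>(), 24);

    // Today: two whole lines stored as Strings.
    let current = 2 * size_of::<String>() + "A;B;C".len() + "A;B;D".len();
    assert_eq!(current, 58);

    // Proposed: two Vec<u32> of frame ids, plus four distinct
    // interned Strings and their one-byte contents.
    let interned = 2 * size_of::<Vec<u32>>() // stack vectors
        + 4 * size_of::<String>()            // interner entries
        + 6 * size_of::<u32>()               // six frame ids
        + 4;                                 // "A", "B", "C", "D"
    assert_eq!(interned, 172);
}
```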

Going to try to do this.

Or at least, try to find some way to reduce memory usage, since I'm seeing hundreds of MB of memory use.

It's possible that it would be easier for me to just do this on my side: map each unique frame text to a unique Unicode character, then search and replace on the resulting SVG. Which feels terrible, but is plausibly less work. Will think about it some more as I read the code.
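A sketch of that workaround, under some assumptions (hypothetical function names; frame names that don't already contain private-use characters; fewer distinct frames than the 6,400 code points in the basic private-use area):

```rust
use std::collections::HashMap;

/// Replace each distinct frame name with a single character from the
/// Unicode private-use area (U+E000..U+F8FF), remembering the mapping.
fn shrink(folded: &str) -> (String, HashMap<char, String>) {
    let mut ids: HashMap<String, char> = HashMap::new();
    let mut out = String::new();
    for line in folded.lines() {
        // Each line is "frame;frame;...;frame count".
        let (stack, count) = line.rsplit_once(' ').expect("malformed line");
        let mut first = true;
        for frame in stack.split(';') {
            if !first {
                out.push(';');
            }
            first = false;
            // Ids are handed out sequentially on first sight.
            let next = 0xE000 + ids.len() as u32;
            let c = *ids
                .entry(frame.to_owned())
                .or_insert_with(|| char::from_u32(next).expect("ran out of code points"));
            out.push(c);
        }
        out.push(' ');
        out.push_str(count);
        out.push('\n');
    }
    let reverse = ids.into_iter().map(|(name, c)| (c, name)).collect();
    (out, reverse)
}

/// Expand the placeholder characters back in the generated SVG text.
fn expand(svg: &str, names: &HashMap<char, String>) -> String {
    svg.chars()
        .map(|c| names.get(&c).cloned().unwrap_or_else(|| c.to_string()))
        .collect()
}

fn main() {
    let (small, names) = shrink("A;B;C 123\nA;B;D 345\n");
    // `small` would be fed to inferno-flamegraph; `expand` would then
    // be run over the SVG it produces.
    assert_eq!(expand(&small, &names), "A;B;C 123\nA;B;D 345\n");
}
```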

After further thought, I'm going to go back and see why my files are so big; the input file sizes do seem excessive even for a worst-case scenario.