jonhoo / inferno

A Rust port of FlameGraph

Rayon filtering

benbrittain opened this issue · comments

I recently went to parallelize/optimize a library with rayon, and my flamegraph (unsurprisingly!) became absolutely unreadable. Optionally filtering out rayon symbols/functions would be a super helpful feature.

Can you give an example of what it looked like and what filtering you'd like to see?

I've got one here.
flamegraph

You can see that it's totally unreadable. I'd imagine filtering out any frame with a rayon symbol, so that the functions one actually has control over stand out more?

For now I've just defined the parallelism as a feature and have been profiling with that, but that seems less than ideal. I'm not sure what the state of the art in highly parallel flamegraphs looks like, though; this might just be infeasible.
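The filtering idea above can be approximated outside of inferno by rewriting the collapsed stack file before rendering it. The following is only a rough sketch, assuming the usual collapsed format of one `frame;frame;frame count` entry per line; the helper names `strip_frames` and `filter_collapsed` are made up for illustration and are not part of inferno, and matching on the substring `rayon` is a guess at what counts as a rayon frame.

```rust
use std::collections::BTreeMap;
use std::io::Read;

/// Removes every frame whose symbol contains `needle` from one collapsed
/// line ("frame;frame;frame count"). Returns None if the line is malformed.
fn strip_frames(line: &str, needle: &str) -> Option<(String, u64)> {
    // The sample count is whatever follows the last space on the line.
    let (stack, count) = line.trim_end().rsplit_once(' ')?;
    let count: u64 = count.parse().ok()?;
    let kept: Vec<&str> = stack
        .split(';')
        .filter(|frame| !frame.contains(needle))
        .collect();
    Some((kept.join(";"), count))
}

/// Filters every line and merges stacks that became identical,
/// summing their sample counts.
fn filter_collapsed(input: &str, needle: &str) -> BTreeMap<String, u64> {
    let mut merged = BTreeMap::new();
    for line in input.lines() {
        if let Some((stack, count)) = strip_frames(line, needle) {
            // Stacks made up only of filtered frames are dropped,
            // which lowers the total sample count.
            if !stack.is_empty() {
                *merged.entry(stack).or_insert(0) += count;
            }
        }
    }
    merged
}

fn main() {
    // Read collapsed stacks on stdin, write the filtered ones to stdout.
    let mut input = String::new();
    std::io::stdin()
        .read_to_string(&mut input)
        .expect("failed to read stdin");
    for (stack, count) in filter_collapsed(&input, "rayon") {
        println!("{stack} {count}");
    }
}
```

The output is still in collapsed format, so it should be usable as input to inferno-flamegraph. Note that this only hides the rayon frames; it does nothing about the per-thread duplication of stacks.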

+1 on this issue. I have been working with flamegraph for the past few days and had to switch from parallel iterators back to sequential iterators for any code I wanted to profile, because the graph with the rayon feature active is undecipherable.

Also, the number of samples collected (I guess the number of stacks?) goes up roughly N-fold, where N is the number of CPU cores, when profiling parallel code with rayon.

@benbrittain Hmm, I get a 404 for that link?

I wonder if you couldn't get quite far by using the new --skip-after flag to filter out the prefix of each stack that's just the rayon threading machinery. Maybe give that a try?

oops, you caught me in the middle of a web site overhaul. link here and updated above: flamegraph

Woah, yeah, that's certainly a right mess!
This doesn't seem like something skip-after can fix, but it's also not clear that the flamegraph here is wrong. The reason for the insanity is that there are recursive calls to conjure::octree::Octree::subdivide, and the flamegraph is "correctly" showing you how long the calls at each "depth" are taking. That gets pretty crazy, as you can see, when the recursion is deep. I don't know exactly what should be displayed here, but there's an argument that a flamegraph isn't quite the right visualization. For this particular use-case, I think you may want to do a bit of post-processing on the collapsed stack file before passing it to inferno-flamegraph to "collapse" the depths of the tree (if that's indeed what you want). It's such a specific case that I don't think inferno itself can do much here; rather, you should think about exactly what you want the output to show you, and tailor-make something for this data structure.
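One possible shape for that post-processing step, purely as a sketch: for each collapsed line, keep everything up to the first frame that matches the recursive function, then skip ahead to whatever follows its last occurrence, so every recursion depth folds into a single frame. The function names `collapse_recursion` and `collapse_all` are hypothetical, and whether discarding the intermediate frames is the right trade-off depends on what the graph is supposed to show.

```rust
use std::collections::BTreeMap;
use std::io::Read;

/// Folds all nested occurrences of `target` in one stack into the first one:
/// frames between the first and last match (and the last match itself) are
/// dropped. Stacks with zero or one match are returned unchanged.
fn collapse_recursion(stack: &str, target: &str) -> String {
    let frames: Vec<&str> = stack.split(';').collect();
    let first = frames.iter().position(|f| f.contains(target));
    let last = frames.iter().rposition(|f| f.contains(target));
    match (first, last) {
        (Some(first), Some(last)) if first < last => {
            let mut out: Vec<&str> = frames[..=first].to_vec();
            out.extend_from_slice(&frames[last + 1..]);
            out.join(";")
        }
        _ => stack.to_string(),
    }
}

/// Applies `collapse_recursion` to every "stack count" line and merges
/// stacks that became identical. Malformed lines are skipped.
fn collapse_all(input: &str, target: &str) -> BTreeMap<String, u64> {
    let mut merged = BTreeMap::new();
    for line in input.lines() {
        if let Some((stack, count)) = line.trim_end().rsplit_once(' ') {
            if let Ok(count) = count.parse::<u64>() {
                *merged.entry(collapse_recursion(stack, target)).or_insert(0) += count;
            }
        }
    }
    merged
}

fn main() {
    // Symbol name taken from the flamegraph discussed in this thread.
    let target = "conjure::octree::Octree::subdivide";
    let mut input = String::new();
    std::io::stdin()
        .read_to_string(&mut input)
        .expect("failed to read stdin");
    for (stack, count) in collapse_all(&input, target) {
        println!("{stack} {count}");
    }
}
```

Because the result is again a collapsed stack file, it can be handed to inferno-flamegraph like the original. The time spent at each depth is no longer distinguishable afterwards, which is the point of the transformation but also its cost.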

Now, that all said, there is one bit that's weird here: the [unknown] frame at the bottom, which causes the initial divergence into four stacks. You may want to try enabling forced frame pointers in your build, and see if that helps with that particular problem.