ruby-prof / ruby-prof

A ruby profiler. See https://ruby-prof.github.io for more information.

1.0 has bugged callstack profiler output

nateberkopec opened this issue · comments

To reproduce, run a call stack profile with 0.18 and again with 1.0.

I'll use puma as an example. Clone down puma, bundle install, and then ruby-prof -pcall_stack test/test_app_status.rb.

Output on 0.18:
Screen Shot 2019-08-02 at 8 13 17 AM

Output on 1.0:
Screen Shot 2019-08-02 at 8 11 10 AM

Lots of nonsensical numbers on each stacktrace line for "% child" and "% parent".

You're going to see a slightly different profile than me because I'm messing with stuff in Minitest and Kernel#require.

In any case, thank you so much for releasing a 1.0! Glad to see ruby-prof getting some love.

Ah yeah, that does look wrong. Will take a look - thanks for the report.

So I took a look at this, and it's caused by recursive methods. ruby-prof 1.0.0 really does handle recursive methods better - the reported values in the flat and graph printouts now show the correct time. But those are aggregated values. While ruby-prof does track how much time each call trace through a method takes, it doesn't keep enough info around to correctly figure out recursive calls. I'm actually surprised this worked at all in previous versions. I have to go back to the old code and see why it was better (I'm guessing it was pure luck, but need to verify that).

@cfis I'm really curious as to why you think the correctness of the call stack display is related to the detection of recursive calls. The call info objects connected with the child/parent relationships should form a real tree, not a graph. I'm 100% certain that this was the case in version 0.17 and I'm also certain that the percentages and the call numbers were correctly calculated in that version.

So in my opinion this is a bug you introduced.

@cfis I studied the new code a bit. It seems the prof_call_info_t structs no longer form a tree. They are instead somehow linked to prof_method_t structs, which store all children / all parents of a given method across all calls over the whole profiling run. This makes it impossible to produce a proper call stack printout. And I bet it also makes a few other calculations incorrect.
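
To illustrate the concern, here is a minimal sketch in plain Ruby. It is a hypothetical model, not ruby-prof's actual C structs: the method names and timings are made up, and the "per-method" side simply merges time by (caller, callee) pair as an assumption about what aggregated storage looks like.

```ruby
# Hypothetical model, not ruby-prof's real data structures.
# Each key is a full call path; the value is the time spent in the last frame.
PATHS = {
  %w[main a c d] => 8.0,
  %w[main b c d] => 2.0
}

# Tree view: one entry per distinct call path, so the two calls to d stay separate.
tree_times = PATHS.dup

# Per-method view: one entry per (caller, callee) pair, merged across all paths.
edge_times = Hash.new(0.0)
PATHS.each { |path, time| edge_times[path.last(2)] += time }

puts tree_times[%w[main a c d]]  # 8.0, the time of d when reached via a
puts edge_times[%w[c d]]         # 10.0, the time of d under c no matter who called c
```

A printer that walks main > a > c > d using only the merged entry would attribute 10.0 to d even though only 8.0 was spent on that path, which is the kind of mismatch that could show up as odd percentages.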

I have the strong feeling that ruby-prof is fundamentally broken a.t.m. How should we proceed here?

Hi @skaes - actually I believe it was fundamentally broken before on any type of recursive calls. There were a couple hacks in place, but they broke down quickly in most situations (numbers would get double counted etc.). Right now I think the numbers are correct, but we don't store enough information to show each individual trace. That is versus in 0.17 where the numbers were incorrect but we could sort of kind of show traces.

Anyway, I need to go read the code again and get familiar with it to be more specific. Then we can discuss how to proceed.

@cfis I would very much appreciate it if you could provide an example of wrong statistics for 0.17 and explain why the numbers are wrong. I tried to explain why the current implementation can never provide correct call stack printouts and why I think the old one did. But I may be wrong, so I'm very curious about an actual response to my arguments.

BTW, the code I added was based on a short theoretical assessment I wrote at the time I consulted a company on a different profiler. The call stack visualization was derived from their product. Maybe there's a flaw in this assessment as well. You can read it here: https://railsexpress.de/papers/callgraph.pdf

Thanks for the link to the paper. I've spent some time getting back up to speed with the old and new code base, so can comment intelligently.

First some background. I was inspired to update ruby-prof over the summer because I was trying to figure out why upgrading our Rails app to Rails 6 made it a lot slower. I tried ruby-prof of course but it just couldn't handle it - specifically, generating reports would take tens of minutes. So I tried the sampling profilers but they didn't reveal the issue. So I went back to ruby-prof and profiled it. The slowness in the reports was caused by having to calculate the times for each method on the fly - which involved a lot of iterating over call infos again and again (if I remember correctly the aggregate class was particularly awful). I wanted to fix that, and while at it get rid of all the old dead code (the gc patches from 10 years ago for example), modernize the gem, add the ability to serialize/deserialize results for later analysis, etc. I also wanted to make profiling itself faster, since that was fairly slow too (although not nearly as much of an issue as the reports). Once I was done, I could profile our Rails app (so ruby-prof was 10x faster or even more) and found the issue.

To fix the report slowness, I changed the code to calculate method times while profiling instead of afterwards. That's done in the stack code and is the same logic that was (and still is) used for updating call info times. I think the current reported times are correct, they do take recursion into account, and they do match version 0.17 times. I definitely don't want to undo any of that - going back to the old way makes ruby-prof unusable.
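
As a rough illustration of what "calculating method times while profiling" with recursion in mind can look like, here is a small Ruby sketch. It is an assumption-laden model, not the ruby-prof stack code: the class name OnTheFlyTotals, the explicit timestamps, and the rule of counting only the outermost activation are all hypothetical choices for the example.

```ruby
# Hypothetical sketch: per-method totals are updated as frames are popped,
# so no pass over call infos is needed when a report is generated.
class OnTheFlyTotals
  attr_reader :totals

  def initialize
    @totals = Hash.new(0.0)
    @active = Hash.new(0)   # how many frames of each method are on the stack
    @stack = []
  end

  def enter(name, now)
    @active[name] += 1
    @stack.push([name, now])
  end

  def leave(now)
    name, started = @stack.pop
    @active[name] -= 1
    # Only the outermost activation adds to the total, so nested recursive
    # frames are not counted twice.
    @totals[name] += now - started if @active[name].zero?
  end
end

prof = OnTheFlyTotals.new
prof.enter("fact", 0.0)   # outer call
prof.enter("fact", 1.0)   # recursive call
prof.leave(3.0)
prof.leave(4.0)
puts prof.totals["fact"]  # 4.0 rather than 2.0 + 4.0
```

The totals are cheap to report afterwards, but note that nothing in this structure remembers which call path each slice of time came from.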

Having looked at the code, I take back what I said in the last post. From what I see, the 0.17 times for recursive traces do look right to me and I went back and traced through ruby-prof line by line to verify. So my mistake on that.

Next, you are correct in noting that ruby-prof now maintains a graph and not a tree, and that is why in recursive cases the call stack printer doesn't work (it works in non-recursive cases). I didn't realize that until the bug report above, and I agree going back to a tree would solve it. Or maybe there is some other way?

Going back to a tree would involve:

  • Deleting parent_call_infos from prof_method_t. No need to add it back to prof_call_info_t because that already includes a parent (same as 0.17).

  • Moving child_call_infos from prof_method_t back to prof_call_info_t and updating the rest of the code appropriately (the key code is in ruby_prof.c but some of the method accessors would need updating too). So that's basically reverting that code to what 0.17 did.
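
The tree layout described in the two points above can be modelled roughly as follows. This is a Ruby sketch for illustration only: the real structs are C code, and the CallInfo class, its child and path helpers, and the method names used here are hypothetical.

```ruby
# Ruby model of a tree of call infos: each node knows its parent and owns
# its own children, instead of sharing them through the method.
class CallInfo
  attr_reader :method_name, :parent, :children
  attr_accessor :total_time

  def initialize(method_name, parent = nil)
    @method_name = method_name
    @parent = parent
    @children = {}        # keyed by callee name, private to this call info
    @total_time = 0.0
  end

  # Find or create the call info for a callee invoked from this node.
  def child(method_name)
    @children[method_name] ||= CallInfo.new(method_name, self)
  end

  # Method names from the root down to this node.
  def path
    parent ? parent.path + [method_name] : [method_name]
  end
end

root  = CallInfo.new("main")
via_a = root.child("a").child("c")
via_b = root.child("b").child("c")

puts via_a.path.join(" > ")   # main > a > c
puts via_b.path.join(" > ")   # main > b > c
puts via_a.equal?(via_b)      # false: same method, two separate call infos
```

Because every distinct call path gets its own node, the structure grows with the number of paths rather than the number of methods.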

The downside is it would require more memory at runtime and its only purpose would be to support the call stack report since all other reports are fine from what I can tell. But I do think that's important, and they used to work, so they should continue to work.

I'll see if I can have a go at making those changes.

The dev branch has a mostly working implementation of the C code which does what 0.17 did. The reports aren't right though because they show unaggregated parents/children. And for a similar reason the tests won't pass (thus "mostly" working). So this is mainly to show what it would look like. See 3e765cf.

But like I said above, the aggregation code makes ruby-prof too slow to use when tracing things like Rails. So I don't think this is a viable solution as-is. Either the aggregation code needs to move into C (unlike before), or we need a more clever data structure (an already aggregated one but with some extra information to keep track of recursive call stacks), or some other solution.

Thanks @cfis for coming up with a quick proposal. Unfortunately, it segfaults when running the tests (https://travis-ci.org/ruby-prof/ruby-prof/builds/631493742).

Please see also this issue: #264

You wrote above that the call_stack printer works on non-recursive cases for the 1.x releases. That's not true. Whether or not a call_info was detected to be recursive is completely irrelevant to calculating the correct parent/child percentages. In order to report the percentages you just need the total_time of each call_info. Going back to a tree is the only way to do this correctly.
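
As a sketch of that point, the following Ruby computes each node's share of its parent's time and of the overall time using nothing but total_time on a tree. The Node struct, the percentages helper, and the numbers are hypothetical and do not mirror the actual printer code; the example also assumes non-zero times.

```ruby
Node = Struct.new(:name, :total_time, :children)

# Collect [depth, name, % of parent, % of overall] for every node in the tree.
def percentages(node, parent_time, overall_time, depth = 0, out = [])
  out << [depth, node.name,
          (100.0 * node.total_time / parent_time).round(1),
          (100.0 * node.total_time / overall_time).round(1)]
  node.children.each do |child|
    percentages(child, node.total_time, overall_time, depth + 1, out)
  end
  out
end

tree = Node.new("main", 20.0, [
  Node.new("a", 15.0, [Node.new("c", 12.0, [])]),
  Node.new("b", 5.0,  [Node.new("c", 1.0, [])])
])

rows = percentages(tree, tree.total_time, tree.total_time)
rows.each do |depth, name, of_parent, of_total|
  puts format("%s%s  %.1f%% of parent, %.1f%% of total",
              "  " * depth, name, of_parent, of_total)
end
```

Here c accounts for 80% of a but only 20% of b, and no recursion flag is consulted anywhere in the calculation.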

I can't right now comment on the correctness of the other reports, as I don't have a lot of time to go full in on the code. I might get to it the coming weekend. I'd like to focus on the most pressing problem at hand, which is the broken call_stack printer.

But what I really meant by "figure out how to proceed" was for us to find a way to work together on the project, so that the kind of breakage that came with the 1.x release can hopefully be avoided in the future.

One option going forward would be using the C4 process which seems to work really well for the ZeroMQ community. Or we could design our own process.

If you agree, we should probably continue this discussion someplace else.

Yeah - the patch is a prototype to show the rough extent of the changes. It definitely needs more work.

As for the times, the parent/self/child percentages are calculated while profiling. 1.0 is doing the aggregation on the fly versus earlier versions which do it afterwards while generating reports (see the Ruby aggregate classes). As far as I know the 1.0 times are correct - the tests pass and if you look at the recursive test cases via the graph and graph_html reports the times shown are correct. I did spend a lot of time on verifying them. Of course I could be wrong - but I would need to see an example where the times are incorrect.

But since aggregate data is stored, information about each individual tree is lost. That breaks the call stack report, because what it does is show each of those trees. The upside is that 1.0 is much, much faster at generating reports.

@cfis how can this issue be closed when the new code hasn't been released yet and it doesn't even compile?

No, it was intentional - the fixed code is in the dev branch. It's not ready for release yet, but that's due to other issues. I consider this one fixed.

Changes are now merged into master branch.