Newrelic RPM Performance Regression
mnelson opened this issue
Issues #45 and #51 have both reported problems when using Makara alongside certain versions of newrelic/rpm. In an attempt to be as transparent as possible, as well as to potentially isolate the issue, the following table holds the information required to check your stack against other community reports.
Newrelic has released a fix. The range of affected newrelic releases is `3.7.2 <= X < 3.11.2`.
Makara Version | AR Version | RPM Version | Regression
---|---|---|---
0.0.x | 3.2 | 3.6.9 | no
0.2.x | 4.x | 3.6.5.130 | no
0.2.x | 4.x | 3.6.6.147 | no
0.3.x | 4.x | 3.6.6.147 | no
**BROKE** | --- | --- | ---
0.2.x | 4.x | 3.7.2 | yes
0.2.x | 4.x | 3.9.4.245 | yes
0.3.x | 4.x | 3.8.1.221 | yes
0.3.x | 4.x | 3.9.5.251 | yes
0.3.x | 4.x | 3.9.6.257 | yes
0.3.x | 4.x | 3.9.7.266 | yes
**FIXED** | --- | --- | ---
0.x.x | x.x | 3.11.2 | no
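To check whether an installed agent version falls in the affected range above, here's a quick sketch (the helper name is mine, not from this thread) using RubyGems' built-in version classes, which handle multi-segment versions like `3.9.7.266` correctly:

```ruby
require 'rubygems' # Gem::Version / Gem::Requirement ship with Ruby

# Affected range reported above: 3.7.2 <= X < 3.11.2
AFFECTED = Gem::Requirement.new('>= 3.7.2', '< 3.11.2')

# Hypothetical helper: true if the given newrelic_rpm version has the regression.
def affected_rpm?(version_string)
  AFFECTED.satisfied_by?(Gem::Version.new(version_string))
end

puts affected_rpm?('3.9.7.266') # true:  inside the regressed range
puts affected_rpm?('3.6.6.147') # false: predates the regression
puts affected_rpm?('3.11.2')    # false: contains the fix
```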
👍 thanks @mnelson !
Trying makara v0.3.0rc3 for the first time with AR 4.1.8 and newrelic_rpm 3.9.7.266, I can confirm the performance problems.
Another one for you:
Makara 0.3.x + AR 4.x + RPM 3.9.6.257 = regression: yes
Hey there! I'm a developer on newrelic_rpm and got linked to these perf issues today. Major bummer! I'd like to help get to the bottom of it if we can.
I haven't used makara in earnest before, and plugging it into my basic Rails 4 test app didn't show me much perf difference yet. I'm sure I'm missing something in my configuration, though.
Does anyone have a relatively simple duplication case that I could use to dig in? Ideal would be a repo with a Rails app that I could toggle between the versions and see the perf differences, but any indicator where to head would be appreciated.
Here you go! https://github.com/taskrabbit/makara_newrelic_test
```
makara_newrelic_test [master] ~> be rake bm_nr
-- create_table("users", {:force=>true})
   -> 0.0580s
-- initialize_schema_migrations_table()
   -> 0.0065s
Rehearsal ------------------------------------------
insert   8.660000   0.200000   8.860000 (  9.134924)
select   2.600000   0.060000   2.660000 (  2.768211)
-------------------------------- total: 11.520000sec

             user     system      total        real
insert   8.670000   0.150000   8.820000 (  9.086117)
select   2.520000   0.040000   2.560000 (  2.667939)

makara_newrelic_test [master] ~> be rake bm
-- create_table("users", {:force=>true})
   -> 0.0358s
-- initialize_schema_migrations_table()
   -> 0.0014s
Rehearsal ------------------------------------------
insert   0.260000   0.030000   0.290000 (  0.418320)
select   0.080000   0.010000   0.090000 (  0.157242)
--------------------------------- total: 0.380000sec

             user     system      total        real
insert   0.280000   0.020000   0.300000 (  0.419354)
select   0.090000   0.010000   0.100000 (  0.167273)
```
Let me know if there's anything I can do to help.
#71 will likely be of interest as well.
Based on the rubygems releases, I believe the issue was introduced between 3.6.8.168 and 3.6.9.171: 3.6.8.168 runs fine, while 3.6.9 takes 40x longer. newrelic/newrelic-ruby-agent@3.6.8.168...3.6.9.171
I'm seeing an improvement on this branch, but it's still 4x slower:
```
makara_newrelic_test [master] ~> be rake bm_nr
-- create_table("users", {:force=>true})
   -> 0.0723s
-- initialize_schema_migrations_table()
   -> 0.0040s
Rehearsal ------------------------------------------
insert   1.130000   0.080000   1.210000 (  1.449123)
select   0.490000   0.030000   0.520000 (  0.615719)
--------------------------------- total: 1.730000sec

             user     system      total        real
insert   1.180000   0.050000   1.230000 (  1.425478)
select   0.370000   0.010000   0.380000 (  0.471990)

makara_newrelic_test [master] ~> be rake bm
-- create_table("users", {:force=>true})
   -> 0.0439s
-- initialize_schema_migrations_table()
   -> 0.0017s
Rehearsal ------------------------------------------
insert   0.290000   0.030000   0.320000 (  0.485325)
select   0.080000   0.010000   0.090000 (  0.165335)
--------------------------------- total: 0.410000sec

             user     system      total        real
insert   0.270000   0.020000   0.290000 (  0.419044)
select   0.090000   0.010000   0.100000 (  0.170199)
```
Thanks @mnelson, that's going to be super helpful in tracking this down! I probably won't be able to dig deep on this until next week, but having those dups will be huge in getting to the bottom of this. Thanks again!
As expected, that repro case was awesome ✨. Thanks again @mnelson for putting it together!
The slowdown is happening at this point in newrelic_rpm: https://github.com/newrelic/rpm/blob/5560d8ecff0dcece53a80edbc4535cc60fc188d8/lib/new_relic/agent/instrumentation/active_record_subscriber.rb#L87-L93. On Rails 4.x+, New Relic uses `ActiveSupport::Notifications`, but the payload only gives us the `object_id` of the connection. Because we need more information from the connection (the database adapter type, for instance), we search through the connection list, and that's where the bottleneck is.
Here's a slightly nicer version of the same code from the Ruby agent that you can put in a task or script to run:
```ruby
n = 1000
Benchmark.bm do |x|
  x.report do
    n.times do
      connections = ::ActiveRecord::Base.connection_handler.connection_pool_list.map do |handler|
        handler.connections
      end.flatten
      connection = connections.detect { |cnxn| cnxn.object_id == 1234 }
    end
  end
end
```
When I run this in a Rails project with the `mysql` adapter, I get the following timing:
```
       user     system      total        real
   0.030000   0.000000   0.030000 (  0.033124)
```
Running that with makara instead (it doesn't matter whether New Relic is in the picture for this) yields this:
```
       user     system      total        real
   1.600000   0.030000   1.630000 (  1.625922)
```
I haven't dug into makara to see why this path is slow. Stackprof gives some slightly confusing results, but points at `Delegator` and `SimpleDelegator` a lot.
FWIW, we don't really like this connection lookup, and are continuing to look for a solution from the Rails end to avoid it. But even if we get an update it'll only be available in some as-yet-unavailable Rails version, so this code is likely to live on even then as a fallback on Rails 4.x.
Any thoughts where to go from here @mnelson?
@jasonrclark Is there an issue with using `ObjectSpace._id2ref(object_id)` rather than detecting the object by iterating the connections? I agree that there's a problem somewhere, but from my quick attempt, an id2ref lookup at least avoids it.
Oh, I just remembered I use id2ref for the makara logging as well: https://github.com/taskrabbit/makara/blob/master/lib/makara/logging/subscriber.rb#L25
Btw, the repro is updated with some id2ref rake tasks: https://github.com/taskrabbit/makara_newrelic_test
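For reference, a minimal sketch of the `_id2ref` approach being proposed (the helper name is mine, not from the repro). The `RangeError` rescue guards against an id whose object has been collected or recycled:

```ruby
# Sketch of the proposed lookup: resolve an object straight from its
# object_id in O(1) instead of scanning every connection pool.
# _id2ref raises RangeError for an unknown/recycled id, so we rescue it.
def connection_for(object_id)
  ObjectSpace._id2ref(object_id)
rescue RangeError
  nil
end

conn = Object.new                # stand-in for an AR connection
connection_for(conn.object_id)   # returns the same object
```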
@mnelson There actually are problems with `_id2ref` -- we had it that way in our original implementation of this code, and had to move to the more complex scheme we've got today.
The two issues were that 1) `_id2ref` performs really badly on Rubinius, and 2) it isn't even supported on JRuby without passing a special startup flag (for similar perf reasons).
Given those constraints, we can't go back to using `_id2ref`. With your knowledge of the project, does it seem possible to optimize the connection code? If so, that seems like the best option to me.
Failing that, I've got an idea or two about how we could let you override the connection retrieval. It's a little awkward because of load ordering (I expect `makara` will typically be required before `newrelic_rpm`), but I know how we could do it.
With a little more digging, I've determined that the bottleneck occurs in the `flatten` invocation. My guess is that there's something weird going on with object comparison or array determination due to the delegate, maybe even down at the C level in `rb_check_array_type`. I'll keep poking around.
FWIW, this is a potential fix for the NR codebase, even though I understand the real problem is in Makara:
```ruby
connection = ActiveRecord::Base.connection_handler.connection_pool_list.detect do |l|
  l.connections.detect do |c|
    c.object_id == 4
  end
end
```
That's a great idea @mnelson. I tested it out, and with a little tweaking (we can't use the outer `detect`, since that returns the connection pool rather than the inner connection), I was able to use this instead.
It also neatly avoids a couple of extra object allocations, so in addition to helping makara out, it's a small win for every ActiveRecord 4.x client using `newrelic_rpm` 🎉
This isn't likely to make it in the next release of the agent which is baking internally already for release, but I'll ping here when we have a branch of the agent to test against directly.
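For readers curious what that tweak might look like, here is a hedged sketch (not New Relic's actual patch): search each pool in turn and return the matching connection itself, skipping the `map` + `flatten` allocations. The `FakePool` struct is a stand-in so the sketch runs without Rails; in the agent, `pools` would be `ActiveRecord::Base.connection_handler.connection_pool_list`:

```ruby
# Stand-in for an ActiveRecord connection pool (real pools expose #connections).
FakePool = Struct.new(:connections)

# Return the connection whose object_id matches target_id, or nil.
# Searching pool-by-pool avoids building an intermediate flattened array.
def connection_by_id(pools, target_id)
  pools.each do |pool|
    found = pool.connections.detect { |c| c.object_id == target_id }
    return found if found
  end
  nil
end

conn  = Object.new
pools = [FakePool.new([Object.new]), FakePool.new([Object.new, conn])]
connection_by_id(pools, conn.object_id) # returns conn
```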
Hi guys, I just ran into the same issue. What is the workaround while it is not fixed?
With reference to the table above, you can use one of the makara or rpm versions that does not have the performance issue. I have a branch which somewhat improves the situation, but it does not fix it completely. You can absolutely give that branch a try, but, as I said, it won't completely fix the issue. The underlying perf issue seems to be in Ruby's `Delegator` class, and I'm still working on a solution that at the very least avoids the problem.
I have tested makara 0.3.x (latest version) with rpm 3.6.6.147 and AR 4.x and it is working fine!
Thanks @danicuki, added to the table.
Hey y'all - we've just released version 3.11.2 of the newrelic_rpm gem, which should address this issue. Thanks again for your help in isolating and fixing it!
Makara Version: 0.3.3
AR Version: 3.2.14
RPM Version: 3.9.7.266
Is this configuration tested?
@jasonrclark done. Thanks.