ruby / did_you_mean

The gem that has been saving people from typos since 2014

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

exponential growth in did-you-mean based on object graph depth * width

zenspider opened this issue · comments

I’m seeing exponential growth in time when generating message. Reproduction and timings here:

https://gist.github.com/zenspider/4fbf1a5d237095c52184a57c86cdb5eb

Thanks for reporting. I was actually able to reproduce this issue without did_you_mean:

diff --git a/trip_dispatcher.rb b/trip_dispatcher.rb
index 09ddc8e..56166ec 100644
--- a/trip_dispatcher.rb
+++ b/trip_dispatcher.rb
@@ -35,7 +35,7 @@ end
   p width
   (10..50).step(10).each do |depth|
     t0 = Time.now
-    Build.new(depth, width).go rescue $!.message # HACK: this MUST be accessed
+    Build.new(depth, width).inspect
     puts "%3d = %7.3f" % [depth, (Time.now - t0)]
   end
   puts

Alternatively, you could also use the --disable-did_you_mean option to disable it:

$ ruby --disable-did_you_mean trip_dispatcher.rb

The issue is that there is a complex circular reference between Graph and Node, and the #inspect method on the Graph object ends up creating an extremely large string object. #inspect is smart enough to handle simple circular references, but in this particular case, it's not quite working as expected.

What is unexpected here though is that DYM is making the situation worse by calling the #message method twice. This is a bit trycky to deal with as DYM's #to_s method would have to pass the message object around to avoid extra memory allocation.

Apparently, JRuby handles this better than MRI but crashes once it runs out of memory. I'll look to see if the C function that gets invoked under the hood could be improved.

@yuki24 did you find any solution for this problem?

Unfortunately not. It's a bit hard to detect a circular reference in DYM and I'm not sure if there's anything we can do other than making DYM faster (which is also hard at this point).

I don't think that is a problem o DYM though, we can reproduce that without it just by using inspect. I was just wondering if you find the source of the problem at least.

I think I'm running into this same issue with an object graph that contains a lot of back-references:

# Breakpoint set just before calling trace.message on a NameError
trace.receiver.inspect.length
# five minutes later...
# => 3542020365
# 3.54GB of text!

This is the line of code that hangs when it calls trace.message: https://github.com/mike-bourgeous/mb-util/blob/891225432f51e9448a639e3a7d1f6af04ba9b187/lib/mb/util/text_methods.rb#L83

I'm going to try working around the issue by defining a simpler #inspect on all of my classes.

Obviously this isn't the fault of DoYouMean, but does DYM need to use inspect, or could it use another approach, e.g. concatenating instance_variables and methods and so on?

Should we file an issue with Ruby to get an implementation of inspect that doesn't explode like this?

Also, could a means be provided for bypassing DYM at runtime?

Ruby's #inspect is already smart enough to detect circular references:

$ ruby -v
ruby 2.7.3p183 (2021-04-05 revision 6847ee089d) [arm64-darwin20]
a = {}
a[0] = a
a.inspect # => "{0=>{...}}"

class Foo
  attr_accessor :bar
end

foo = Foo.new
foo.bar = foo
foo.inspect # =>"#<Foo:0x000000014a37f668 @bar=#<Foo:0x000000014a37f668 ...>>"

The only places we use #inspect are in:

def corrections
@corrections ||= exact_matches.empty? ? SpellChecker.new(dictionary: @keys).correct(@key).map(&:inspect) : exact_matches
end
private
def exact_matches
@exact_matches ||= @keys.select { |word| @key == word.to_s }.map(&:inspect)
end

which I do not believe is as common as NameError or NoMethodError. As noted above, there is very little we can do to optimize this behaviour.

I just think that if you already know your objects have circular or back references, it would be your responsibility to make sure the #inspect method wouldn't explode because of the object references.