Solver.solve throws stack level too deep error when run in a web server on a mac

Question

Solver.solve throws stack level too deep error when run in a web server on a mac

johnnymo87 opened this issue 2 years ago · comments

Jonathan Mohrbacher commented 2 years ago

Hi @ankane, first of all, thanks for the great work you're doing here and elsewhere to improve the machine learning ecosystem for ruby.

I'm having a problem where my code using this gem throws a stack level too deep error on solver.solve when I run it in a web server on a Mac. I've set up a GitHub repo to demonstrate.

$ ruby run_in_web_server.rb

... server starts ...
... I visit localhost:4567 ...

2022-02-07 12:26:26 - SystemStackError - stack level too deep:
        /Users/jonathan.mohrbacher/Code/stack-overflow/lib/demo/solver.rb:142:in `solve'
        /Users/jonathan.mohrbacher/Code/stack-overflow/lib/demo/solver.rb:142:in `run_solver'
        /Users/jonathan.mohrbacher/Code/stack-overflow/lib/demo/solver.rb:43:in `solve'
        /Users/jonathan.mohrbacher/Code/stack-overflow/lib/demo/solver.rb:23:in `solve'
        /Users/jonathan.mohrbacher/Code/stack-overflow/lib/demo.rb:18:in `run'
        run_in_web_server.rb:10:in `block in <main>'
        ... sinatra-specific stack trace ...

I've tried the following experiments:

I ran it on a Mac (M1), didn't work ❌
I ran it on a Mac (M1) with ruby 2.7 rather than ruby 3.1, didn't work ❌
I ran it on a Mac (M1) with thin instead of puma, and then with webrick instead of puma, didn't work ❌
I ran it on a Mac (Intel), didn't work ❌
I ran it on a Mac (M1) in a linux/arm64 docker container, did work ✅
I ran it on Ubuntu (Intel), did work ✅
I ran it on Ubuntu (Intel) in a linux/x86_64 docker container, did work ✅
I ran it without a web server (ruby run_in_terminal.rb) on all three of these computers, did work ✅

So I can only reproduce this problem when running outside of docker on a Mac, and only when running it with a web server. On these Macs, I installed or-tools with Homebrew.

When I hit the error on the Mac with the Intel chip, it seems like ruby crashed because it gave me a large crash report, attached below.

crash.txt
ruby-2022-02-08-000648.txt

Andrew Kane · Answer 1 · Wed Feb 09 2022 04:36:29 GMT+0800 (China Standard Time)

Hey @johnnymo87, thanks for reporting! This sounds similar to #21. Happy to dig into it if you can make the repro script more minimal. For instance, minimal dependencies:

source "https://rubygems.org"

gem "sinatra"
gem "or-tools"
gem "puma"

And Ruby code and data in a single file:

require "bundler/setup"
Bundler.require

get "/" do
  # code to repro
end

I tried with the bin packing test case, but wasn't able to reproduce on an Intel Mac.

Jonathan Mohrbacher · Answer 2 · Wed Feb 09 2022 06:47:41 GMT+0800 (China Standard Time)

Yes, happy to do so, here's a shorter version.

Dependencies

source 'https://rubygems.org'

gem 'or-tools', '~> 0.6.0'
gem 'puma', '~> 5.6'
gem 'sinatra', '~> 2.1'

Ruby (3.1.0) code and data in a single file

require 'bundler/setup'
Bundler.require

get '/' do
  bins = [
    ['Bin 0', 3000, 1373],
    ['Bin 1', 1768, 633],
    ['Bin 2', 1000, 1028],
    ['Bin 3', 886, 633],
    ['Bin 4', 1660, 1028],
    ['Bin 5', 3000, 3000],
    ['Bin 6', 3000, 1373],
    ['Bin 7', 3000, 1028],
    ['Bin 8', 3000, 633],
    ['Bin 9', 2242, 633]
  ].map { |name, volume, cost| { name:, volume:, cost: } }

  items = [
    ['Item 0', 656],
    ['Item 1', 431],
    ['Item 2', 721],
    ['Item 3', 849],
    ['Item 4', 117],
    ['Item 5', 940],
    ['Item 6', 1302],
    ['Item 7', 547],
    ['Item 8', 645],
    ['Item 9', 757]
  ].map { |name, volume| { name:, volume: } }

  solver = ORTools::Solver.new('demo', :cbc)

  # Variables
  # item_in_bin_vars[item][bin]
  # * 1 if an item is packed in a bin.
  # * 0 otherwise.
  item_in_bin_vars = items.each_with_object({}) do |item, item_hash|
    item_hash[item[:name]] = bins.each_with_object({}) do |bin, bin_hash|
      bin_hash[bin[:name]] =
        solver.int_var(0, 1, [item[:name], bin[:name]].join(' in '))
    end
  end

  # Variables
  # bin_vars[bin]
  # * 0 <= number of times bin is used <= 1
  bin_vars = bins.each_with_object({}) do |bin, hash|
    hash[bin[:name]] = solver.int_var(0, 1, bin[:name])
  end

  # Constraints
  # Each item must be in exactly one bin.
  # This constraint is set by requiring that the sum of item_in_bin_vars[item, bin]
  # over all bins is equal to 1.
  items.each do |item|
    solver.add(solver.sum(item_in_bin_vars.fetch(item[:name]).values) == 1)
  end

  # Constraints
  # The amount packed into each bin cannot exceed its capacity.
  #
  # Why use multiplication? Because our bin variables' values are either 1 or
  # 0, based on whether or not they're used. For example, let's say we have a
  # bin whose volume is 1000.
  #
  # If the bin is used, then we constrain
  #   the sum of its items' volumes to <= (1 * 1000).
  # Otherwise, we constrain
  #   the sum of its items' volumes to <= (0 * 1000).
  bins.each do |bin|
    solver.add(
      solver.sum(
        items.map do |item|
          item_in_bin_vars.fetch(item[:name]).fetch(bin[:name]) * item[:volume]
        end
      ) <=
      bin_vars.fetch(bin[:name]) * bin[:volume]
    )
  end

  # In order to minimize the number of bins while ALSO minimizing other
  # factors, I'm using a bin_count_penalty that is as large as the largest
  # value of the largest of the other two factors.
  bin_count_penalty = [bins.map { _1[:cost] }.max, bins.map { _1[:volume] }.max].max

  # Define what we're trying to minimize:
  # * Number of boxes
  # * Total cost
  # * Total volume
  solver.minimize(
    solver.sum(
      bins.map { |bin| bin_vars.fetch(bin[:name]) * bin_count_penalty } +
      bins.map { |bin| bin_vars.fetch(bin[:name]) * bin[:cost] } +
      bins.map { |bin| bin_vars.fetch(bin[:name]) * bin[:volume] }
    )
  )

  # Attempt to solve the problem. Return a list of bins chosen by the solver,
  # each with their chosen items inside. If a solution cannot be found, abort.
  status = solver.solve
  raise 'Huh?' unless status == :optimal

  bins
    .flat_map do |bin|
      next if bin_vars.fetch(bin[:name]).solution_value.to_i.zero?

      items_in_bin = items.reject do |item|
        item_in_bin_vars.fetch(item[:name]).fetch(bin[:name]).solution_value.zero?
      end

      bin.to_h.merge(items: items_in_bin.map(&:to_h))
    end
    .compact
    .map(&:to_h)
    .to_s
end

Even though I'm trying to solve a bin packing problem, I feel that the code I wrote is a little closer to your MIP assignment test case? At least in terms of what methods I call on solver, e.g. #minimize, #sum, etc. On the other hand, that code doesn't reproduce the stack level too deep problem for me.

Andrew Kane · Answer 3 · Wed Feb 09 2022 08:01:36 GMT+0800 (China Standard Time)

Great, the script reproduces a crash for me. Will see what's going on.

Andrew Kane · Answer 4 · Wed Feb 09 2022 09:50:40 GMT+0800 (China Standard Time)

Hey @johnnymo87, should be fixed in the commit above. Linear expressions are now built in Ruby (like Python) instead of C++. Still need to make the to_s output friendly, but this should take care of the crashes.

Jonathan Mohrbacher · Answer 5 · Wed Feb 09 2022 11:16:08 GMT+0800 (China Standard Time)

I'll check it out, thank you @ankane!

pmcnano · Answer 6 · Thu Feb 10 2022 08:54:38 GMT+0800 (China Standard Time)

@ankane didn't solve the issues for me on M1.

edit: tested with @johnnymo87 's repos, and same thing.

Jonathan Mohrbacher · Answer 7 · Thu Feb 10 2022 23:11:04 GMT+0800 (China Standard Time)

Hmm yes, I can confirm that the same is true for me.

I set my Gemfile to pull the commit like so:

gem 'or-tools', git: 'git://github.com/ankane/or-tools-ruby.git', ref: '260f623'

And then I reran the script and I still hit the stack level too deep error.

Andrew Kane · Answer 8 · Fri Feb 11 2022 09:13:27 GMT+0800 (China Standard Time)

Thanks for checking.

I was able to produce another crash when using Homebrew OR-Tools (instead of the OR-Tools download). I have a feeling it's something with how the native extension is linking to the shared libraries. Does it work if you switch to the GLOP solver (ORTools::Solver.new('demo', :glop))?

Edit: Also, make sure you're on the latest OR-Tools (brew update && brew upgrade or-tools).

pmcnano · Answer 9 · Fri Feb 11 2022 09:16:48 GMT+0800 (China Standard Time)

It does not crash with :glop

Andrew Kane · Answer 10 · Fri Feb 11 2022 10:52:34 GMT+0800 (China Standard Time)

Thanks, from what I can tell, it happens with a combination of:

Homebrew OR-Tools (and CBC, etc)
The CBC solver
A threaded environment (Thread.new { }.join)

For now, it's probably easiest to use another solver like GLOP. Hopefully OR-Tools will provide a binary installation for Mac ARM in the future, which should solve it.

Edit: Think the issue may be Homebrew CBC may not be compiled with CBC_THREAD_SAFE - looking into it more

Andrew Kane · Answer 11 · Fri Feb 11 2022 11:56:22 GMT+0800 (China Standard Time)

It looks like CBC will be thread-safe by default in the future: coin-or/Cbc#465

There should be a way to build it in a thread-safe way right like OR-Tools does (with --enable-cbc-parallel and -DCBC_THREAD_SAFE), but still seems to crash when I try it.

Ref: coin-or/Cbc#332

pmcnano · Answer 12 · Fri Feb 11 2022 12:05:18 GMT+0800 (China Standard Time)

Interestingly enough, back on Intel, when I had the issue (which I don't with the last version), doing my solving INSIDE a thread solved it. ha!