Race condition when inserting records
fschwahn opened this issue · comments
When inserting several records simultaneously the following error is sometimes raised:
NoMethodError: undefined method `rank' for nil:NilClass
/app/vendor/bundle/ruby/3.0.0/gems/ranked-model-0.4.7/lib/ranked-model/ranker.rb:194:in `rearrange_ranks'
/app/vendor/bundle/ruby/3.0.0/gems/ranked-model-0.4.7/lib/ranked-model/ranker.rb:185:in `assure_unique_position'
/app/vendor/bundle/ruby/3.0.0/gems/ranked-model-0.4.7/lib/ranked-model/ranker.rb:61:in `handle_ranking'
/app/vendor/bundle/ruby/3.0.0/gems/ranked-model-0.4.7/lib/ranked-model.rb:33:in `block in handle_ranking'
/app/vendor/bundle/ruby/3.0.0/gems/ranked-model-0.4.7/lib/ranked-model.rb:32:in `each'
/app/vendor/bundle/ruby/3.0.0/gems/ranked-model-0.4.7/lib/ranked-model.rb:32:in `handle_ranking'
I was able to somewhat reliably reproduce this error with the following test:
```ruby
begin
  require "bundler/inline"
rescue LoadError => e
  $stderr.puts "Bundler version 1.10 or later is required. Please update your Bundler"
  raise e
end

gemfile(true) do
  source "https://rubygems.org"

  gem "activerecord", "~> 6.1"
  gem "ranked-model"
  gem "pg"
end

require "active_record"
require "minitest/autorun"
require "logger"

ActiveRecord::Base.establish_connection(adapter: "postgresql", database: "ranked_model_issue", url: "postgres://postgres:@localhost:5434")
ActiveRecord::Base.logger = Logger.new(STDOUT)

ActiveRecord::Schema.define do
  create_table :ducks, force: true do |t|
    t.string :name
    t.integer :row_order

    t.timestamps
  end
end

class Duck < ActiveRecord::Base
  include RankedModel
  ranks :row_order
end

class BugTest < Minitest::Test
  def test_error_during_rebalance
    threads = 5.times.map do
      Thread.new do
        ActiveRecord::Base.connection_pool.with_connection do
          Duck.create!
        end
      end
    end
    threads.each(&:join)

    assert_equal Duck.count, 5

    # Secondary issue
    # assert_equal Duck.distinct.pluck(:row_order).size, 5
  end
end
```
- This does not work with SQLite, as SQLite raises an error because the database is locked.
- The test does not always fail, as is the nature of race conditions; it might take a few tries.
- The test also exhibits a secondary issue: the same row_order value is assigned to several records, even when the test does not outright fail.
Hi @fschwahn, that error rings a bell. I thought we'd fixed it or at least guarded against it. I assume you don't have a default value on that column, as that would be prevented on boot.
Would you be interested in looking at a solution to this one? I suspect it'll involve locking of some kind :)
I've looked a bit into it, and I found the problem:
ranked-model/lib/ranked-model/ranker.rb
Lines 295 to 303 in c8ebb37
Calling reverse on finder loads the ActiveRecord::Relation into memory (i.e. finder.loaded? returns true). Every subsequent call to finder.first does no DB lookup anymore, but takes the result from the loaded relation in memory. In concurrent settings this loaded relation might be outdated (in my example it returns nil instead of a record).
Loading the entire relation into memory never seems like a good idea here, as only one record is of interest, so an immediate fix would be to use last so that the entire relation is not loaded:
if (ordered_instance = finder.last)
I know too little about the code to tell if this has other side-effects, but it seems like a safe change.
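The caching behaviour can be illustrated without a database. Here is a minimal plain-Ruby sketch; FakeRelation is a hypothetical stand-in for an ActiveRecord::Relation, not the gem's code:

```ruby
# Hypothetical stand-in for an ActiveRecord::Relation: once `reverse`
# materializes the rows, later reads come from the in-memory copy and
# never see concurrent changes to the underlying table.
class FakeRelation
  def initialize(table)
    @table = table
  end

  def loaded?
    !@records.nil?
  end

  def reverse
    @records = @table.dup.reverse # materializes the rows, like calling to_a
    self
  end

  def first
    loaded? ? @records.first : @table.first
  end

  def last
    @table.last # always reads the source, no cached copy
  end
end

table = [1, 2, 3]
rel = FakeRelation.new(table)
rel.reverse   # loads [3, 2, 1] into memory; rel.loaded? is now true
table << 4    # simulates a concurrent insert
rel.first     # => 3, stale: the new row is invisible to the loaded copy
rel.last      # => 4, a fresh lookup sees it
```

This mirrors the failure mode described above: the loaded relation answers from memory, while a query that bypasses the cache sees the current state.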
That's a good idea in any case. However, in case current_first already returned something, it is memoized in an ivar, and might also be outdated. So it might make sense to call reset_cache at the top of rearrange_ranks. However, it is much less obvious to me if this has other side-effects.
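The memoization concern can be sketched with a toy class (the names current_first and reset_cache are borrowed from this discussion; the gem's real internals differ):

```ruby
# Toy illustration of the ivar-memoization problem: current_first caches
# its answer in an ivar, so a concurrent change stays invisible until
# reset_cache clears the cache.
class TinyRanker
  def initialize(rows)
    @rows = rows
  end

  def current_first
    @current_first ||= @rows.first # memoized; can go stale
  end

  def reset_cache
    @current_first = nil # next call re-reads the source
  end
end

rows = [10, 20]
ranker = TinyRanker.new(rows)
ranker.current_first   # => 10, now cached
rows.unshift(5)        # simulates a concurrent insert at the front
ranker.current_first   # still 10, stale
ranker.reset_cache
ranker.current_first   # => 5, fresh again
```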
That makes a lot of sense :) Even with .last I suspect there's still a chance for stale data unless we wrap all of this in a locking transaction?
The decision for .reverse was a bit bizarre:
But it was a simplification of what came before it, though prior to that the code was genuinely trying to reverse the sort order of the query. That code is a lot simpler these days with modern Rails :)
I'll close this for now, but let me know if you have any more thoughts on how to tighten this up.
> I suspect there's still a chance for stale data unless we wrap all of this in a locking transaction?

Yes, I think so. However, I don't know much about pessimistic locking, and I'd be afraid of introducing deadlocks 😐
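For what it's worth, the classic deadlock scenario, and the usual guard against it, can be shown with plain Mutexes; the same principle applies to row locks inside a transaction. This is a generic sketch, not ranked-model code:

```ruby
# A deadlock needs two threads taking the same two locks in opposite
# orders. The standard guard is a global lock ordering: every thread
# acquires locks in the same fixed order, so a wait cycle can never form.
lock_a = Mutex.new
lock_b = Mutex.new

# Always lock in a canonical order (here: by object_id), regardless of
# the order in which the caller names the locks.
with_both = lambda do |x, y, &work|
  first, second = [x, y].sort_by(&:object_id)
  first.synchronize { second.synchronize { work.call } }
end

done = Queue.new
t1 = Thread.new { with_both.call(lock_a, lock_b) { done << :t1 } }
t2 = Thread.new { with_both.call(lock_b, lock_a) { done << :t2 } }
[t1, t2].each(&:join) # completes; without the ordering this could deadlock

done.size # => 2
```

The same discipline (e.g. always locking rows in primary-key order) is a common way to make pessimistic database locking deadlock-free.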
> I'll close this for now, but let me know if you have any more thoughts on how to tighten this up.
One more idea I had was introducing an optional jitter (applied in rank_at_average) to reduce the likelihood that 2 items are given the exact same row_order value (which is what ultimately leads to this issue here). However, that might not be desirable in situations where lots of re-ordering happens and the entire available space must be used.
Fair enough :) I think the best way forward would be to look for a guaranteed solution rather than one that reduces the risk.
At least the bug is fixed now :D