ankane / distribute_reads

Scale database reads to replicas in Rails

Context stickiness during job

erickrom opened this issue

commented

Hey @ankane

First of all, thanks for this awesome gem; we are thinking of using it in our project. I was wondering: is it possible to respect Makara's sticky context inside a job?

Here's an example:

class TestJob < ApplicationJob
  distribute_reads

  def perform
    user = User.find_by(name: 'Fred') # replica
    user.update_attributes!(name: 'Joe') # primary
    ...
    ...
    user = User.find_by(name: 'Joe') # replica, may not find Joe since the write has not been replicated yet!
    ...
  end
end

Would it be possible to switch context during a job once a write has been made, so that subsequent reads go to the primary?

Hey @erickrom, to get Makara stickiness behavior, you'll want to use:

DistributeReads.by_default = true

And not add distribute_reads to the job.
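
For illustration, here's a minimal sketch of that setup, assuming the setting lives in an initializer (the file path is an assumption); the comments rely on Makara's stickiness routing reads back to the primary after a write:

# config/initializers/distribute_reads.rb
DistributeReads.by_default = true

class TestJob < ApplicationJob
  # no distribute_reads here

  def perform
    user = User.find_by(name: 'Fred') # replica
    user.update_attributes!(name: 'Joe') # primary; Makara sticks to the primary after the write
    user = User.find_by(name: 'Joe') # primary, so Joe is found
  end
end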

commented

Hi @ankane, thanks for the quick reply!

That makes sense, although it wasn't clear to me that this is how the .by_default setting was meant to be used with jobs.

A couple of follow-up questions:

What is the recommended way to gradually migrate jobs to use distribute_reads? Would it be to simply add DistributeReads.by_default = true at the top of each perform method definition?

Also, is there any plan to support job systems other than ActiveJob? The application I am currently working on has legacy code with many kinds of job classes, such as Resque, Sidekiq, and Shoryuken. I had to use Resque hooks to get distribute_reads working with one of the jobs I am migrating, so that only that one job uses the replica (eventually other jobs would migrate, but I want to test with one non-critical job first).

I could post my solution if you think it would be of interest to others. I have not yet figured out the best way to incorporate it into the gem, but if I think of one I could submit a pull request.

by_default is a global setting you can add to an initializer (but it'll affect all jobs). There are no plans to support non-ActiveJob systems, but you can use an around hook in a job system to implement this yourself.
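
As a sketch of that around-hook idea for Sidekiq (not part of the gem; the middleware class name here is made up), a server middleware can wrap each job in a distribute_reads block:

class DistributeReadsMiddleware
  # Sidekiq server middleware: yield runs the job itself
  def call(worker, job, queue)
    distribute_reads { yield }
  end
end

Sidekiq.configure_server do |config|
  config.server_middleware do |chain|
    chain.add DistributeReadsMiddleware
  end
end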

commented

Yes, that is the solution I went with, using an around_perform hook. We already use Resque hooks heavily in my application, so creating a short module seemed like the easiest way to add it. In case anyone finds it useful, here's a small code snippet I used for Resque jobs:

# distribute_reads_job.rb
module DistributeReadsJob
  # Resque around_perform hook: wraps the job's perform in a distribute_reads block
  def around_perform_with_distribute_reads(*args)
    distribute_reads(distribute_reads_options) { yield(*args) }
  end
end

Then in the job class extend the module to include the around_perform hook:

class MyJob
  extend DistributeReadsJob

  def perform
    # do work here that makes lots of database reads
    ...
  end

  # each job can declare its options
  def distribute_reads_options
    {
      max_lag: 3,
      lag_failover: true
    }
  end
end

For jobs that we want to use the default (sticky) behavior, we created another module that sets .by_default to true in a before_perform hook and back to false in an after_perform hook. I believe this is safe because Resque forks a child process to run each job, so the global setting doesn't leak into other jobs. I tested this by running multiple jobs in parallel and ensuring that they all read from the right place:

# distribute_reads_job_with_sticky_context.rb
module DistributeReadsJobWithStickyContext
  # Resque hooks: turn on replica-by-default reads for the duration of the job
  def before_perform_with_distribute_reads(*args)
    DistributeReads.by_default = true
  end

  def after_perform_with_distribute_reads(*args)
    # not strictly required, since the forked process exits once the job completes,
    # but included for completeness
    DistributeReads.by_default = false
  end
end
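
To use it, a job would extend this module the same way as the earlier example (the job class name here is hypothetical):

class MyStickyJob
  extend DistributeReadsJobWithStickyContext

  def perform
    # reads hit the replica by default; after a write, Makara's stickiness
    # keeps subsequent reads on the primary
  end
end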