Context stickiness during job
erickrom opened this issue · comments
Hey @ankane
First of all, thanks for this awesome gem; we are thinking of using it in our project. I was wondering: is it possible to respect Makara's sticky context inside a job?
Here's an example:
```ruby
class TestJob < ApplicationJob
  distribute_reads

  def perform
    user = User.find_by(name: 'Fred')    # replica
    user.update_attributes!(name: 'Joe') # primary
    # ...
    user = User.find_by(name: 'Joe')     # replica - may not find Joe, since the write has not replicated yet!
    # ...
  end
end
```
Would it be possible to switch context during a job once a write has been made, so that subsequent reads go to the primary?
Hey @erickrom, to get Makara stickiness behavior, you'll want to use:

```ruby
DistributeReads.by_default = true
```

and not add `distribute_reads` to the job.
Hi @ankane, thanks for the quick reply!

That makes sense, although it wasn't clear to me that this is how the `.by_default` method was supposed to be used with jobs.

A couple of follow-up questions:

What is the recommended way to gradually migrate jobs to use distribute_reads? Would it be to simply add `DistributeReads.by_default = true` at the top of each `perform` method definition?
Also, are there any plans to support job systems other than ActiveJob? The application I am currently working on has legacy code with many types of job classes, such as Resque, Sidekiq, and Shoryuken. I had to use Resque hooks to get distribute_reads working with one of the jobs I am migrating, so that only that one job uses the replica (eventually other jobs would migrate, but I want to test with one non-critical job first).
I could post my solution if you think it would be of interest to others. I have not yet figured out the best way to incorporate it into the gem, but if I do, I could submit a pull request.
`by_default` is a global setting you can add to an initializer (but it'll affect all jobs). There are no plans to support non-ActiveJob systems, but you can use an around hook in a job system to implement this yourself.
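As a sketch, the initializer would be a single line (the file name is just a convention):

```ruby
# config/initializers/distribute_reads.rb
# Send reads to the replica by default, with Makara-style stickiness
# after a write. Note: this affects all code, not just jobs.
DistributeReads.by_default = true
```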
Yes, that is the solution I went with, using an `around_perform` hook. We already use Resque hooks heavily in my application, so creating a short module seemed like the easiest way to add it. In case anyone finds it useful, here's a small code snippet that I used for Resque jobs:
```ruby
# distribute_reads_job.rb
module DistributeReadsJob
  def around_perform_with_distribute_reads(*args)
    distribute_reads(distribute_reads_options) { yield(*args) }
  end
end
```
Then in the job class, extend the module to include the `around_perform` hook (Resque jobs use class-level methods, hence `self.`):

```ruby
class MyJob
  extend DistributeReadsJob

  def self.perform
    # do work here that makes lots of database reads
    # ...
  end

  # each job can declare its options
  def self.distribute_reads_options
    {
      max_lag: 3,
      lag_failover: true
    }
  end
end
```
For jobs that we want to use the default behavior, we created another module that sets `.by_default` to `true` in a `before_perform` hook and back to `false` in an `after_perform` hook. I believe this is thread-safe, since Resque creates child processes to run the jobs. I tested this by running multiple parallel jobs and ensuring that they all read from the right place:
```ruby
# distribute_reads_job_with_sticky_context.rb
module DistributeReadsJobWithStickyContext
  def before_perform_with_distribute_reads(*args)
    DistributeReads.by_default = true
  end

  def after_perform_with_distribute_reads(*args)
    # not strictly required, since the forked process exits once the job
    # completes, but included for completeness
    DistributeReads.by_default = false
  end
end
```
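To make the hook flow concrete, here is a self-contained sketch of how the before/after hooks bracket a job run. `DistributeReads.by_default` is replaced by a minimal stand-in, and the hook dispatch is a simplified version of what Resque actually does, so this runs without the gem:

```ruby
# Minimal stand-in for the gem's global flag (not the real DistributeReads)
module DistributeReads
  class << self
    attr_accessor :by_default
  end
  self.by_default = false
end

module DistributeReadsJobWithStickyContext
  def before_perform_with_distribute_reads(*args)
    DistributeReads.by_default = true
  end

  def after_perform_with_distribute_reads(*args)
    DistributeReads.by_default = false
  end
end

class StickyJob
  extend DistributeReadsJobWithStickyContext

  def self.perform
    # return the flag so we can observe its value mid-job
    DistributeReads.by_default
  end
end

# Simplified stand-in for Resque's hook dispatch:
StickyJob.before_perform_with_distribute_reads
result = StickyJob.perform
StickyJob.after_perform_with_distribute_reads

puts result                      # => true (sticky while the job body runs)
puts DistributeReads.by_default  # => false (reset after the job)
```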