Betterment / delayed

a multi-threaded, SQL-driven ActiveJob backend used at Betterment to process millions of background jobs per day

Running against a secondary database

fractaledmind opened this issue · comments

I am opening this issue primarily to get thoughts and feedback from others on whether there are glaring problems with this idea that I'm not seeing. So, to the idea:

What if the job queue were run against a dedicated database, separate from the primary app database? Rails now supports multiple databases, so the infrastructure is in place. I began an experimental branch in a project of mine (leading to PR #12) where I am using a SQLite database to manage my job queue independently of my primary PostgreSQL database. To my initial thinking, this gives the main benefit of a Redis-backed queue (not blocking or throttling application database work) while keeping the benefits of a SQL-backed queue.

However, one of the primary benefits of a SQL-backed job queue is co-transactionality. In the README I note:

Important: the above assumes that the connection used by the transaction is the one provided by ActiveRecord::Base. (Support for enqueuing jobs via other database connections is possible, but is not yet exposed as a configuration.)

So, this gets me thinking: what do you know that I have yet to learn? And how can I help, if possible, to bring co-transactional but separate-database jobs into the world?

Hi @fractaledmind!

Good eye! I put that line in our README to mention only that it's possible to enqueue jobs to databases other than the primary. But in cases where we enqueue to secondary databases, we still assume that the transactionality guarantees only apply to that database. (So if we decide to enqueue a job in Database Two, it's because we opened a transaction against Database Two and are already inserting/updating other business data in Database Two.)
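To make the per-database guarantee concrete, here is a minimal sketch in plain Ruby. It is a toy model only: `ToyDatabase` and the table names are hypothetical and stand in for real connections, and nothing here is taken from the delayed gem or ActiveRecord. It shows a job row rolling back together with business data when both live in the same database, and surviving a rollback when it is written to a separate one.

```ruby
# Toy in-memory "database" (hypothetical; not ActiveRecord or the delayed gem).
# A transaction applies all of its writes or, if the block raises, none of them.
class ToyDatabase
  def initialize
    @tables = {}
  end

  def insert(table, row)
    (@tables[table] ||= []) << row
  end

  def rows(table)
    @tables.fetch(table, [])
  end

  # Snapshot the tables, run the block, and restore the snapshot on failure.
  def transaction
    snapshot = @tables.transform_values(&:dup)
    yield self
  rescue StandardError
    @tables = snapshot
    raise
  end
end

primary = ToyDatabase.new

# Case 1: the job row lives in the same database as the business data.
begin
  primary.transaction do |db|
    db.insert(:users, { email: "a@example.com" })
    db.insert(:jobs, { handler: "WelcomeEmailJob" })
    raise "simulated failure before commit"
  end
rescue RuntimeError
  # Both inserts were rolled back together.
end
same_db_result = [primary.rows(:users).size, primary.rows(:jobs).size]

# Case 2: the job row lives in a separate database with its own transaction.
jobs_db = ToyDatabase.new
begin
  primary.transaction do |db|
    db.insert(:users, { email: "b@example.com" })
    jobs_db.transaction { |jdb| jdb.insert(:jobs, { handler: "WelcomeEmailJob" }) }
    raise "simulated failure before commit"
  end
rescue RuntimeError
  # Only the primary database rolled back; the job row was already committed.
end
split_db_result = [primary.rows(:users).size, jobs_db.rows(:jobs).size]

puts "same database:      users=#{same_db_result[0]} jobs=#{same_db_result[1]}"
puts "separate databases: users=#{split_db_result[0]} jobs=#{split_db_result[1]}"
```

In the second case the job is left pointing at a user that was never committed, which is the inconsistency that a single-database transaction rules out.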

Bringing co-transactionality to multiple DB connections at once would require some form of transaction manager and a two-phase commit. (I'd point out PostgreSQL's PREPARE TRANSACTION and its many caveats. I'm not aware of any equivalent feature in SQLite.)

So, as it stands, we don't have any out-of-the-box solution to share, sadly. (Our general strategy, speaking at least for our Rails ecosystem, is to entirely avoid the need for any cross-database transactionality.)

@fractaledmind we recently deployed Delayed Job to a secondary database on a big monolith. I came across this issue while investigating delayed (from here), but figured that experience could still be worth sharing even if it's with a slightly different gem.

The implementation ended up being quite simple. A new DB in database.yml, and then this initializer:

# this is a bit convoluted, it's needed to make DJ talk to a different database
#
# Delayed::Backend::ActiveRecord::Job is the class from https://github.com/collectiveidea/delayed_job_active_record/blob/master/lib/delayed/backend/active_record.rb
# DelayedJobAbstractParentClass is needed as `connects_to` can only be called on abstract classes
# DelayedJobImplementation is needed as DJ needs a non-abstract class to do stuff with
# Delayed::Worker.backend is where `Delayed::Job` is set: https://github.com/collectiveidea/delayed_job/blob/baed6e813870e1144e7a4291bc71e06a67a533de/lib/delayed/worker.rb#L64

class DelayedJobAbstractParentClass < Delayed::Backend::ActiveRecord::Job
  self.abstract_class = true
  connects_to(database: TandaRuntime.jobsdb_database_settings)
end

class DelayedJobImplementation < DelayedJobAbstractParentClass; end

Delayed::Worker.backend = DelayedJobImplementation

In terms of co-transactionality, you're right, you lose that benefit by doing this. We've been using https://github.com/palkan/isolator for a long time (even prior to this change) so this wasn't an issue for us. But it's certainly something to be aware of.

Anyway hope this helps. Happy to share more about what worked.

Hey Alex, am I correct that when you use isolator with something other than what they call a safe ActiveJob backend, it tells you when you may be mistakenly relying on transactions to ensure consistency across systems, but it doesn't actually offer a way to ensure eventual consistency? One of the reasons we made the call to stick with safe backends was that it keeps you from having to take on as many distributed-system consistency challenges at your business domain layer, beyond writing idempotent job payloads. How does your team solve those?

We are pretty strict about only queuing jobs in after_commit. And we use isolator to help remind us of that. We don't use Active Job (I'd use it on a new project but this is a big old codebase that existed long before it).
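As a rough illustration of what enqueuing only after commit buys and what it leaves open, here is a plain-Ruby sketch. `ToyTransaction` is a hypothetical stand-in, not ActiveRecord's `after_commit` or the Delayed Job API, and the `users` and `queue` arrays are placeholders for committed data and the job table.

```ruby
# Toy unit of work (hypothetical names): hooks registered with after_commit
# run only once the block has finished without raising.
class ToyTransaction
  def initialize
    @after_commit_hooks = []
  end

  def after_commit(&hook)
    @after_commit_hooks << hook
  end

  def run_after_commit_hooks
    @after_commit_hooks.each(&:call)
  end

  def self.run
    tx = new
    yield tx                   # a raise here skips the hooks entirely
    tx.run_after_commit_hooks  # by now the data counts as committed
  end
end

users = []
queue = []

# Rolled-back transaction: the hook never fires, so no stray job is enqueued.
begin
  ToyTransaction.run do |tx|
    tx.after_commit { queue << "WelcomeEmailJob" }
    raise "simulated validation failure"
  end
rescue RuntimeError
  # Nothing was enqueued.
end
queue_after_rollback = queue.dup

# Committed transaction whose enqueue then fails: the data exists, the job does not.
begin
  ToyTransaction.run do |tx|
    users << "a@example.com" # stands in for a committed write
    tx.after_commit { raise IOError, "simulated: jobs database unavailable" }
  end
rescue IOError
  # The write is already committed; only the enqueue was lost.
end

puts "after rollback: queue=#{queue_after_rollback.inspect}"
puts "after failed enqueue: users=#{users.inspect} queue=#{queue.inspect}"
```

The first case is the one after-commit enqueuing protects against. The second case is the gap that remains: a committed change with no job behind it.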

Got it, thanks! So your background workloads are to some extent best-effort or you implement reconciliation processes when jobs don’t successfully enqueue post-commit?

when jobs don’t successfully enqueue post-commit

We don't do anything special for this. Maybe we should 🤔 Beyond using delayed is there anything you do?

Yeah, delayed is our first line of defense, creating a pit of success for engineers so they don't have to reason about the possibility that their job won't run. We also run reconciliation checks on the back end for critical system consistency invariants, in case of bugs in the jobs themselves or elsewhere in the downstream code.
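A reconciliation check of this kind could look roughly like the sketch below. Everything in it is an assumption made for illustration (the `Order` struct, the `receipt_sent` flag, the 15-minute grace period); the thread does not describe how Betterment's checks are actually built.

```ruby
# Hypothetical record: an order whose background job should send a receipt.
Order = Struct.new(:id, :committed_at, :receipt_sent, keyword_init: true)

# Returns orders that were committed longer ago than the grace period but
# still show no evidence that their job ran.
def missing_receipts(orders, now:, grace_seconds: 15 * 60)
  orders.select do |order|
    !order.receipt_sent && (now - order.committed_at) > grace_seconds
  end
end

now = Time.at(1_000_000)
orders = [
  Order.new(id: 1, committed_at: now - 3600, receipt_sent: true),  # job ran
  Order.new(id: 2, committed_at: now - 3600, receipt_sent: false), # overdue
  Order.new(id: 3, committed_at: now - 60,   receipt_sent: false)  # still within grace period
]

stale_ids = missing_receipts(orders, now: now).map(&:id)
puts "orders needing attention: #{stale_ids.inspect}"
```

A check like this only detects the inconsistency. Whether to re-enqueue the job or alert a human is a separate decision.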

I don't mind supporting a secondary DB given that it seems to be a small change to expose this functionality.

However, my main reason for not using Sidekiq or other Redis-based systems is to keep things simple and have strong transactional guarantees. I'd caution against complicating this.

Agreed. We're not likely to introduce features that allow users to relax core guarantees of the library in ways that they might not expect. For instance, if you're writing to multiple database connections within your app, it makes sense to enqueue jobs to each respective database consistently, and you still get per-database consistency guarantees at least. But we probably wouldn't enable jobs to be written to one database and app data to another.

Makes sense. Will close. Thanks for the good conversation.