ankane / distribute_reads

Scale database reads to replicas in Rails


DistributeReads.by_default not working on all reads by default

kevinjalbert opened this issue · comments

System

# Gemfile.lock
    distribute_reads (0.2.1)
    makara (0.3.8)
    rails (4.2.5.1)
$ ruby --version
ruby 2.3.6p384 (2017-12-14 revision 61254) [x86_64-linux-gnu]

For these examples, I'm in a Rails console connected to a master and two read replicas using the following configuration.

# config/database.yml

  makara:
    blacklist_duration: 300
    sticky: true
    connections:
      - role: master
        host: xxxx.rds.amazonaws.com
        disable_blacklist: true
      - role: slave
        host: xxxx.rds.amazonaws.com
        name: slave1
      - role: slave
        host: xxxx.rds.amazonaws.com
        name: slave2
# config/initializers/distribute_reads.rb

DistributeReads.default_options = {
  max_lag: 0,
  lag_failover: true,
  failover: true
}

DistributeReads.by_default = true

Makara::Cache.store = DistributeReads::CacheStore.new
# config/initializers/makara.rb

Makara::Logging::Logger.logger = Rails.logger

# Makara middleware is useless in an API context since it
# relies on cookies to track the state. So let's remove it.
Rails.configuration.middleware.delete Makara::Middleware

Example

irb(main):001:0> DistributeReads.by_default
=> true

irb(main):002:0> league = League.last
Cache read: makara::1fef64b0e88e9725450f2dd04763c377-2d4f5ada6edab4e2c16f9182136af9e4
  [slave1] League Load (1.6ms)  SELECT  "leagues".* FROM "leagues"  ORDER BY "leagues"."id" DESC LIMIT 1

irb(main):003:0> league.touch
Cache write: makara::685b9a7cf09375a4d3abac25ec9a4ee0-2d4f5ada6edab4e2c16f9182136af9e4 ({:expires_in=>5})
  [master/1] (1.8ms)  BEGIN
  [master/1] SQL (1.5ms)  UPDATE "leagues" SET "updated_at" = '2018-02-02 15:39:06.243500' WHERE "leagues"."id" = $1  [["id", "fd0a5071-818d-4a0a-9ac3-a940143f24e6"]]
  [master/1] (1.8ms)  COMMIT

irb(main):004:0> league.reload
  [master/1] League Load (1.2ms)  SELECT  "leagues".* FROM "leagues" WHERE "leagues"."id" = $1 LIMIT 1  [["id", "fd0a5071-818d-4a0a-9ac3-a940143f24e6"]]

Issue

In this example, DistributeReads.by_default is set to true, which I expected to mean "distribute reads by default". As the output shows, when we first fetch the last league object, the read goes to slave1. We then write via touch, which flips the connection to master. The final read via reload continues to go to master, when I expected it to go back to slave1.

I understand that this is the default behaviour when sticky: true is set in makara. Is it wrong to assume that DistributeReads.by_default can override that behaviour?

We found that if we explicitly use the distribute_reads global method, we get the results we expected.

irb(main):001:0> DistributeReads.by_default
=> true

irb(main):002:0> league = distribute_reads { League.last }
[distribute_reads] Multiple replicas available, lag only reported for one
Cache write: makara::86f8dddaaefe76a601eb8cb82c103aa4-2d4f5ada6edab4e2c16f9182136af9e4 ({:expires_in=>5})
  [master/1] (1.3ms)  SHOW server_version_num
  [slave1] (1.4ms)  SELECT CASE WHEN NOT pg_is_in_recovery() OR pg_last_xlog_receive_location() = pg_last_xlog_replay_location() THEN 0 ELSE EXTRACT (EPOCH FROM NOW() - pg_last_xact_replay_timestamp()) END AS lag
  [slave1] League Load (1.4ms)  SELECT  "leagues".* FROM "leagues"  ORDER BY "leagues"."id" DESC LIMIT 1

irb(main):003:0> distribute_reads { league.touch }
[distribute_reads] Multiple replicas available, lag only reported for one
  [slave1] (1.2ms)  SELECT CASE WHEN NOT pg_is_in_recovery() OR pg_last_xlog_receive_location() = pg_last_xlog_replay_location() THEN 0 ELSE EXTRACT (EPOCH FROM NOW() - pg_last_xact_replay_timestamp()) END AS lag
  [master/1] (0.8ms)  BEGIN
  [master/1] SQL (1.6ms)  UPDATE "leagues" SET "updated_at" = '2018-02-02 15:53:04.351845' WHERE "leagues"."id" = $1  [["id", "fd0a5071-818d-4a0a-9ac3-a940143f24e6"]]
  [master/1] (1.8ms)  COMMIT

irb(main):004:0> distribute_reads { league.reload }
[distribute_reads] Multiple replicas available, lag only reported for one
  [slave1] (1.1ms)  SELECT CASE WHEN NOT pg_is_in_recovery() OR pg_last_xlog_receive_location() = pg_last_xlog_replay_location() THEN 0 ELSE EXTRACT (EPOCH FROM NOW() - pg_last_xact_replay_timestamp()) END AS lag
  [slave1] League Load (0.9ms)  SELECT  "leagues".* FROM "leagues" WHERE "leagues"."id" = $1 LIMIT 1  [["id", "fd0a5071-818d-4a0a-9ac3-a940143f24e6"]]

In addition, this performs the appropriate replication-lag checks before each read (as well as before the write).

TL;DR

Does DistributeReads.by_default work in conjunction with Makara's sticky: true automatically? Do we need to do anything to ensure that reads actually go through distribute_reads? We're looking for a seamless, automatic way to push our reads through DistributeReads.

Hey @kevinjalbert, DistributeReads.by_default respects sticky: true, which is why a write causes subsequent reads to be routed to the primary for a period of time. It sounds like you want to use sticky: false.
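For reference, that would be a one-line change to the makara section of the database config shown above (hosts elided as in the original):

```yaml
# config/database.yml -- same makara config as above, but with
# stickiness disabled so reads no longer pin to master after a write
makara:
  blacklist_duration: 300
  sticky: false
  connections:
    - role: master
      host: xxxx.rds.amazonaws.com
      disable_blacklist: true
    - role: slave
      host: xxxx.rds.amazonaws.com
      name: slave1
    - role: slave
      host: xxxx.rds.amazonaws.com
      name: slave2
```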

@ankane good to know. I'm wondering: with sticky: false alongside DistributeReads.by_default, would the replication-lag checks still run before each read?

I'm trying to think of some simple setup where replication lag and splitting reads over the replicas doesn't require much thought.

@kevinjalbert Distribute Reads doesn't provide a way to do this, but you can probably use Ruby's prepend on Makara's _appropriate_pool method, as this gem does: https://github.com/ankane/distribute_reads/blob/master/lib/distribute_reads/appropriate_pool.rb
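Since Makara isn't loaded here, below is a minimal, self-contained sketch of the prepend technique on a stand-in class. The real override would prepend a module onto Makara's proxy so its _appropriate_pool method is intercepted, as the linked file does; the Proxy class and its SQL-matching heuristic here are illustrative stand-ins, not Makara's actual internals.

```ruby
# Stand-in for Makara's proxy: decides which pool handles each statement.
# By default it always sticks to the primary.
class Proxy
  def _appropriate_pool(method, args)
    :primary
  end

  def pool_for(sql)
    _appropriate_pool(:execute, [sql])
  end
end

# Prepended module: runs *before* the original method in the lookup
# chain, routing plain SELECTs to a replica and deferring to the
# original behaviour (via super) for everything else.
module ReplicaByDefault
  def _appropriate_pool(method, args)
    sql = args.first.to_s
    return :replica if sql =~ /\A\s*SELECT\b/i
    super
  end
end

Proxy.prepend(ReplicaByDefault)

proxy = Proxy.new
puts proxy.pool_for("SELECT * FROM leagues")    # => replica
puts proxy.pool_for("UPDATE leagues SET x = 1") # => primary
```

Because prepend inserts the module ahead of the class in the ancestor chain, super inside the module calls the class's original method, so unmatched statements keep their existing routing.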

@ankane thanks for the information. I forgot to close this issue out.

I'm currently experimenting with a custom prepended adaptor that will hopefully lead to more automatic switching based on replication lag, while being less aggressive about sticking to master.
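The core decision such an adaptor makes can be sketched in a few lines. The helper name and signature below are hypothetical; in practice the lag value would come from a query like the pg_last_xact_replay_timestamp SELECT visible in the logs above.

```ruby
# Hypothetical helper: pick a pool for a read given the measured
# replication lag. Reads go to a replica only while lag is within
# max_lag seconds; otherwise they fall back to master for this read
# rather than sticking to master for a whole window.
def pool_for_read(lag_seconds, max_lag: 0)
  lag_seconds <= max_lag ? :replica : :master
end

puts pool_for_read(0)             # => replica
puts pool_for_read(3, max_lag: 2) # => master
```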