ankane / distribute_reads

Scale database reads to replicas in Rails

Handling master's failover

laimison opened this issue · comments

First of all, thanks for this nice gem.

How do you handle the situation where the master goes down and its DNS record is transparently repointed by automated DB failover?

In my case, I guess it didn't work because Rails doesn't refresh the master's DNS hostname, or because the master is blacklisted by Makara/Distribute Reads.

My configuration is

default: &default
  # Parameter to explore if this is allowed here
  # reconnect: true
  url: mysql2-makara:///
  makara:
    # rescue_connection_failures: false
    # master_strategy: failover
    # master_node_selector: 'Makara::NodeSelectors::SingleWithFailover'
    # slave_node_selector: 'Makara::NodeSelectors::RoundRobin'
    sticky: true
    # Parameters to explore
    # slave_strategy: failover
    # master_ttl: 5
    # blacklist_duration: 0
    connections:
      - role: master
        name: primary
        url: 'mysql2://root:password@master-vip:3306/testdb'
      - name: replica
        url: 'mysql2://root:password@replica:3306/testdb'

test:
  <<: *default

development:
  <<: *default

production:
  <<: *default

Am I missing something? It would be brilliant if someone could confirm that they don't have this issue and share their configuration. Or, if you do have this issue, we can keep digging into it together. Many thanks!

Hey @laimison, I can’t recall if we had this issue at Instacart. Let me know what you find.

Discovering Method 1
It seems a similar issue happens with AWS RDS instances and a Rails setup. I'm not aware of whether Makara and Distribute Reads can reconnect by checking if DNS has changed, but to me it looks like the addresses are cached at the mysql2 gem level.

Further reading:
brianmario/mysql2#948

So @sonots created a patch:
https://github.com/sonots/mysql2-reconnect_with_readonly

It's for MySQL only, and the slave should have this set: SET GLOBAL read_only = ON;

Discovering Method 2
Another dirty method, once failover has happened, is to swap the DNS records (for a test this could be /etc/hosts on the Rails host) and restart Rails.

This increases downtime by a few seconds and could cause more failed transactions (either way, if the master fails, some transactions may be interrupted for a moment, depending on the architecture).

Thanks for the update. From what I can tell from the patch and discussion, the database server switches roles without disconnecting clients. The patch linked above detects when this happens (based on an error of trying to write to read-only instance) and reconnects, which should use the latest DNS (unless it's cached somewhere else in your stack). That seems like a reasonable approach to me, so I don't think there's anything else Makara or Distribute Reads needs to do.
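To make the detect-and-reconnect idea concrete, here is a minimal sketch in plain Ruby. It is not the gem's actual implementation: `ReadOnlyError` and `ReconnectingClient` are hypothetical names, and the connection is anything that responds to `query`. The assumption is that the driver raises a recognizable error when a write hits a read-only server, and that opening a new connection resolves the hostname again.

```ruby
# Hypothetical stand-in for the driver error raised when a write hits a
# read-only server (with mysql2 this would be some Mysql2::Error).
class ReadOnlyError < StandardError; end

# Hypothetical wrapper, not the gem's real code: on a read-only error it
# opens a fresh connection and retries the statement once.
class ReconnectingClient
  MAX_RETRIES = 1

  # The block must open and return a new connection. The hostname is
  # resolved again every time the block runs.
  def initialize(&connector)
    @connector = connector
    @connection = connector.call
  end

  def query(sql)
    retries = 0
    begin
      @connection.query(sql)
    rescue ReadOnlyError
      # Give up if reconnecting did not help; the caller sees the error.
      raise if retries >= MAX_RETRIES

      retries += 1
      @connection = @connector.call
      retry
    end
  end
end
```

A real version would also close the old connection and would need to decide what to do when the error occurs in the middle of a transaction.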

Thanks, @ankane. Based on your answer, I invested some time testing this patch on top of Makara and Distribute Reads, and the initial test passed!

I'm not getting any errors and master is always available for writes.

This is on bare metal, but I think the same approach can be useful for any solution including AWS RDS, etc.

The only minor downside is that Rails still appears to cache the mysql-replica DNS name after a failover (we sorted out the mysql-master DNS issue using the @sonots patch, but not mysql-replica).

So the tip is to do a switchover (restoring master and replica to their original DBs); then both DBs carry load as desired (master for writes, replica for reads).

I have in my Gemfile:

gem 'rails', '~> 5.1.6', '>= 5.1.6.1'
...
gem 'makara', '~> 0.4.0'
gem 'distribute_reads', '~> 0.2.4'
gem 'mysql2-reconnect_with_readonly'

In database.yml:

default: &default
  reconnect: true
  url: mysql2-makara:///
  makara:
    sticky: true
    blacklist_duration: 0
    connections:
      - role: master
        name: master
        url: 'mysql2://app:example@mysql-master:3306/my-app'
      - role: replica
        name: replica
        url: 'mysql2://app:example@mysql-replica:3306/my-app'

test:
  <<: *default

development:
  <<: *default

production:
  <<: *default

In controller:

require 'mysql2/reconnect_with_readonly'

My MySQL failover/switchover scripts (using the mysqlfailover utility) are responsible for managing the DNS records and enabling/disabling read-only on the DBs:

mysql-master always resolves to the master DB IP, where read-only is OFF
mysql-replica always resolves to the slave DB IP, where read-only is ON

I believe this can be closed as successful unless someone else has a better solution or wants to run different tests. From my point of view it just worked well.

I forgot to mention that I have:

gem 'mysql2', '>= 0.3.18', '< 0.6.0'

in my Gemfile. It's mysql2 v0.5.2.

My controller examples_controller.rb to test GET, POST and DELETE:

require 'mysql2/reconnect_with_readonly'

module Api::V1
  class ExamplesController < ApplicationController
    # GET http://IP:PORT/v1/examples
    def index
      distribute_reads(max_lag: 3, lag_failover: true) do
        everything = Example.all
        render json: everything
      end
    end

    # POST http://IP:PORT/v1/examples
    def create
      # Use the submitted name if it contains at least one alphanumeric
      # character; otherwise fall back to a placeholder.
      paramname =
        if params['paramname'] && params['paramname'] =~ /[a-zA-Z0-9]/
          params['paramname']
        else
          'nothing'
        end

      Example.create(name: paramname)

      render json: params
    end

    # DELETE http://IP:PORT/v1/examples
    def destroy
      Example.first.delete if Example.any?

      head 204
    end
  end
end

Glad it's working.

fwiw, I don't think Rails or the mysql2 gem is caching the DNS. Once a connection is established, DNS is no longer in the equation. If you've established a connection to mysql-replica with IP 1.2.3.4 and later mysql-replica points to 5.6.7.8, it doesn't change the fact that the original connection is to 1.2.3.4.
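The point above can be checked from Ruby with the standard library alone. This is a rough sketch, assuming you can get at the underlying socket (which database drivers generally do not expose, so treat it as a diagnostic illustration). `current_ips` and `stale_connection?` are hypothetical helper names; the `resolver:` keyword exists only so the lookup can be swapped out.

```ruby
require 'socket'

# Returns the IP addresses the hostname resolves to right now.
def current_ips(host)
  Addrinfo.getaddrinfo(host, nil, nil, :STREAM).map(&:ip_address).uniq
end

# True when an already-open socket is connected to an IP that the
# hostname no longer resolves to. The socket keeps its original peer
# address no matter how DNS changes afterwards.
def stale_connection?(socket, host, resolver: method(:current_ips))
  !resolver.call(host).include?(socket.remote_address.ip_address)
end
```

For a connection opened to mysql-replica at 1.2.3.4, `stale_connection?` would start returning true once mysql-replica resolves only to 5.6.7.8, even though the socket itself is still perfectly usable.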

@ankane thanks for strengthening the case for trying the mysql2-reconnect_with_readonly gem. I checked lsof -i -P on my Rails container, and I think it works exactly as you say during failover and switchover:
DNS is not cached at all, but old connections stay attached to the old IP by default.

The trick is to use mysql2-reconnect_with_readonly to reconnect sessions to the new mysql-master.

When the original master becomes available again, the trick is to do a switchover, which reconnects the mysql-replica sessions.

So just by toggling read-only ON/OFF, we can route connections to another instance when needed.

Any workaround for postgres?

@rezart You can probably use a similar approach as the mysql2-reconnect_with_readonly gem.
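For Postgres, the detection half of that approach could look roughly like the sketch below. It relies on PostgreSQL reporting SQLSTATE 25006 (read_only_sql_transaction) when a write is attempted in a read-only transaction, such as on a standby. `read_only_sqlstate?` is a hypothetical helper that takes the SQLSTATE string directly. With the pg gem that string should be obtainable from the error's result via `error_field(PG::Result::PG_DIAG_SQLSTATE)`, but that wiring is left out here and is untested.

```ruby
# SQLSTATE 25006 is PostgreSQL's read_only_sql_transaction condition.
READ_ONLY_SQLSTATE = '25006'

# Hypothetical helper: true when the SQLSTATE of a failed statement
# indicates a write against a read-only server, which is the signal to
# drop the connection and reconnect.
def read_only_sqlstate?(sqlstate)
  sqlstate.to_s == READ_ONLY_SQLSTATE
end
```

The reconnect half would be the same as for MySQL: on this error, open a new connection so the hostname is resolved again.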