iconara / cql-rb

Cassandra CQL 3 binary protocol driver for Ruby

Automatic reconnect when a seed node connection fails

ajsharp opened this issue · comments

Currently, if the connection is reset on one of the seed nodes (such as from restarting the process) cql-rb raises a Cql::NotConnectedError exception. Are you in favor of adding logic that automatically attempts a reconnect when the connection is broken?

commented

Not sure exactly what you mean. When a connection is closed it will eventually be reconnected, but it will wait for one of the other nodes to send an up event.

That's not the behavior I'm seeing. What I'm seeing is that if a seed node process is restarted (stopped, then started again), any existing connection objects will continue to raise the Cql::NotConnectedError exception, even after the seed node has begun listening for connections again.

Is this not the intended behavior?

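require 'cql'

module Connection
  # Memoizes a single client for the whole process.
  # `config` is assumed to be defined elsewhere and to hold the seed hosts.
  def self.client
    @client ||= begin
      client = Cql::Client.connect(hosts: config['seeds'], default_consistency: :one)
      client.use("test")
      client
    end
  end
end

Connection.client.execute("select * from table")
# Restart the server, wait until it is accepting connections
Connection.client.execute("select * from table") # Boom! Raises Cql::NotConnectedError

commented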

I'm pretty sure about the reconnection mechanisms; they're tested continuously in our clusters. I've even upgraded clusters while everything was running.

You can inject a logger into the driver to see what it's doing: Client.connect(..., logger: Logger.new($stderr), ...). It will log when it loses connections, when it receives up events and when it reconnects, among other things.

How many nodes are you running? Are you sure that not all of the connections close at the same time? It kind of sounds from your description that you're only running a single node, and in that case it won't reconnect, since there's no one around to send an up event.

I'm actually using a logger initialized in the connection, and it's not showing anything when the connection is dropped. Sorry, I should've included that in the sample code, but everything else is accurate. In development, I'm just using a single node, 127.0.0.1. But I've observed and tested this same behavior in a two-node setup (with one seed).

When I make the initial connection, there are log messages for that, but not when the node is killed, or when it comes back up.

Oh, and I'm using commit 41a7469.

Ah, I just realized this:

How many nodes are you running? Are you sure that not all of the connections close at the same time? It kind of sounds from your description that you're only running a single node, and in that case it won't reconnect, since there's no one around to send an up event.

Do I need to be running at least two seeds for this to work correctly?

commented

If you only have a single node then what you're seeing is expected. The driver will not attempt to reconnect when no connections are open. The reconnection logic relies entirely on events sent by the cluster about when nodes join.

If you have two nodes then you should see reconnections happening if you only restart them one at a time (and make sure that the driver is connected to both before you restart one).

It would be possible to add logic that would attempt to reconnect even when the last connection fails, but I'm not sure it's very useful. It's such a rare case that all nodes are down and you want your application to stay up. If you really need that behaviour you probably have very specific needs and you might as well handle the reconnection yourself. Automatic reconnection is horribly complex so I'm reluctant to add something that only solves a very rare need, and something that would probably need a lot of configuration hooks (how often should the nodes be probed to see if they're up, how many failures should be tolerated per time unit, etc.)
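For readers who do need to survive the loss of every connection, handling the reconnection yourself could look roughly like the sketch below. It is only an illustration: `ReconnectingClient`, `retry_on`, `max_attempts`, `base_delay` and `sleeper` are hypothetical names, not part of cql-rb. The wrapper drops the dead client when one of the listed errors is raised, builds a new one through the block, and backs off exponentially between attempts. Which error classes to list is an assumption to verify against the driver; the thread only confirms Cql::NotConnectedError, and a failed connect attempt may raise something else.

```ruby
# Hypothetical application-level helper; not part of cql-rb.
class ReconnectingClient
  # retry_on:     error class (or list of classes) that should trigger a reconnect
  # max_attempts: total tries per call, including the first one
  # base_delay:   seconds to wait before the first retry; doubles each time
  # sleeper:      callable used to wait (replaceable in tests)
  # The block must return a freshly connected client.
  def initialize(retry_on:, max_attempts: 3, base_delay: 0.5,
                 sleeper: Kernel.method(:sleep), &connector)
    raise ArgumentError, 'a connector block is required' unless connector
    @connector = connector
    @retry_on = Array(retry_on)
    @max_attempts = max_attempts
    @base_delay = base_delay
    @sleeper = sleeper
    @client = nil
  end

  def execute(*args)
    attempts = 0
    begin
      attempts += 1
      @client ||= @connector.call
      @client.execute(*args)
    rescue *@retry_on
      @client = nil # drop the dead client so the next attempt reconnects
      raise if attempts >= @max_attempts
      @sleeper.call(@base_delay * (2**(attempts - 1)))
      retry
    end
  end
end

# Possible usage, assuming the cql-rb gem is loaded and a node is reachable:
# client = ReconnectingClient.new(retry_on: [Cql::NotConnectedError]) do
#   c = Cql::Client.connect(hosts: ['127.0.0.1'], default_consistency: :one)
#   c.use('test')
#   c
# end
# client.execute('select * from some_table')
```

Doing the keyspace selection inside the block means every new connection starts in the same state as the original one. The retry count and delay are exactly the kind of configuration the comment above warns about, so they are left as parameters here.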

If you could explain a bit more about what your needs are, maybe there's something I'm not seeing.

Also, strange about the logging not showing anything. Make sure you're not at too high a log level (just inject a raw logger at debug level; it should log quite a lot of things when it connects).
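A raw debug-level logger can be built with Ruby's standard library alone, as in the small sketch below. `build_debug_logger` is a hypothetical helper name; the `logger:` option is the one mentioned earlier in this thread, and the host in the commented usage line is an assumption.

```ruby
require 'logger'

# Builds a logger that lets every severity through, including debug messages.
def build_debug_logger(io = $stderr)
  logger = Logger.new(io)
  logger.level = Logger::DEBUG
  logger
end

# Possible usage, assuming the cql-rb gem is loaded and a node is listening
# on 127.0.0.1:
# client = Cql::Client.connect(hosts: ['127.0.0.1'], logger: build_debug_logger)
```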

If you want a more detailed description of how connection and reconnection work, I wrote one in response to another issue.

Thanks for your help @iconara. Closing this issue.