How do I handle Redis connection errors?

louisameline opened this issue 9 years ago

Hello,

My application crashes when Redis is, or becomes, unavailable, and I'm not sure how to handle this. Should I be catching an exception somewhere?

Thank you.

Sean McDaniel · Answer 1 · Sat Jul 25 2015 08:44:23 GMT+0800 (China Standard Time)

Wait, what? Redis can become unavailable?

I've been away from Node for a while, but I see that node_redis has retry logic built in. Off the top of my head, I don't know of a graceful way to handle this. I'll play around with it and see if I can find something to help you out.

Sean McDaniel · Answer 2 · Sat Jul 25 2015 10:01:45 GMT+0800 (China Standard Time)

Found a solution fairly quickly. First, resque needs to listen for the error event. This is required because Node's EventEmitter throws when an 'error' event is emitted with no listener attached, and that uncaught exception ends the process. Then we just need to check the redis.connected property before performing a poll cycle.

  poll: (title, nQueue = 0) ->
    # Redis is down: pause instead of sending a command that cannot succeed.
    if not @redis.connected
      process.nextTick => @pause()
      return

    return unless @running
    process.title = title if title
    @queue = @queues[nQueue]
    @emit 'poll', @, @queue
    @redis.lpop @conn.key('queue', @queue), (err, resp) =>
      if !err && resp
        # A job was dequeued: deserialize and run it.
        @perform JSON.parse(resp.toString())
      else
        # Report the error (if any), then try the next queue, or pause
        # once every queue has been checked.
        @emit 'error', err, @, @queue if err
        if nQueue == @queues.length - 1
          process.nextTick => @pause()
        else
          process.nextTick => @poll title, nQueue+1

I'll restructure the code a bit, but this approach does the trick and recovers gracefully from connection instability. It may not be totally foolproof, though. I noticed that if I execute a command while disconnected, the error handler isn't called and the process exits. That could happen if the connection drops between the connected check and the lpop call. It's an edge case, but worth mentioning.
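The two guards described above can be modelled without a Redis server. The first is an error listener plus a connectivity check before each poll. The second treats an error passed to the lpop callback as a reason to pause, which covers the race where the connection drops after the check. This TypeScript sketch uses a hypothetical FakeClient that mimics only node_redis's connected flag, its error event and a callback-style lpop. Its dropOnNextCommand flag exists purely to simulate the race. Worker and its counters are illustrative names, not node-resque's API. The second guard only helps if the client actually reports the failure through the callback, and that is exactly the behaviour Sean found missing in the real client.

```typescript
import { EventEmitter } from "events";

// Hypothetical stand-in for a Redis client: a `connected` flag, an "error"
// event and a callback-style lpop. `dropOnNextCommand` simulates the race.
class FakeClient extends EventEmitter {
  connected = true;
  dropOnNextCommand = false;
  private lists = new Map<string, string[]>();

  rpush(key: string, value: string): void {
    const list = this.lists.get(key) ?? [];
    list.push(value);
    this.lists.set(key, list);
  }

  lpop(key: string, cb: (err: Error | null, value: string | null) => void): void {
    if (this.dropOnNextCommand) {
      // The connection is lost after the caller checked `connected`.
      this.dropOnNextCommand = false;
      this.connected = false;
    }
    if (!this.connected) {
      cb(new Error("connection lost"), null);
      return;
    }
    cb(null, (this.lists.get(key) ?? []).shift() ?? null);
  }
}

// Illustrative worker. The CoffeeScript above defers with process.nextTick;
// this sketch calls directly so the results are visible synchronously.
class Worker {
  paused = 0;
  performed: unknown[] = [];
  errors: Error[] = [];
  private client: FakeClient;
  private queues: string[];

  constructor(client: FakeClient, queues: string[]) {
    this.client = client;
    this.queues = queues;
    // Without this listener, client.emit("error", ...) would throw.
    client.on("error", (err: Error) => this.errors.push(err));
  }

  pause(): void {
    this.paused++;
  }

  poll(nQueue = 0): void {
    // Guard 1: skip the cycle entirely while disconnected.
    if (!this.client.connected) {
      this.pause();
      return;
    }
    this.client.lpop(this.queues[nQueue], (err, resp) => {
      if (!err && resp) {
        this.performed.push(JSON.parse(resp));
      } else if (err) {
        // Guard 2: the command itself failed; pause instead of crashing.
        this.errors.push(err);
        this.pause();
      } else if (nQueue < this.queues.length - 1) {
        this.poll(nQueue + 1);
      } else {
        this.pause();
      }
    });
  }
}

const client = new FakeClient();
const worker = new Worker(client, ["high", "low"]);
client.rpush("low", JSON.stringify({ class: "parseFeed", args: [1] }));
worker.poll(); // "high" is empty, so the job is taken from "low"
client.emit("error", new Error("ECONNREFUSED")); // handled: no throw
client.connected = false;
worker.poll(); // guard 1: paused without touching Redis
client.connected = true;
client.dropOnNextCommand = true;
worker.poll(); // guard 2: lpop fails after the check; paused, not crashed
```

Recording errors on the worker, rather than re-emitting them, keeps the sketch from needing a second 'error' listener.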

Oh, one other thing: if Redis isn't available on the initial connection, the process exits. I'll see if there is something we can do there, but part of me thinks exiting in that situation is probably desirable.
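If exiting at startup is too abrupt, one middle ground is to retry the first connection with capped exponential backoff and give up only after a fixed number of attempts. This is a sketch under assumptions. connect is any hypothetical function returning a Promise that rejects while Redis is unreachable. sleep is injected so the delays can be skipped in tests. connectWithRetry and backoffDelay are illustrative helpers, not part of node_redis or node-resque.

```typescript
// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped.
function backoffDelay(attempt: number, baseMs: number, capMs: number): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

const realSleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: retries `connect` until it resolves or attempts run out.
async function connectWithRetry<T>(
  connect: () => Promise<T>,
  maxAttempts = 5,
  baseMs = 100,
  capMs = 5000,
  sleep: (ms: number) => Promise<void> = realSleep,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await connect();
    } catch (err) {
      lastError = err;
      // Only wait if another attempt will follow.
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelay(attempt, baseMs, capMs));
      }
    }
  }
  // Out of attempts: rethrow so the caller decides whether to exit.
  throw lastError;
}

// Usage with a fake connection that fails twice before succeeding.
let calls = 0;
const delays: number[] = [];
connectWithRetry(
  async () => {
    calls++;
    if (calls < 3) throw new Error("ECONNREFUSED");
    return "connected";
  },
  5,
  100,
  5000,
  async (ms) => {
    delays.push(ms);
  },
).then((result) => console.log(result, calls, delays)); // connected 3 [ 100, 200 ]
```

Whether to keep retrying forever or exit after maxAttempts is the judgement call raised above. The sketch leaves that choice to the caller by rethrowing the last error.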

Louis Ameline · Answer 3 · Sat Jul 25 2015 16:28:32 GMT+0800 (China Standard Time)

Thank you for the quick response!

I'm trying to eliminate single points of failure in my app, so I'm deliberately stopping Redis to see how everything else behaves. Since resque calls take a callback that can report a failure, I assumed it would be enough to handle errors at that level and, for example, store the resque jobs on the filesystem until they can be queued in Redis again.
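The idea of parking jobs on disk while Redis is down can be sketched as a small spool. It first tries to enqueue. On failure it appends the job to a JSON-lines file, and once Redis is back it replays the file. This is a hypothetical TypeScript sketch, not node-resque's API. The EnqueueFn callback shape loosely mirrors a callback-style enqueue, and SpoolingQueue, Job and the spool path are made up for illustration. It uses only Node's fs, os and path modules and assumes a single process owns the spool file.

```typescript
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

type Job = { queue: string; name: string; args: unknown[] };
// Hypothetical callback-style enqueue: err is non-null when Redis is unavailable.
type EnqueueFn = (job: Job, cb: (err: Error | null) => void) => void;

class SpoolingQueue {
  private enqueue: EnqueueFn;
  private spoolPath: string;

  constructor(enqueue: EnqueueFn, spoolPath: string) {
    this.enqueue = enqueue;
    this.spoolPath = spoolPath;
  }

  // Try Redis first; on failure, append one JSON line to the spool file.
  push(job: Job): void {
    this.enqueue(job, (err) => {
      if (err) fs.appendFileSync(this.spoolPath, JSON.stringify(job) + "\n");
    });
  }

  // Replay spooled jobs; any that still fail are re-spooled by push().
  flush(): void {
    if (!fs.existsSync(this.spoolPath)) return;
    const lines = fs
      .readFileSync(this.spoolPath, "utf8")
      .split("\n")
      .filter((line) => line.length > 0);
    fs.unlinkSync(this.spoolPath);
    for (const line of lines) this.push(JSON.parse(line) as Job);
  }
}

// Usage with a fake enqueue that fails until `redisUp` is set.
let redisUp = false;
const delivered: Job[] = [];
const spoolPath = path.join(os.tmpdir(), `resque-spool-${process.pid}.jsonl`);
const spool = new SpoolingQueue((job, cb) => {
  if (!redisUp) return cb(new Error("Redis unavailable"));
  delivered.push(job);
  cb(null);
}, spoolPath);

spool.push({ queue: "parse", name: "parseFeed", args: ["https://example.com/rss"] });
redisUp = true;
spool.flush(); // the spooled job is now delivered and the spool file removed
```

Synchronous fs calls keep the sketch short. A crawler that is busy all the time would probably prefer fs.promises so that spooling doesn't block the event loop.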

In my use case, my Node app is constantly busy fetching RSS feeds from the internet. If Resque isn't available to queue the parsing and image-processing jobs, I still want the app to keep running and fetching content for later analysis. I may be conceptually wrong, but I don't consider Redis part of my Node crawler app; it's more like an external module that links it to the rest of my stack. So I'd expect my app to run even if Redis weren't available at startup.

Evan Tahler · Answer 4 · Sat Jul 25 2015 16:47:40 GMT+0800 (China Standard Time)

Keep in mind that you can also use things like Redis Sentinel to create a Redis "cluster" with high availability (HA) and failover.
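For the Sentinel route, ioredis (suggested later in the thread) accepts a list of Sentinel addresses plus the master group name and discovers the current master on its own. The following is a minimal configuration sketch that assumes ioredis is installed. The hosts, ports and the master name "mymaster" are placeholders for your own Sentinel deployment.

```typescript
import Redis from "ioredis";

// Placeholder Sentinel addresses and master group name: replace with your own.
const redis = new Redis({
  sentinels: [
    { host: "sentinel-1.internal", port: 26379 },
    { host: "sentinel-2.internal", port: 26379 },
  ],
  name: "mymaster",
});

// Listen for connection errors so the application can react to them
// (log, alert, pause workers) instead of relying on default reporting.
redis.on("error", (err) => console.error("Redis error:", err.message));
```

Sentinel reduces how often Redis is unreachable, but the error handling discussed earlier is still needed while a failover is in progress.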

Sean McDaniel · Answer 5 · Sat Jul 25 2015 21:24:40 GMT+0800 (China Standard Time)

Good suggestion, @evantahler. For projects that need more robust infrastructure, that is the route I would go with. Maybe I should look at ioredis for users who need this type of capability.

Louis Ameline · Answer 6 · Sun Jul 26 2015 20:20:38 GMT+0800 (China Standard Time)

Thank you for the hint; I'll definitely try Sentinel at some point (if it's supported). While it's always nice to reduce the risk of failure, I still like the idea of my app coping with a failure if one does happen. But I'll understand if you say it isn't worth the amount of work required.

Ruben Bridgewater · Answer 7 · Fri Oct 02 2015 22:02:50 GMT+0800 (China Standard Time)

This should be solved if you use node-redis >= v2.