OptimalBits / bull

Premium Queue package for handling distributed jobs and messages in NodeJS.

Read Only Error when Same Queue is used in two separate services connected with elasticache(Redis) cluster

divysts opened this issue · comments

When using Bull with an AWS ElastiCache Redis cluster (cluster mode enabled), clients connect through a configuration endpoint; there is no permanent primary endpoint. When a Bull queue with the same name and prefix is created and processed in two different services connected to the same ElastiCache cluster, it fails with "READONLY You can't write against a read only replica" after some jobs have been added and processed.

const Queue = require('bull');
const Redis = require('ioredis');

// `nodes` (cluster startup nodes) and `tls` (TLS options) are defined
// elsewhere; their values were not included in the report.
const options = {
    dnsLookup: (address, callback) => callback(null, address),
    redisOptions: {
        tls: tls,
        password: 'password',
        failoverDetector: true
    },
    scaleReads: "all",
    retryStrategy: (times) => {
        if (times <= 10) {
            return Math.min(times * 100, 2000);
        } else {
            return null;
        }
    }
};

const cluster = new Redis.Cluster(nodes, options);
const myQueue = new Queue('myQueue', {
    prefix: `{queue}`,
    // Bull asks for three connection types; each gets its own cluster connection.
    createClient: function (type) {
        console.log(type);
        switch (type) {
            case "client": return cluster.duplicate();
            case "subscriber": return cluster.duplicate();
            case "bclient": return cluster.duplicate();
            default: return new Redis(); // Fallback to the default client
        }
    },
    settings: { lockDuration: 600000 }
});
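
For readers wondering why the prefix is wrapped in braces: Redis Cluster hashes only the text inside the first `{...}` of a key (the "hash tag"), so every Bull key under the `{queue}` prefix lands in the same slot, and therefore on the same shard. The following is a minimal sketch of that slot calculation based on the Redis Cluster specification (CRC16/XMODEM modulo 16384). The helper names `crc16`, `hashTagOf` and `keySlot` are made up for illustration, and the key names assume Bull's `prefix:queueName:type` layout.

```javascript
// Sketch of the Redis Cluster key-slot calculation (illustrative helpers).

// CRC16/XMODEM: polynomial 0x1021, initial value 0.
function crc16(bytes) {
  let crc = 0;
  for (const byte of bytes) {
    crc ^= byte << 8;
    for (let i = 0; i < 8; i++) {
      crc = (crc & 0x8000) ? ((crc << 1) ^ 0x1021) : (crc << 1);
      crc &= 0xffff;
    }
  }
  return crc;
}

// Return the part of the key that is hashed: the content of the first
// non-empty {...} pair if there is one, otherwise the whole key.
function hashTagOf(key) {
  const open = key.indexOf('{');
  if (open === -1) return key;
  const close = key.indexOf('}', open + 1);
  if (close === -1 || close === open + 1) return key;
  return key.slice(open + 1, close);
}

function keySlot(key) {
  return crc16(Buffer.from(hashTagOf(key))) % 16384;
}

// Keys that share the {queue} hash tag map to the same slot.
console.log(keySlot('{queue}:myQueue:wait') === keySlot('{queue}:myQueue:active'));
```

Because all of a queue's keys share one slot, every write for that queue goes to the primary of a single shard, so a failover of that one shard affects the whole queue.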

To reproduce, use AWS ElastiCache with cluster mode enabled and a cluster configuration endpoint, then declare this queue in two different services that both add and process jobs. After some time you will see the error "READONLY You can't write against a read only replica".

Bull version: "bull": "^4.11.4"

Additional information: "ioredis": "^5.3.2"

RESPONSE FROM AWS

ElastiCache triggers a failover to switch to a new primary and updates the DNS record to point to the new primary. However, if the client has cached the old node's IP and connects to the old primary, you will face the error "READONLY You can't write against a read only replica" as the old primary is no longer accepting writes.

Client applications usually resolve the primary endpoint once and cache the primary's IP address locally. During the failover, the IP address behind the primary endpoint changes to that of the new primary node. If your client does not detect this change, it will still try to connect to the old IP, and this will result in prolonged downtime.

To minimize interruption, you should have a proper retry configuration in place on the client side to detect failover and pick up the new IPs. Also, consider reducing the DNS caching TTL values (if used).
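
One possible way to act on the retry advice on the ioredis side is sketched below. `reconnectOnError` and `clusterRetryStrategy` are documented ioredis options, but the helper name `buildClusterOptions` and the backoff values are illustrative assumptions. Nothing in this thread confirms that this configuration resolves the error in cluster mode, so treat it as a starting point for experiments rather than a fix.

```javascript
// Sketch only: the option names come from the ioredis documentation; the
// helper name and numeric values are assumptions for illustration.
function buildClusterOptions(tls) {
  return {
    // Pass hostnames through unresolved, as in the original report.
    dnsLookup: (address, callback) => callback(null, address),
    // Cluster-level reconnect backoff: 100 ms steps capped at 2 s,
    // giving up (null) after 10 attempts.
    clusterRetryStrategy: (times) =>
      times <= 10 ? Math.min(times * 100, 2000) : null,
    redisOptions: {
      tls,
      // On a READONLY reply, returning 2 asks ioredis to reconnect and
      // resend the failed command; false leaves the connection alone.
      reconnectOnError: (err) =>
        err.message.startsWith('READONLY') ? 2 : false,
    },
  };
}

// Usage (needs ioredis and a reachable cluster, so it is left commented out):
// const Redis = require('ioredis');
// const cluster = new Redis.Cluster(nodes, buildClusterOptions({}));
```

Returning `2` rather than `true` means the command that triggered the READONLY reply is retried after the reconnect instead of being rejected to the caller.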

The response from AWS seems to imply that the issue is in ioredis. If so, there is not a lot we can do on our side.