Dual Data Center Config
seanfulton opened this issue · comments
I thought I had seen a discussion of this here previously but can't find it. We have two data centers with a separate Dkron cluster in each. We had the EXTREMELY unlikely event of one of our data centers going down for 51 hours, and jobs that were in our NJ DC had to be run manually in Chicago.
Our Chicago jobs were fine, and most of the NJ jobs could have run from Chicago had they been configured to do so. But because we had two separate clusters, this was not possible.
When I set this up I thought of using a single cluster with tags for each DC, but we didn't go that route because of quorum; if we have 5 servers in each DC, and a DC goes down, there's no quorum.
I thought I had seen this discussed here before. Is there a documented way to address this? I was thinking of having a spare server node in each DC that would be added to the cluster if the other DC is unreachable, but it would also need to be taken down when the DC returns. Also, if I tag jobs for NJ and CHI, how would I get the NJ jobs to run in CHI if NJ goes down?
Thanks in advance!
sean
> When I set this up I thought of using a single cluster with tags for each DC, but we didn't go that route because of quorum; if we have 5 servers in each DC, and a DC goes down, there's no quorum.
Why not use a single cluster in one DC? Never run one cluster across data centers.
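To make the quorum concern concrete, here is a minimal sketch of the arithmetic, assuming the usual majority rule of Raft-style consensus (a majority of the configured servers must be reachable). The function names are made up for illustration and are not part of Dkron.

```python
def quorum(servers: int) -> int:
    """Smallest majority of a cluster with `servers` voting members."""
    return servers // 2 + 1


def has_quorum(total_servers: int, reachable_servers: int) -> bool:
    """True if the reachable members still form a majority."""
    return reachable_servers >= quorum(total_servers)


# 5 servers in each of two DCs: losing one DC leaves 5 of 10.
print(quorum(10))         # 6
print(has_quorum(10, 5))  # False: the surviving DC cannot elect a leader

# A single 3-node cluster tolerates the loss of one node.
print(has_quorum(3, 2))   # True
```

With an even split across two sites, neither site holds a majority on its own, so losing either one stalls the whole cluster.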
> I thought I had seen this discussed here before. Is there a documented way to address this? I was thinking of having a spare server node in each DC that would be added to the cluster if the other DC is unreachable, but it would also need to be taken down when the DC returns. Also, if I tag jobs for NJ and CHI, how would I get the NJ jobs to run in CHI if NJ goes down?
No, there's no documentation on how to approach this; it should be considered case by case.
What I would do in your case is run a single 3-node cluster in Chicago and keep a good backup system in place, so that in the extremely unlikely event of that DC failing you can spin up the cluster in another DC and load the backup in a matter of minutes.
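One possible shape for such a backup, as a rough sketch only: export the job definitions over the Dkron REST API (`GET /v1/jobs`) and re-create them on the replacement cluster (`POST /v1/jobs`). The host names in the usage note, the helper names, and the list of runtime fields to strip are assumptions for illustration; check them against the API of the Dkron version you run.

```python
import json
import urllib.request

# Assumption: these are bookkeeping fields the server maintains itself,
# so they are dropped before a job is re-created elsewhere.
RUNTIME_FIELDS = ("success_count", "error_count", "last_success",
                  "last_error", "next", "status")


def export_jobs(base_url: str) -> list:
    """Fetch all job definitions from a running cluster."""
    with urllib.request.urlopen(f"{base_url}/v1/jobs") as resp:
        return json.load(resp)


def clean_job(job: dict) -> dict:
    """Return a copy of the job without server-maintained runtime fields."""
    return {k: v for k, v in job.items() if k not in RUNTIME_FIELDS}


def restore_jobs(base_url: str, jobs: list) -> None:
    """Create (or update) each job on the target cluster."""
    for job in jobs:
        req = urllib.request.Request(
            f"{base_url}/v1/jobs",
            data=json.dumps(clean_job(job)).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            resp.read()  # drain the response; an HTTP error raises
```

In use, a scheduled task would call something like `export_jobs("http://dkron-chi.example:8080")` and write the result to storage outside the primary DC; after a failover, `restore_jobs("http://dkron-nj.example:8080", saved_jobs)` would load it into the freshly started cluster. Both host names are hypothetical.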