tiangolo / dockerswarm.rocks

Docker Swarm mode rocks! Ideas, tools and recipes. Get a production-ready, distributed, HTTPS-served cluster in minutes, not weeks.

Home Page: https://dockerswarm.rocks/

Traefik redundancy and DNS configuration

NReilingh opened this issue

Hi @tiangolo, this is a great guide and I'm enthusiastic about Docker Swarm as a better fit for places where Kubernetes is overkill. One thing the guide doesn't go into much detail on is DNS, and I've been confused about the specifics of redundancy, since as far as I can tell DNS doesn't actually provide any redundancy on its own.

My understanding is this: at the end of the day, your DNS needs to point to the swarm nodes that Traefik is deployed on. If you have multiple Traefik nodes, you can list them all in a round-robin DNS record to distribute load, but if one node fails, nothing stops DNS from continuing to resolve to it, so that dead node keeps receiving its share of clients alongside the other IPs in the record. Thus, the redundancy of service distribution across the swarm doesn't translate into service availability for a client.
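
A rough way to see the effect described above: a minimal Python simulation, assuming three hypothetical Traefik node IPs (documentation addresses) and a naive client that connects to whichever record the rotation hands it and never retries. Many real clients fall back to another address when a connection fails, so treat this as the worst case.

```python
from itertools import cycle


def round_robin_failures(records, down, requests):
    """Count how many of `requests` connections land on an IP listed in `down`.

    Models naive round-robin DNS: the answer rotates through every configured
    record, and the client uses whatever comes first, with no retry.
    """
    rotation = cycle(records)  # DNS keeps rotating through all records, dead or alive
    failed = 0
    for _ in range(requests):
        ip = next(rotation)
        if ip in down:  # DNS has no idea this node is down
            failed += 1
    return failed


# Hypothetical Traefik nodes (TEST-NET-3 documentation addresses).
records = ["203.0.113.10", "203.0.113.11", "203.0.113.12"]
failed = round_robin_failures(records, down={"203.0.113.11"}, requests=300)
print(f"{failed}/300 requests failed")  # 100/300 requests failed
```

With one of three nodes down, a third of the requests fail: exactly the "in proportion" behavior, since nothing removes the dead IP from the rotation.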

One thing that could help here: Traefik doesn't strictly have to run on a manager node. It can reach the Docker API on another host over TCP or SSH, as long as that API is exposed.
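
If Traefik runs off the manager, the host running it must be able to reach the manager's Docker Engine API. A minimal Python sketch for checking that from the Traefik host, assuming the API is exposed over plain HTTP at a hypothetical `swarm-manager.example:2375`. It calls the Engine API's `/_ping` endpoint, which answers `OK` when the daemon is up. Port 2375 is unencrypted and grants full control of the daemon, so a real setup would want TLS (conventionally port 2376) or SSH instead; this sketch handles neither.

```python
from urllib.error import URLError
from urllib.request import urlopen


def docker_api_reachable(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if the Docker Engine API at base_url answers GET /_ping with "OK".

    base_url looks like "http://swarm-manager.example:2375" (hypothetical host).
    Plain HTTP only: no TLS client certificates, no SSH.
    """
    url = base_url.rstrip("/") + "/_ping"
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and resp.read().strip() == b"OK"
    except (URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status.
        return False


# Example (hypothetical address), run from the host that will run Traefik:
# print(docker_api_reachable("http://swarm-manager.example:2375"))
```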

In light of this, one possible way to increase reliability would be to move Traefik out of the Docker Swarm entirely and instead run it on a pair of dedicated hosts in DNS round robin, proxying traffic back to the swarm. This still doesn't protect you from one of those Traefik hosts failing, but since those hosts have only one job, they should be extremely stable. In effect, the strategy is to compensate for DNS's lack of redundancy with stability, at the cost of the flexibility and automation you get from running Traefik inside the swarm.
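
The trade-off described above can be put in numbers. A back-of-the-envelope Python sketch, assuming DNS spreads requests evenly, clients never retry another address, and host failures are independent. With plain round robin, the per-request success rate is just the average of the hosts' availabilities; true failover would instead give 1 minus the product of each host's unavailability. Without failover, making each Traefik host more stable is therefore the main lever.

```python
from math import prod


def round_robin_availability(host_availability):
    """Per-request success rate with even DNS rotation and no client retry.

    Each request lands on exactly one host, so the result is the plain
    average of the hosts' availabilities.
    """
    return sum(host_availability) / len(host_availability)


def failover_availability(host_availability):
    """Success rate if clients could always reach any healthy host,
    assuming host failures are independent."""
    return 1 - prod(1 - a for a in host_availability)


hosts = [0.99, 0.99]  # hypothetical availability of two dedicated Traefik hosts
print(round_robin_availability(hosts))  # 0.99
print(failover_availability(hosts))     # roughly 0.9999
```

Adding a second host under plain round robin spreads load but does not raise the per-request figure; only raising each host's own availability (or adding real failover) does.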

Curious to know your thoughts on this, or if you think I'm missing anything that I should consider. And thanks again for writing up the guide! It's a great resource.

This might not fully address your concerns, but something to consider when using separate nodes for Traefik ingress would be to set up Keepalived for a Virtual IP targeting your swarm (wherever you expect Traefik to run).

Your DNS record should then point to this IP: health checks via Keepalived handle HA by moving the IP to a healthy node, and Traefik handles load balancing. Hopefully this makes sense and doesn't further complicate your scenario.
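
A rough model of the setup described above: Keepalived runs VRRP between the candidate nodes, and whichever healthy node wins the election holds the virtual IP that DNS points at. A minimal Python sketch of a simplified election, assuming hypothetical node names, private addresses and priorities. It is not Keepalived's implementation and ignores advertisement timers, preemption settings, and the priority adjustments that weighted health-check scripts can apply.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass
class Node:
    name: str
    ip: str
    priority: int  # VRRP-style priority: higher wins
    healthy: bool  # result of the node's local health check (e.g. is Traefik answering?)


def vip_owner(nodes):
    """Return the node that should hold the virtual IP, or None if none is healthy.

    Simplified VRRP-style election: nodes failing their health check drop out,
    the highest priority wins, and ties go to the higher IP address.
    """
    candidates = [n for n in nodes if n.healthy]
    if not candidates:
        return None
    return max(candidates, key=lambda n: (n.priority, IPv4Address(n.ip)))


nodes = [
    Node("traefik-a", "10.0.0.11", priority=150, healthy=True),
    Node("traefik-b", "10.0.0.12", priority=100, healthy=True),
]
print(vip_owner(nodes).name)  # traefik-a
nodes[0].healthy = False      # traefik-a's health check starts failing
print(vip_owner(nodes).name)  # traefik-b
```

Because clients only ever see the single virtual IP, a failed node stops receiving traffic as soon as the IP moves, which is the failover that plain round-robin DNS lacks.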

Hello! Thanks for the post! I should let you know that I had to deprecate this website and its ideas; I would no longer recommend Docker Swarm Mode for new projects: https://dockerswarm.rocks/swarm-or-kubernetes/ 🥲

Assuming the original issue was solved, it will be automatically closed now. But feel free to add more comments or create new issues.