kubernetes-sigs / external-dns

Configure external DNS servers (AWS Route53, Google CloudDNS and others) for Kubernetes Ingresses and Services


Allow multiple A records for the same domain from different external-dns instances.

sturdyhippo opened this issue

What would you like to be added:
Allowing multiple A records to be set from different sources that are managed by different external-dns instances.

For some background, I'm trying to create A records from Services of type LoadBalancer in different clusters, but it seems that currently (v0.6.0) the only way to specify multiple IP addresses for a single DNS name is to include them all as targets in a single DNSEndpoint. That is not an option when the Services run in different clusters managed by different instances of external-dns. When I try it anyway, only one of the records is created, and the logs report level=info msg="All records are already up to date" across all instances.
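For reference, this is roughly what the single-cluster workaround looks like: with the crd source enabled, one DNSEndpoint can carry multiple targets for a single name, but only when one cluster knows all of the IPs. The name and addresses below are placeholders.

    apiVersion: externaldns.k8s.io/v1alpha1
    kind: DNSEndpoint
    metadata:
      name: multi-target-example
    spec:
      endpoints:
        - dnsName: app.example.org
          recordType: A
          recordTTL: 300
          targets:
            # Load balancer IPs from every cluster would have to be listed here,
            # which requires something that can see all clusters.
            - 203.0.113.10
            - 203.0.113.20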

Why is this needed:
Allowing multiple A records per domain allows for failover clusters with minimum configuration, and is especially useful in situations where inter-region load balancers aren't available, like with DigitalOcean or on-prem. The IP addresses for load balancers or ingresses are only available in their respective cluster, and cannot all be consolidated into a single DNSEndpoint resource without implementing custom automation that would require resource inspection permissions across clusters.

I have the same use case, but currently cannot find a workaround to get this behaviour to work.

If we put external-dns into debug mode, the second external-dns instance logs that it cannot add the A record because it is not the owner.

If something like the TXT record's value were keyed by each external-dns instance's txt-owner-id, then each instance could maintain and store the records associated with its own cluster, so that multiple external-dns instances from multiple clusters could all maintain records for x.foo.bar.
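To illustrate the idea (this is not current behaviour): today the TXT registry keeps a single ownership record per managed name, so whichever instance writes it first wins. A hypothetical registry keyed by txt-owner-id might look roughly like this, with each instance touching only the targets it owns:

    ; current: one ownership TXT record per name, single owner
    x.foo.bar.            TXT  "heritage=external-dns,external-dns/owner=cluster-a,..."

    ; hypothetical: one ownership record per owner id, so each cluster
    ; adds and removes only its own A targets for x.foo.bar
    cluster-a.x.foo.bar.  TXT  "heritage=external-dns,external-dns/owner=cluster-a,..."
    cluster-b.x.foo.bar.  TXT  "heritage=external-dns,external-dns/owner=cluster-b,..."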

I would also like to play with this at work, and at home.

This issue has been around for a while, @njuettner @Raffo do you have any thoughts?

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

/remove-lifecycle stale

Is this being considered? It is very useful to have in multi-region deployments (i.e. multiple Kubernetes clusters) when using service discovery protocols such as JGroups DNS_PING. Would appreciate adding this feature! :)

I am also interested in this feature to help safely rollover traffic to a new cluster.

I would like external-dns to run in both the current and the incoming cluster and attach to their respective gateways (we are using Istio). The incoming cluster and the current cluster should contribute to the same record in Route 53 but assign independent weights. For example, start by responding to 10% of DNS queries with the IP of the incoming Istio ingress load balancer and the rest with the IP of the current load balancer. This requires the DNS provider to support weighted entries, which Route 53 does, but I'm not sure about others.

I am happy to help make this contribution if it is desired by the maintainers. I'd also love to hear other methods for achieving the same incremental rollout of services from one cluster to another.
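For the Route 53 weighted case, external-dns does document provider-specific annotations for weighted routing (set-identifier and aws-weight), which, combined with a distinct --txt-owner-id per cluster, is one direction to explore; whether it fully avoids the ownership conflict across clusters may depend on the version. A sketch with placeholder hostname, identifier, and weight, on the Istio ingress gateway Service of the incoming cluster:

    apiVersion: v1
    kind: Service
    metadata:
      name: istio-ingressgateway
      namespace: istio-system
      annotations:
        external-dns.alpha.kubernetes.io/hostname: app.example.org
        # Route 53 weighted routing: each cluster publishes the same name with
        # its own set-identifier and weight (the current cluster would use
        # e.g. set-identifier "current" and weight "90").
        external-dns.alpha.kubernetes.io/set-identifier: incoming
        external-dns.alpha.kubernetes.io/aws-weight: "10"
    spec:
      type: LoadBalancer
      selector:
        istio: ingressgateway
      ports:
        - name: https
          port: 443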

One more use-case for this right here!
And same reason: safe rollout of the service on different clusters (different providers, even), so multiple external-dns instances.

We're looking for the same (multiple A records per domain) but for another purpose. We'd like to use round-robin DNS in our cluster. Ports 80 and 443 of every node are exposed to the public and can be used as an entry point for all routes (handled by ingress-nginx, as described here).

Or is this already a feature that can be enabled via configuration?
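Not a complete answer to that question, but within a single cluster the target annotation (which takes a comma-separated list) can publish several node IPs for one name, giving plain round-robin A records; it does not help when the IPs live in different clusters. A sketch with placeholder addresses:

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: my-app
      annotations:
        # Override the published targets with a fixed list of node IPs;
        # external-dns creates one A record per target for app.example.org.
        external-dns.alpha.kubernetes.io/target: "198.51.100.11,198.51.100.12,198.51.100.13"
    spec:
      ingressClassName: nginx
      rules:
        - host: app.example.org
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: my-app
                    port:
                      number: 80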

Same here: we have many short-lived clusters, and external-dns seems like it would be a good fit to automate these DNS records for our API gateways.

Also need this feature

I'm looking at contributing to this issue (since I'm also interested in it), but wanted to discuss the experience before working on it.

  • Would this need a feature flag or argument to enable? (e.g. wouldn't be a default)
  • Would we want some sort of permission model for determining which external-dns instance can share a service/record?

I'm specifically focusing on the aws-sd provider (but will also test with the txt provider). When I created a new service in cluster-0 called nginx, the Cloud Map Service's Description field was set to:

heritage=external-dns,external-dns/owner=cluster-0,external-dns/resource=service/default/nginx

Would it make sense to have an annotation on the k8s Service resource specifying it as a "shared" resource? That way, if both k8s clusters agree that the resource is shared, they will use a different behavior model and not overwrite each other's records (Cloud Map Service Instances).
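As a concrete sketch of that idea (the shared annotation below is hypothetical, part of this proposal rather than an existing external-dns annotation), both clusters would apply the same marker to their copy of the Service:

    apiVersion: v1
    kind: Service
    metadata:
      name: nginx
      namespace: default
      annotations:
        external-dns.alpha.kubernetes.io/hostname: nginx.example.org
        # Hypothetical: signals that the record is co-owned, so an instance
        # that is not the owner appends its own targets instead of skipping
        # or overwriting them.
        external-dns.alpha.kubernetes.io/shared: "true"
    spec:
      type: LoadBalancer
      selector:
        app: nginx
      ports:
        - port: 80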

For each record (Service Instance), I was thinking of adding Custom attributes for heritage, owner, and resource, and each external-dns instance would be responsible for updating the records if it's the owner.

There are a few operational checks that would need to exist around the Cloud Map Service resource (e.g. not deleting the service if other external-dns instances still have records in it).

Any thoughts/opinions?

@buzzsurfr would be really cool if you can implement this feature!
Some thoughts:

  • I think this feature should be behind an argument, since it changes the behaviour of external-dns quite a lot. That would also allow us to test the feature a bit more safely.
  • I think we should indicate to external-dns that a record should be marked as 'shared' (probably via an additional annotation?). That way, if a record already exists, a new record from a different cluster will not attempt to hijack it and start adding its own targets to it. So all external-dns instances from the various clusters would have to be set to treat that record as shared.
  • As for permissions, I think the above would cover it. For example, if a non-shared record already exists and a new instance tries to add a shared record, it should report an error; likewise, if the record is already shared and a new instance has it set to non-shared, it should report an error.
  • I am not sure how you would resolve the external-dns/owner and external-dns/resource values in the TXT record, since AFAIR they are used inside external-dns to validate which resource owns a record. I guess if both are set to shared then those validations would just be skipped?

Looking forward to checking out the merge request, as I am curious how it will be implemented.

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

/remove-lifecycle stale

We could add the assigned IP to the TXT record; then external-dns would know which records it owns.

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

/remove-lifecycle stale

I have the same requirement. Is this feature under development?

If you annotate a NodePort Service, external-dns can add multiple A records at the same time.
However, if Ingresses with the same domain name are published separately through different IngressClasses, only the A record from the first Ingress is updated.
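For context, the NodePort case referred to above looks roughly like this (hostname and port are placeholders); with --source=service, a NodePort Service annotated this way is published with the node addresses as targets, which is why several A records for the same name appear:

    apiVersion: v1
    kind: Service
    metadata:
      name: edge
      annotations:
        # external-dns publishes this name using the node IPs as targets
        # for a NodePort Service, so one A record per node address.
        external-dns.alpha.kubernetes.io/hostname: edge.example.org
    spec:
      type: NodePort
      selector:
        app: edge
      ports:
        - port: 80
          nodePort: 30080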


Just adding that it would be great to see a solution for this. I have a similar use case where we have blue/green EKS clusters, both running external-dns, and they would otherwise try to overwrite each other's Route 53 records.

I'm having the same issue as well. It's important for multi-cluster architectures

Hi everyone,

The workaround I applied on multiple clusters for the same domain was to pass the arg --txt-owner-id:

For cluster A:

        args:
          - --source=service
          - --source=ingress
          - --domain-filter=midomain.ai  # (optional) limit to only midomain.ai domains; change to match the zone created above.
          - --provider=cloudflare
          - --cloudflare-proxied
          - --log-level=debug
          - --txt-owner-id=cluster-a     # <-- change the owner id per cluster

For cluster B:

        args:
          - --source=service
          - --source=ingress
          - --domain-filter=midomain.ai  # (optional) limit to only midomain.ai domains; change to match the zone created above.
          - --provider=cloudflare
          - --cloudflare-proxied
          - --log-level=debug
          - --txt-owner-id=cluster-b     # <-- change the owner id per cluster

With this I was able to run external-dns for my midomain.ai domain in multiple clusters: since the --txt-owner-id is different for each cluster, it does not produce the ownership error.

For example, I use it with Cloudflare, and the TXT entries it creates for each cluster look like this:

Cluster A:

"heritage=external-dns,external-dns/owner=cluster-a,external-dns/resource=ingress/argocd/ingress-argocd"

Cluster B:

"heritage=external-dns,external-dns/owner=cluster-b,external-dns/resource=ingress/staging/stg-staging"

I hope this one will be helpful.


Is there any known workaround for this? I attempted @Eramirez33's approach, but I'm still getting this conflict even when both external-dns instances have unique owner ids:

Skipping endpoint...because owner id does not match...

@Eramirez33 You saved my life, bro!!!


--txt-owner-id as suggested by @Eramirez33 works for the case of having different external-dns instances managing different DNS names in the same zone. It doesn't work for having different external-dns instances managing multiple A records for the same DNS name in the same zone, which was the original subject of this issue.
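To make that limitation concrete: for a single name, both owners race for one ownership record, and the loser skips its change. Roughly (registry record naming simplified, values as shown earlier in this thread):

    ; cluster-a created the name first and holds the ownership record
    app.midomain.ai.  A    203.0.113.10
    app.midomain.ai.  TXT  "heritage=external-dns,external-dns/owner=cluster-a,..."

    ; cluster-b sees owner=cluster-a, logs "owner id does not match" and skips,
    ; so its A record for the same name is never created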

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale