kubernetes-sigs / external-dns

Configure external DNS servers (AWS Route53, Google CloudDNS and others) for Kubernetes Ingresses and Services


Allow multiple A records for the same domain from different external-dns instances.

sturdyhippo opened this issue

What would you like to be added:
Allowing multiple A records to be set from different sources that are managed by different external-dns instances.

For some background, I'm trying to create A records from Services of type LoadBalancer in different clusters, but it seems that currently (v0.6.0) the only way to specify multiple IP addresses for a single DNS name is to include them all as targets in a single DNSEndpoint. That is not an option when the Services run in different clusters managed by different instances of external-dns. When I try it anyway, only one of the records is created, and the logs report level=info msg="All records are already up to date" across all instances.
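For reference, this is roughly what the single-cluster workaround looks like: with the crd source enabled, one DNSEndpoint can carry multiple targets for a single name, but only when one cluster knows all of the IPs. The name and addresses below are placeholders.

    apiVersion: externaldns.k8s.io/v1alpha1
    kind: DNSEndpoint
    metadata:
      name: multi-target-example
    spec:
      endpoints:
        - dnsName: app.example.org
          recordType: A
          recordTTL: 300
          targets:
            # Load balancer IPs from every cluster would have to be listed here,
            # which requires something that can see all clusters.
            - 203.0.113.10
            - 203.0.113.20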

Why is this needed:
Allowing multiple A records per domain allows for failover clusters with minimum configuration, and is especially useful in situations where inter-region load balancers aren't available, like with DigitalOcean or on-prem. The IP addresses for load balancers or ingresses are only available in their respective cluster, and cannot all be consolidated into a single DNSEndpoint resource without implementing custom automation that would require resource inspection permissions across clusters.

I have the same use case, but currently cannot find a workaround to get this behaviour to work.

If we put external-dns into debug mode, the second external-dns instance logs that it cannot add the A record because it is not the owner.

If something like the TXT record's value were keyed by each external-dns instance's txt-owner-id, then each instance could maintain and store the records associated with its own cluster, so that multiple external-dns instances from multiple clusters could all maintain records for x.foo.bar.
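To illustrate the idea (this is not current behaviour): today the TXT registry keeps a single ownership record per managed name, so whichever instance writes it first wins. A hypothetical registry keyed by txt-owner-id might look roughly like this, with each instance touching only the targets it owns:

    ; current: one ownership TXT record per name, single owner
    x.foo.bar.            TXT  "heritage=external-dns,external-dns/owner=cluster-a,..."

    ; hypothetical: one ownership record per owner id, so each cluster
    ; adds and removes only its own A targets for x.foo.bar
    cluster-a.x.foo.bar.  TXT  "heritage=external-dns,external-dns/owner=cluster-a,..."
    cluster-b.x.foo.bar.  TXT  "heritage=external-dns,external-dns/owner=cluster-b,..."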

I would also like to play with this at work, and at home.

This issue has been around for a while, @njuettner @Raffo do you have any thoughts?

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

/remove-lifecycle stale

Is this being considered? It is very useful to have in multi-region deployments (i.e. multiple Kubernetes clusters) when using service discovery protocols such as JGroups DNS_PING. Would appreciate adding this feature! :)

I am also interested in this feature to help safely rollover traffic to a new cluster.

I would like external-dns to run in both the current and the incoming cluster and attach to their respective gateways (we are using Istio). The incoming cluster and the current cluster should contribute to the same record in Route 53 but assign independent weights. For example, start by responding to 10% of DNS queries with the IP of the incoming Istio ingress load balancer and the rest with the IP of the current load balancer. This requires the DNS provider to support weighted entries, which Route 53 does, but I'm not sure about others.

I am happy to help make this contribution if it is desired by the maintainers. I'd also love to hear other methods for achieving the same incremental rollout of services from one cluster to another.
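For the Route 53 weighted case, external-dns does document provider-specific annotations for weighted routing (set-identifier and aws-weight), which, combined with a distinct --txt-owner-id per cluster, is one direction to explore; whether it fully avoids the ownership conflict across clusters may depend on the version. A sketch with placeholder hostname, identifier, and weight, on the Istio ingress gateway Service of the incoming cluster:

    apiVersion: v1
    kind: Service
    metadata:
      name: istio-ingressgateway
      namespace: istio-system
      annotations:
        external-dns.alpha.kubernetes.io/hostname: app.example.org
        # Route 53 weighted routing: each cluster publishes the same name with
        # its own set-identifier and weight (the current cluster would use
        # e.g. set-identifier "current" and weight "90").
        external-dns.alpha.kubernetes.io/set-identifier: incoming
        external-dns.alpha.kubernetes.io/aws-weight: "10"
    spec:
      type: LoadBalancer
      selector:
        istio: ingressgateway
      ports:
        - name: https
          port: 443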

One more use-case for this right here!
And same reason: safe rollout of the service on different clusters (different providers, even), so multiple external-dns instances.

We're looking for the same (multiple A records per domain) but for another purpose. We'd like to use round-robin DNS in our cluster. Ports 80 and 443 of every node are exposed to the public and can be used as an entry point for all routes (handled by ingress-nginx, as described here).

Or is this already a feature that can be enabled via configuration?
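Not a complete answer to that question, but within a single cluster the target annotation (which takes a comma-separated list) can publish several node IPs for one name, giving plain round-robin A records; it does not help when the IPs live in different clusters. A sketch with placeholder addresses:

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: my-app
      annotations:
        # Override the published targets with a fixed list of node IPs;
        # external-dns creates one A record per target for app.example.org.
        external-dns.alpha.kubernetes.io/target: "198.51.100.11,198.51.100.12,198.51.100.13"
    spec:
      ingressClassName: nginx
      rules:
        - host: app.example.org
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: my-app
                    port:
                      number: 80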

Same here: we have many short-lived clusters, and external-dns seems like it would be a good fit to automate these DNS records for our API gateways.

Also need this feature

I'm looking at contributing to this issue (since I'm also interested in it), but wanted to discuss the experience before working on it.

  • Would this need a feature flag or argument to enable? (e.g. wouldn't be a default)
  • Would we want some sort of permission model for determining which external-dns instance can share a service/record?

I'm specifically focusing on the aws-sd provider (but will also test with the txt provider). When I created a new service in cluster-0 called nginx, the Cloud Map Service's Description field was set to:

heritage=external-dns,external-dns/owner=cluster-0,external-dns/resource=service/default/nginx

Would it make sense to have an annotation on the k8s Service resource specifying it as a "shared" resource? That way, if both k8s clusters agree that the resource is shared, they will use a different behavior model and not overwrite each other's records (Cloud Map Service Instances).
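As a concrete sketch of that idea (the shared annotation below is hypothetical, part of this proposal rather than an existing external-dns annotation), both clusters would apply the same marker to their copy of the Service:

    apiVersion: v1
    kind: Service
    metadata:
      name: nginx
      namespace: default
      annotations:
        external-dns.alpha.kubernetes.io/hostname: nginx.example.org
        # Hypothetical: signals that the record is co-owned, so an instance
        # that is not the owner appends its own targets instead of skipping
        # or overwriting them.
        external-dns.alpha.kubernetes.io/shared: "true"
    spec:
      type: LoadBalancer
      selector:
        app: nginx
      ports:
        - port: 80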

For each record (Service Instance), I was thinking of adding Custom attributes for heritage, owner, and resource, and each external-dns instance would be responsible for updating the records if it's the owner.

There are a few operational checks that would need to exist around the Cloud Map Service resource (e.g. not deleting the service if other external-dns instances still have records in it).

Any thoughts/opinions?

@buzzsurfr would be really cool if you can implement this feature!
Some thoughts:

  • I think this feature should be behind an argument, since it changes the behaviour of external-dns quite a lot. That would also allow us to test the feature a bit more safely.
  • I think we should indicate to external-dns that a record should be marked as 'shared' (probably via an additional annotation?). That way, if a record already exists, a new record from a different cluster will not attempt to hijack it and start adding its own targets to it. So all external-dns instances from the various clusters would have to be set to treat that record as shared.
  • As for permissions, I think the above would cover it. For example, if a non-shared record already exists and a new instance tries to add a shared record, it should report an error; likewise, if the record is already shared and a new instance has it set to non-shared, it should report an error.
  • I am not sure how you would resolve the external-dns/owner and external-dns/resource values in the TXT record, since AFAIR they are used inside external-dns to validate which resource owns a record. I guess if both are set to shared then those validations would just be skipped?

Looking forward to checking out the merge request, as I am curious how it will be implemented.

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

/remove-lifecycle stale

We could add the assigned IP to the TXT record; then external-dns would know which records it owns.

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

/remove-lifecycle stale

I have the same requirement. Is this feature under development?

If you annotate a NodePort Service, external-dns can add multiple A records at the same time.
However, if Ingresses with the same domain name are published separately through different IngressClasses, only the A record from the first Ingress is updated.
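For context, the NodePort case referred to above looks roughly like this (hostname and port are placeholders); with --source=service, a NodePort Service annotated this way is published with the node addresses as targets, which is why several A records for the same name appear:

    apiVersion: v1
    kind: Service
    metadata:
      name: edge
      annotations:
        # external-dns publishes this name using the node IPs as targets
        # for a NodePort Service, so one A record per node address.
        external-dns.alpha.kubernetes.io/hostname: edge.example.org
    spec:
      type: NodePort
      selector:
        app: edge
      ports:
        - port: 80
          nodePort: 30080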


Just adding that it would be great to see a solution for this. I have a similar use case where we have blue/green EKS clusters, both running external-dns, and they would otherwise try to overwrite each other's Route 53 records.

I'm having the same issue as well. It's important for multi-cluster architectures

Hi everyone,

The workaround I applied on multiple clusters for the same domain was to pass the arg --txt-owner-id:

For cluster A:

        args:
          - --source=service
          - --source=ingress
          - --domain-filter=midomain.ai  # (optional) limit to only midomain.ai domains; change to match the zone created above.
          - --provider=cloudflare
          - --cloudflare-proxied
          - --log-level=debug
          - --txt-owner-id=cluster-a     # <-- change the owner id per cluster

For cluster B:

        args:
          - --source=service
          - --source=ingress
          - --domain-filter=midomain.ai  # (optional) limit to only midomain.ai domains; change to match the zone created above.
          - --provider=cloudflare
          - --cloudflare-proxied
          - --log-level=debug
          - --txt-owner-id=cluster-b     # <-- change the owner id per cluster

With this I was able to run external-dns for my midomain.ai domain in multiple clusters: since the --txt-owner-id is different for each cluster, it does not produce the ownership error.

For example, I use it with Cloudflare, and the TXT entries it creates for each cluster look like this:

Cluster A:

"heritage=external-dns,external-dns/owner=cluster-a,external-dns/resource=ingress/argocd/ingress-argocd"

Cluster B:

"heritage=external-dns,external-dns/owner=cluster-b,external-dns/resource=ingress/staging/stg-staging"

I hope this one will be helpful.


Is there any known workaround for this? I attempted @Eramirez33's approach, but I'm still getting this conflict even when both external-dns instances have unique owner ids:

Skipping endpoint...because owner id does not match...

@Eramirez33 You saved my life, bro!!!


--txt-owner-id as suggested by @Eramirez33 works for the case of having different external-dns instances managing different DNS names in the same zone. It doesn't work for having different external-dns instances managing multiple A records for the same DNS name in the same zone, which was the original subject of this issue.
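To make that limitation concrete: for a single name, both owners race for one ownership record, and the loser skips its change. Roughly (registry record naming simplified, values as shown earlier in this thread):

    ; cluster-a created the name first and holds the ownership record
    app.midomain.ai.  A    203.0.113.10
    app.midomain.ai.  TXT  "heritage=external-dns,external-dns/owner=cluster-a,..."

    ; cluster-b sees owner=cluster-a, logs "owner id does not match" and skips,
    ; so its A record for the same name is never created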

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale