certbot / certbot

Certbot is EFF's tool to obtain certs from Let's Encrypt and (optionally) auto-enable HTTPS on your server. It can also act as a client for any other CA that uses the ACME protocol.

--dns-rfc2136 sporadic "Some challenges have failed" : multiple / secondary nameservers?

DXXS opened this issue.

commented

When attempting to update certificates with --dns-rfc2136, I have regularly experienced sporadic failures (as I have also seen others report elsewhere, without seeing a resolution yet!?). Having worked around this several times now while updating the certificates for my domains, it appears likely in my case to be an issue with multiple DNS servers: the records on the secondary DNS ('ns2') are not updated for some reason, and the validation lookups are randomly answered by one name server or the other.

In order to get around this, it seems that I can either disable my secondary dns server (though the TTL is 1 day, which makes this unpleasant), or run certbot multiple times so that it will randomly use the dns server which has been updated with the correct _acme-challenge value. [NOTE: I am currently updating 3x domains on the same certificate, so sometimes it has taken quite a few tries].

My secondary dns servers are currently hosted on a server in a different country entirely from the origin (though otherwise pretty much the same OS). I'm wondering what it means by "Ensure the above domains are hosted by this DNS provider"??

I am wondering why certbot's rfc2136 implementation apparently isn't currently just updating the TXT record on all name servers for each domain? I have also tried it with longer --dns-rfc2136-propagation-seconds, but it doesn't seem to help (!?).

David


lsb_release -a

No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 10 (buster)
Release: 10
Codename: buster

certbot --version

certbot 1.27.0


certbot certonly --dns-rfc2136 --dns-rfc2136-credentials ~/.secrets/rfc2136.ini --dns-rfc2136-propagation-seconds 180 --force-renew --cert-name geddo.in --preferred-challenges=dns --email=me@here.com --agree-tos -d geddo.in -d *.geddo.in

Saving debug log to /var/log/letsencrypt/letsencrypt.log
Renewing an existing certificate for geddo.in
Waiting 180 seconds for DNS changes to propagate

Certbot failed to authenticate some domains (authenticator: dns-rfc2136). The Certificate Authority reported these problems:
Domain: geddo.in
Type: unauthorized
Detail: Incorrect TXT record "" found at _acme-challenge.geddo.in

Hint: The Certificate Authority failed to verify the DNS TXT records created by --dns-rfc2136. Ensure the above domains are hosted by this DNS provider, or try increasing --dns-rfc2136-propagation-seconds (currently 180 seconds).


FWIW, if I run 'dig _acme-challenge.geddo.in txt' on another server, it currently returns one value or the other at random.
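A plain `dig` without `@server` goes through a recursive resolver, which may ask either authoritative server, so the answer flips between runs. Querying each authoritative server directly shows which one is stale. The following is a minimal stdlib-only Python sketch of that per-server check (the equivalent of `dig @server _acme-challenge.geddo.in txt`). It assumes IPv4, plain UDP on port 53, no EDNS, no TSIG and no TCP fallback for truncated answers; the helper names are hypothetical.

```python
import random
import socket
import struct

TYPE_TXT = 16
CLASS_IN = 1


def build_txt_query(name: str, query_id: int) -> bytes:
    """Build an RFC 1035 wire-format query for the TXT records of `name`."""
    # Header: ID, flags=0 (RD off: we ask the authoritative server itself), QDCOUNT=1.
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", TYPE_TXT, CLASS_IN)


def skip_name(packet: bytes, offset: int) -> int:
    """Return the offset just past a (possibly compressed) domain name."""
    while True:
        length = packet[offset]
        if length == 0:
            return offset + 1
        if length & 0xC0 == 0xC0:  # compression pointer: 2 bytes, ends the name
            return offset + 2
        offset += 1 + length


def parse_txt_answers(packet: bytes) -> list[str]:
    """Extract the TXT values from the answer section of a DNS response."""
    _, _, qdcount, ancount, _, _ = struct.unpack("!HHHHHH", packet[:12])
    offset = 12
    for _ in range(qdcount):
        offset = skip_name(packet, offset) + 4  # skip QTYPE and QCLASS
    values = []
    for _ in range(ancount):
        offset = skip_name(packet, offset)
        rtype, _, _, rdlength = struct.unpack("!HHIH", packet[offset:offset + 10])
        offset += 10
        rdata = packet[offset:offset + rdlength]
        offset += rdlength
        if rtype != TYPE_TXT:
            continue
        # TXT RDATA is a sequence of <length byte><bytes> character-strings.
        parts, i = [], 0
        while i < len(rdata):
            n = rdata[i]
            parts.append(rdata[i + 1:i + 1 + n].decode("utf-8", "replace"))
            i += 1 + n
        values.append("".join(parts))
    return values


def query_txt(server: str, name: str, timeout: float = 3.0) -> list[str]:
    """Ask one specific server (like `dig @server name txt`) for TXT records."""
    query_id = random.randrange(0x10000)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_txt_query(name, query_id), (server, 53))
        packet, _ = sock.recvfrom(4096)
    if struct.unpack("!H", packet[:2])[0] != query_id:
        raise ValueError("response ID does not match query ID")
    return parse_txt_answers(packet)
```

Calling `query_txt()` once for each address in the zone's NS set (for example the hypothetical documentation addresses `192.0.2.1` and `198.51.100.2`) should return the same token from every server. A server that returns an empty or old value is the one that can make the CA's check fail.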

commented

FWIW, dns_rfc2136_server is set to the local dns server's IP address, which gets updated, but that server is apparently not explicitly used by the code that is polling for the _acme-challenge.geddo.in value later (!?).

pip list | grep rfc2136

certbot-dns-rfc2136 0.24.0

> I'm wondering what it means by "Ensure the above domains are hosted by this DNS provider"??

The ACME validation server resolves the TXT RR by walking from the root zone all the way down to (one of) the authoritative DNS servers of the FQDN. So all authoritative DNS servers configured for a given FQDN need to serve the correct TXT RR for validation to succeed consistently.
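A rough way to see why the failures look sporadic, and why simply re-running sometimes works: assume, purely for illustration, that each validation lookup independently lands on one of the zone's authoritative servers with equal probability. If only `updated` of `total` servers hold the new TXT value and an attempt needs `lookups` successful lookups (one per challenged name, possibly more if the CA checks from several vantage points), a whole attempt succeeds with probability (updated/total)^lookups. Real resolvers do not pick servers uniformly (many prefer the fastest one), so this hypothetical model illustrates the effect rather than predicting it.

```python
from fractions import Fraction


def attempt_success_probability(updated: int, total: int, lookups: int) -> Fraction:
    """Probability that every lookup hits an up-to-date authoritative server.

    Illustrative model only: each lookup picks one of `total` authoritative
    servers uniformly at random, independently of the other lookups.
    """
    if not 0 <= updated <= total or total == 0 or lookups < 0:
        raise ValueError("need 0 <= updated <= total, total > 0, lookups >= 0")
    return Fraction(updated, total) ** lookups


def expected_attempts(updated: int, total: int, lookups: int) -> Fraction:
    """Mean number of runs until one succeeds (geometric distribution)."""
    p = attempt_success_probability(updated, total, lookups)
    if p == 0:
        raise ValueError("no attempt can succeed if no server is updated")
    return 1 / p
```

For example, with two authoritative servers of which only the primary is updated, `attempt_success_probability(1, 2, 3)` is 1/8 for three lookups, so `expected_attempts(1, 2, 3)` is 8 runs on average. Once both servers are in sync the probability is 1 and retries become unnecessary.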

> I am wondering why certbot's rfc2136 implementation apparently isn't currently just updating the TXT record on all name servers for each domain?

The dns-rfc2136 plugin updates the DNS server you have configured it to update. Usually one has a primary DNS server that is updated via e.g. the RFC 2136 protocol, and the secondary DNS servers are then updated from that primary. I'm not sure how you envision the dns-rfc2136 plugin updating all DNS servers: those other servers could be located somewhere else entirely, and Certbot might not have permission to update them at all. Also, I don't think the dns-rfc2136 plugin supports multiple DNS servers anyway.

> FWIW, dns_rfc2136_server is set to the local dns server's IP address, which gets updated, but that server is apparently not explicitly used by the code that is polling for the _acme-challenge.geddo.in value later (!?).

As said above, the ACME server does the validation, not Certbot itself. The CA's validation server traverses from the root zone all the way down to the authoritative DNS servers for the FQDN. If your "local DNS server" is not an authoritative server for the FQDN, updating it is useless.

commented

To be more explicit: I've used 'dig' before to query a particular name server explicitly. I'm wondering why the ACME server isn't simply told which server was updated, so that it can do the lookup directly against that authoritative server?

Further, in named.conf.local, 'allow-transfer' is set in both servers' zones to the other DNS server's IP address. I'm wondering what else my servers need to do to ensure that the secondary is updated too, and within the --dns-rfc2136-propagation-seconds period?

My server is 'authoritative', and I've generally verified that the change has propagated to non-authoritative servers after the TTL period expired; it's just that the secondary apparently has not been updated.

commented

[the authoritative primary server is being run on the same machine that certbot is being executed on]

> I'm wondering why the ACME server isn't simply told which server was updated, so that it can do the lookup directly against that authoritative server?

That's simply not part of the ACME protocol (RFC 8555), nor part of a Certificate Authority's validation methods. Note that DNS servers are often spread throughout the world, frequently using anycast, and CAs often validate from multiple vantage points worldwide; Let's Encrypt does, at least.

> Further, in named.conf.local, 'allow-transfer' is set in both servers' zones to the other DNS server's IP address. I'm wondering what else my servers need to do to ensure that the secondary is updated too, and within the --dns-rfc2136-propagation-seconds period?

I don't know; allowing a transfer is not the same as actually doing the transfers, if you ask me 😛 But I don't know your exact setup. It might be as simple as too short a propagation period.

> and I've generally verified that the change has propagated to non-authoritative servers after the TTL period expired

Non-authoritative DNS servers aren't relevant, as said before.

> it's just that the secondary apparently has not been updated.

You could try increasing the --dns-rfc2136-propagation-seconds option if your secondary authoritative DNS servers are rather slow, or find a way to speed up their updates.
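Instead of guessing a fixed --dns-rfc2136-propagation-seconds value, one could check every authoritative server until all of them serve the expected token, and only then ask the CA to validate. Certbot's dns-rfc2136 plugin does not do this; it sleeps for the configured number of seconds, as the "Waiting 180 seconds for DNS changes to propagate" log line shows. The sketch below is a hypothetical helper, `wait_until_consistent`, that takes the per-server lookup function as a parameter (for example one that does the equivalent of `dig @server name txt`); the server names, timeout and interval are assumptions.

```python
import time
from typing import Callable, Iterable


def wait_until_consistent(
    servers: Iterable[str],
    name: str,
    token: str,
    lookup: Callable[[str, str], list[str]],
    timeout: float = 180.0,
    interval: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll each authoritative server until all of them serve `token` at `name`.

    Returns True once every server has returned the token at least once, or
    False if `timeout` seconds pass first. Lookup errors count as "not yet".
    """
    pending = set(servers)
    deadline = clock() + timeout
    while True:
        for server in sorted(pending):  # iterate over a copy; we discard below
            try:
                if token in lookup(server, name):
                    pending.discard(server)
            except (OSError, ValueError):
                pass  # unreachable, timed out or bad reply: retry next round
        if not pending:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The clock and sleep functions are injectable so the loop can be exercised without real waiting. If the helper keeps returning False, the lagging secondary is the problem to fix, since a longer wait only helps when transfers are slow rather than absent.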

Anyway, I'm not sure how Certbot can improve this, within the constraints of the ACME protocol and ACME server validation methods.

commented

Well, on an initial reading of the April 1997 text of RFC 2136, it appears that it does conceive of zones with multiple masters (as mine is currently set up; maybe I should try changing that the next time I have to update my certs). However, section 4.3 appears to indicate that the 'primary master' should be determined and the UPDATE attempted there first (possibly iterating through the others), and that one should stop as soon as a successful result is obtained. It doesn't really indicate how any other 'master' would end up being updated.

Given that it doesn't necessarily preclude explicitly updating additional masters, I think you might try sending the UPDATE to all of them.
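The two strategies discussed here can be contrasted in a short Python sketch. `update_server_order` follows the reading of RFC 2136 section 4.3 given above: the primary master (the SOA MNAME, if it also appears among the NS targets) is tried first. `send_until_success` stops at the first server that accepts the UPDATE, so the remaining masters are never contacted, while `send_to_all` is the suggested alternative of sending it to every master. The `send_update` callable and the host names are hypothetical stand-ins, not part of the dns-rfc2136 plugin.

```python
from typing import Callable, Optional, Sequence


def update_server_order(soa_mname: str, ns_names: Sequence[str]) -> list[str]:
    """Order NS targets so the SOA MNAME (primary master) is tried first.

    If the MNAME is not one of the NS targets, the NS order is kept as-is.
    """
    def norm(name: str) -> str:
        return name.rstrip(".").lower()

    primary = norm(soa_mname)
    first = [ns for ns in ns_names if norm(ns) == primary]
    rest = [ns for ns in ns_names if norm(ns) != primary]
    return first + rest


def send_until_success(
    servers: Sequence[str], send_update: Callable[[str], bool]
) -> Optional[str]:
    """Try each server in turn and stop at the first that accepts the UPDATE.

    Servers after the accepting one are never contacted; keeping them in sync
    is normally left to zone transfers, not to the UPDATE requestor.
    """
    for server in servers:
        if send_update(server):
            return server
    return None


def send_to_all(
    servers: Sequence[str], send_update: Callable[[str], bool]
) -> list[str]:
    """Alternative: send the UPDATE to every master; return those that accepted."""
    return [server for server in servers if send_update(server)]
```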

commented

Well, in my decades of experience in commercial software development, this is something that one would be expected to address in one's own code (regardless of other people's protocol specs), including emitting warnings, e.g. about un-updated masters or unexpected configurations.

I haven't carefully read through the comments but:

  • If you are advertising multiple nameservers for your domain, you should expect that DNS recursors (like Let's Encrypt) will randomly query any of those nameservers, not just the primary. If you only update the primary, then clients will see inconsistent views of your zone. This is not unique to Let's Encrypt or ACME.
  • Certbot expects a DNS topology where if you have a primary/secondaries (master/slaves in BIND nomenclature) setup, then you will have already configured NOTIFY and Zone Transfer between your authoritative nameservers, to keep them automatically in sync in ~realtime. This task is out of scope for Certbot: wrong tool for the job.
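One way to check whether NOTIFY and zone transfers are actually happening is to compare the SOA serial each authoritative server reports. A secondary whose serial lags the primary's has not picked up the latest change, including the _acme-challenge TXT record. SOA serials are 32-bit values that may wrap around, so they are compared with RFC 1982 serial number arithmetic rather than a plain `<`. This sketch assumes the serials have already been fetched from each server (e.g. with `dig @server geddo.in soa`); the function names, server names and serial values are hypothetical.

```python
SERIAL_BITS = 32
HALF = 1 << (SERIAL_BITS - 1)


def serial_lt(s1: int, s2: int) -> bool:
    """RFC 1982 'less than' for 32-bit SOA serials (wrap-around aware).

    The RFC leaves the result undefined when the serials differ by exactly
    2**31; this sketch returns False in that case.
    """
    return (s1 < s2 and s2 - s1 < HALF) or (s1 > s2 and s1 - s2 > HALF)


def lagging_servers(primary_serial: int, serials: dict[str, int]) -> list[str]:
    """Return the servers whose SOA serial is behind the primary's, sorted."""
    return sorted(name for name, s in serials.items() if serial_lt(s, primary_serial))
```

If a secondary keeps showing an older serial after an RFC 2136 update, the problem lies in the NOTIFY/transfer configuration between the nameservers rather than in Certbot, and a longer propagation wait will not fix it.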

On the second point, perhaps it is worth us briefly documenting this requirement on https://certbot-dns-rfc2136.readthedocs.io/en/stable/. We'd accept a PR for that.

I'm going to close this in the hope that we have arrived at an understanding about how things fit together.