Clean up expired SSL certificates in order to prevent web timeouts
yeah opened this issue · comments
We have 53 domains and have been running MIAB since 2017. By now, our /home/user-data/ssl had grown to over 1,000 old, expired certificate files.
Since get_ssl_certificates() from management/ssl_certificates.py loops over all of these old certs, reading and analyzing them at least once per domain for most web requests to the management interface, the web interface had become unusable due to request timeouts (like in #1966). Another result was that DNSSEC RRSIG updates stopped working properly, so our domains were no longer resolved correctly by many DNS servers. (This is because /etc/cron.daily/mailinabox-dnssec uses ~/tools/dns_update, which in turn makes web requests.)
The simple make-it-work-now solution is to clean old files out of /home/user-data/ssl.
But there should be a mechanism in MIAB that removes obsolete certificates, or at least disregards them when looping in get_ssl_certificates().
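One possible way to keep the loop cheap even with many files is to remember the parsed result per file and only re-read a file when it changes on disk. The sketch below is only an illustration of that idea, not what ssl_certificates.py does today: FileParseCache and the parse callback are hypothetical names, and change detection by modification time and size is an assumption.

```python
import os


class FileParseCache:
    """Remember the parsed result of a file until its mtime or size changes.

    Hypothetical helper for illustration; `parse` stands in for whatever
    expensive "read and analyze this certificate" function is in use.
    """

    def __init__(self, parse):
        self._parse = parse
        self._entries = {}  # path -> ((mtime_ns, size), parsed value)

    def get(self, path):
        st = os.stat(path)
        key = (st.st_mtime_ns, st.st_size)
        cached = self._entries.get(path)
        if cached is not None and cached[0] == key:
            # File unchanged since the last parse: reuse the stored result.
            return cached[1]
        value = self._parse(path)
        self._entries[path] = (key, value)
        return value
```

With a cache like this living for the lifetime of the management daemon, each expired file would be parsed once rather than on every request. The files would still be stat'ed each time, so cleaning up the directory remains worthwhile.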
To a degree, that's probably already done, but the time it takes to go through all the certs likely still exceeds the proxy timeout. As I mentioned in #1966, extending the proxy timeout might fix this. Or, as you say, deleting old certs. If that isn't added in ssl_certificates.py, something like find /home/user-data/ssl/*-*.pem -maxdepth 1 -mtime +365 -delete may work.
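For anyone who would rather review the list before deleting anything, here is a rough Python sketch of the same idea as the find command. The function names, the dry-run default, and skipping symlinks are my own assumptions; it goes by file modification time only and does not read the certificates' actual expiry dates.

```python
import glob
import os
import time


def find_old_cert_files(ssl_dir, max_age_days=365, now=None):
    """Return *-*.pem files directly in ssl_dir not modified for max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    old = []
    for path in sorted(glob.glob(os.path.join(ssl_dir, "*-*.pem"))):
        # Skip symlinks and directories; only plain files are candidates.
        if os.path.islink(path) or not os.path.isfile(path):
            continue
        if os.path.getmtime(path) < cutoff:
            old.append(path)
    return old


def delete_files(paths, dry_run=True):
    """Delete the given files, or only print them when dry_run is True."""
    for path in paths:
        if dry_run:
            print("would delete", path)
        else:
            os.remove(path)
```

Calling delete_files(find_old_cert_files("/home/user-data/ssl")) only prints the candidates; pass dry_run=False once the list looks right. Take a backup of the directory first.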
I deleted old certs and that made the status panel appear again.
It must take longer than 60s to pull the data from all the certs. The default is proxy_read_timeout 60s;
https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_read_timeout
You can set the timeout to 5 minutes as a test. If that works better, it confirms the issue is related to the nginx proxy timeout setting.
In /etc/nginx/conf.d/local.conf, add proxy_read_timeout 300s; to the location /admin/ section of the PRIMARY_DOMAIN's server block, then restart nginx:
...
server {
    server_name <PRIMARY_DOMAIN>;
    ...
    location /admin/ {
        ...
        proxy_read_timeout 300s;
    }
    ...
}
I've done this and can confirm that increasing the nginx timeout resolves the issue. It's not a real fix though. Two years down the line, or with a couple more domains/certs, the timeout will eventually have to be increased to 10 minutes, etc.
I've debugged management/ssl_certificates.py and have pointed to where the core issue is. It's an easy fix.
IMHO, increasing the nginx timeout is just a bandaid, or more water in a leaking bucket so to speak :-)