processors/url_check: exclude_regex does not work when multiple regexes are configured
nodiscc opened this issue and commented:
- https://github.com/awesome-selfhosted/awesome-selfhosted-data/blob/4c9aaad382486acf9d8ee92798bfa873d805c214/.hecat/url-check.yml
- https://github.com/awesome-selfhosted/awesome-selfhosted-data/actions/runs/4059507568/jobs/6987620205
...
exclude_regex:
- '^https://github.com/[\w\.\-]+/[\w\.\-]+$' # don't check URLs that will be processed by the github_metadata module
- '^https://retrospring.net/$' # DDoS protection page, always returns 403
- '^https://www.taiga.io/$' # always returns 403 Request forbidden by administrative rules
- '^https://docs.paperless-ngx.com/$' # DDoS protection page, always returns 403
- '^https://demo.paperless-ngx.com/$' # DDoS protection page, always returns 403
- '^https://git.dotclear.org/dev/dotclear$' # DDoS protection page, always returns 403
- '^https://github.com/clupasq/word-mastermind$' # the demo instance takes a long time to spin up, times out with the default 10s timeout
- '^https://getgrist.com/$' # hecat/python-requests bug? 'Received response with content-encoding: gzip,br, but failed to decode it.'
INFO:url_check.py: https://github.com/jhthorsen/app-mojopaste HTTP 200
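^ this URL should have been ignored, since it matches the first exclude_regex pattern (`^https://github.com/[\w\.\-]+/[\w\.\-]+$`), but it was checked anyway.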
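
For reference, here is a minimal sketch of the behavior I would expect: a URL is skipped as soon as it matches any one of the configured patterns. This is not hecat's actual code. `is_excluded` is a hypothetical helper written only for illustration, and the pattern list is a subset copied from the config above.

```python
import re

# Subset of the exclude_regex patterns from the config above
EXCLUDE_REGEX = [
    r'^https://github.com/[\w\.\-]+/[\w\.\-]+$',
    r'^https://retrospring.net/$',
    r'^https://www.taiga.io/$',
]


def is_excluded(url, patterns):
    """Hypothetical helper (not part of hecat): return True if the URL
    matches at least one of the given regular expressions."""
    return any(re.search(pattern, url) for pattern in patterns)


# The URL from the log line above matches the first pattern,
# so a check like this one would skip it.
print(is_excluded('https://github.com/jhthorsen/app-mojopaste', EXCLUDE_REGEX))
```

With this kind of check the order and number of patterns should not matter, since `any()` returns as soon as one pattern matches.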