schliesser / sitecrawler

TYPO3 sitemap crawler

Support basic auth

schliesser opened this issue

If the project is protected by basic auth, the sitemap URLs currently can't be crawled.

The URL list can be built by embedding the credentials in the sitemap URL, e.g. https://user:pass@domain.com/sitemap.xml. However, the links in the resulting URL list do not contain the user:pass@ part, so they cannot be processed correctly.

Basic auth credentials can instead be sent in the Authorization header, as described here: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Authorization
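For illustration, here is a minimal Python sketch of how such a header value is built: the string `user:password` is Base64-encoded and prefixed with `Basic `. The function name `basic_auth_header` is hypothetical and not part of sitecrawler; it only shows the encoding step.

```python
import base64


def basic_auth_header(user: str, password: str) -> dict[str, str]:
    """Build an HTTP Basic Authorization header (hypothetical helper)."""
    # Credentials are joined with a colon and Base64-encoded.
    # Note that this is an encoding, not encryption, so use HTTPS.
    raw = f"{user}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


if __name__ == "__main__":
    print(basic_auth_header("user", "pass"))
```

Because the credentials travel in a header rather than in the URL, they apply equally to the sitemap request and to every link found in it.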

Currently, headers are only added to the requests made for each entry in the sitemap. When the index is built, the headers are not set. This needs to be changed.
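The intended behaviour could look roughly like the following Python sketch, in which the same headers are passed both when fetching the sitemap (or sitemap index) and when requesting each entry. This is not sitecrawler's actual code: `collect_urls`, `crawl`, and `http_fetch` are hypothetical names, and the fetch function is injectable only so that the flow can be tried without a network.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# A fetch function takes a URL and headers and returns the response body.
Fetch = Callable[[str, dict[str, str]], str]


def http_fetch(url: str, headers: dict[str, str]) -> str:
    """Fetch a URL with the given headers (assumes a UTF-8 response)."""
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


def collect_urls(sitemap_url: str, headers: dict[str, str], fetch: Fetch) -> list[str]:
    """Build the URL list, sending the headers with every sitemap request."""
    root = ET.fromstring(fetch(sitemap_url, headers))
    locs = [el.text.strip() for el in root.iter(f"{SITEMAP_NS}loc") if el.text]
    if root.tag == f"{SITEMAP_NS}sitemapindex":
        # A sitemap index lists further sitemaps; resolve them recursively.
        urls: list[str] = []
        for child_sitemap in locs:
            urls.extend(collect_urls(child_sitemap, headers, fetch))
        return urls
    return locs


def crawl(sitemap_url: str, headers: dict[str, str], fetch: Fetch = http_fetch) -> list[str]:
    """Request every sitemap entry using the same headers as the index build."""
    urls = collect_urls(sitemap_url, headers, fetch)
    for url in urls:
        fetch(url, headers)
    return urls
```

A call such as `crawl("https://domain.com/sitemap.xml", {"Authorization": "Basic dXNlcjpwYXNz"})` would then authenticate both phases with one set of headers, so the credentials no longer need to be part of the sitemap URL.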