Crawling subdomains
H3JFC opened this issue
Line 204 in 6012713
It looks like the parseLinks function returns nil if the parsed URL's host differs from the base URL's, which is great for telling apart cases like hostname.com and nothostname.com. This works for most cases, but it would be nice to add an option for crawling subdomains such as app.hostname.com when the base URL is hostname.com. I have some thoughts on an implementation and would be happy to open a PR if there is interest.
Hi @H3JFC,
Sure, this could indeed be useful. Any ideas and PRs are of course appreciated :)
Greetings
Great, @rverton! Do we want to search subdomains by default, or should we add a searchSub boolean that we pass around in the Job struct (and therefore also pass into the NewOfflineJob, NewOnlineJob, and Init funcs)?
I think we can do both here: add a searchSub boolean and default it to true. That is a sane default because it's often interesting to see what other technologies a specific domain makes use of.
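As a rough sketch of "a boolean that defaults to true", the option could be carried on the job and exposed as a command-line flag. Everything named here is an assumption for illustration: crawlJob, newCrawlJob, parseSearchSub, and the -search-subdomain flag are hypothetical and do not reflect the project's actual Job struct, constructors, or flags.

```go
package main

import "flag"

// crawlJob is a hypothetical stand-in for the Job struct mentioned above.
type crawlJob struct {
	URL       string
	SearchSub bool // follow links to subdomains of the base URL
}

// newCrawlJob is a hypothetical constructor that threads the option through.
func newCrawlJob(url string, searchSub bool) *crawlJob {
	return &crawlJob{URL: url, SearchSub: searchSub}
}

// parseSearchSub reads an assumed -search-subdomain flag from args.
// The flag defaults to true, so subdomains are searched unless the
// caller explicitly passes -search-subdomain=false.
func parseSearchSub(args []string) (bool, error) {
	fs := flag.NewFlagSet("crawl", flag.ContinueOnError)
	searchSub := fs.Bool("search-subdomain", true,
		"also follow links to subdomains of the base URL")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *searchSub, nil
}
```

Calling parseSearchSub with no arguments yields true, and the result can be passed straight to newCrawlJob, so existing callers get subdomain crawling without changing anything.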
PR #19
Merged #19