EdmundMartin / Scrapio

Asyncio web crawling framework. Work in progress.

Problems with link parsing

EdmundMartin opened this issue.

On certain sites, the crawler strays off and makes requests to URLs that should not be crawled. The issue seems subtle, since it does not happen on every site. I will look into it when I have the time.

Fixed. The issue was caused by the robots cache existing while the host was not in the robots cache.
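The fix note is terse, so here is a minimal sketch of the failure mode. It assumes a per-host robots.txt cache built on Python's standard `urllib.robotparser`. The `RobotsCache` class and its `add`/`can_fetch` methods are hypothetical illustrations, not Scrapio's actual code. The key point is that a lookup for a host with no cache entry must not fall through to "allowed", even when the cache itself exists and holds other hosts.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


class RobotsCache:
    """Hypothetical per-host robots.txt cache (not Scrapio's real API)."""

    def __init__(self, user_agent):
        self.user_agent = user_agent
        # Maps a lowercased netloc (host[:port]) to its parsed robots.txt rules.
        self._parsers = {}

    def add(self, host, robots_txt):
        # Parse robots.txt text that was fetched elsewhere for this host.
        parser = RobotFileParser()
        parser.parse(robots_txt.splitlines())
        self._parsers[host.lower()] = parser

    def can_fetch(self, url):
        host = urlsplit(url).netloc.lower()
        parser = self._parsers.get(host)
        if parser is None:
            # The cache exists, but this host has no entry. A check such as
            # "if host in cache: apply rules, else: allow" would let these
            # URLs through; refuse instead until the host's robots.txt
            # has been fetched and added.
            return False
        return parser.can_fetch(self.user_agent, url)


if __name__ == "__main__":
    cache = RobotsCache("scrapio-example")
    cache.add("example.com", "User-agent: *\nDisallow: /private/\n")
    print(cache.can_fetch("https://example.com/public/page"))   # True
    print(cache.can_fetch("https://example.com/private/page"))  # False
    print(cache.can_fetch("https://other.example/page"))        # False: host not cached
```

Returning `False` for an uncached host is one conservative choice. A crawler could instead treat a missing entry as a signal to fetch that host's robots.txt before deciding.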