roach-php / core

The complete web scraping toolkit for PHP.

Home Page: https://roach-php.dev

Unable to mark scans as failed

illusive-ch opened this issue · comments

Maybe I am missing something in the documentation, but when you send the spider out to, say, https://1013polebeauty.com, it would be nice to be able to mark this domain as invalid so we do not crawl it again.

I looked into the downloader middleware and spider middleware, but the response already seems to be processed at that point. Is there any way to look at the response code and, if there is an issue, drop the yielded item or update a Laravel model?
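For what it's worth, a downloader middleware should be able to inspect the status code and drop the response before it reaches the spider's parse method, and nothing stops it from touching a Laravel model at the same time. A rough sketch along those lines, assuming Roach's `ResponseMiddlewareInterface`, a made-up `ScannedDomain` Eloquent model, and that `getStatus()`/`getRequest()` behave as in current Roach versions:

```php
<?php

namespace App\Scraping\Middleware;

use App\Models\ScannedDomain; // hypothetical Eloquent model
use RoachPHP\Downloader\Middleware\ResponseMiddlewareInterface;
use RoachPHP\Http\Response;
use RoachPHP\Support\Configurable;

final class MarkFailedDomainsMiddleware implements ResponseMiddlewareInterface
{
    use Configurable;

    public function handleResponse(Response $response): Response
    {
        // Treat any 4xx/5xx status as a failed scan for this host.
        if ($response->getStatus() >= 400) {
            $host = parse_url($response->getRequest()->getUri(), PHP_URL_HOST);

            // Persist the failure so later runs can skip this domain.
            ScannedDomain::query()->updateOrCreate(
                ['host' => $host],
                ['failed' => true, 'last_status' => $response->getStatus()],
            );

            // Dropping stops the response from being handed to the spider.
            return $response->drop('Domain returned HTTP ' . $response->getStatus());
        }

        return $response;
    }
}
```

Registering it in the spider's `$downloaderMiddleware` array should then be enough to record failures as they happen.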

I think you need to check the domain before submitting the request, and cache the result. That way you will not send a request at all if the domain is not valid.
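Building on that idea, a request-side middleware can consult the cache and drop requests to known-bad hosts before any HTTP traffic goes out. A minimal sketch, assuming Roach's `RequestMiddlewareInterface` and Laravel's `Cache` facade (the `scan-failed:` key prefix is an invented convention; whatever records the failure would write the matching entry):

```php
<?php

namespace App\Scraping\Middleware;

use Illuminate\Support\Facades\Cache;
use RoachPHP\Downloader\Middleware\RequestMiddlewareInterface;
use RoachPHP\Http\Request;
use RoachPHP\Support\Configurable;

final class SkipFailedDomainsMiddleware implements RequestMiddlewareInterface
{
    use Configurable;

    public function handleRequest(Request $request): Request
    {
        $host = parse_url($request->getUri(), PHP_URL_HOST);

        // While a failure entry exists for this host, drop the request
        // outright so no connection is ever opened to it.
        if (Cache::has('scan-failed:' . $host)) {
            return $request->drop('Host previously failed: ' . $host);
        }

        return $request;
    }
}
```

Pairing this with a response middleware that writes the cache entry on failure means invalid domains are requested at most once.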