Is there a way to stop the spider's duplicate check with Redis?
milkeasd opened this issue
My spider is extremely slow when run with scrapy-redis, because there is a big delay between the slave and the master. I want to reduce the communication to only fetching the start_urls, either periodically or once all start_urls are done. Is there any way to do so?
Moreover, I want to disable the duplicate check to reduce the number of connections.
But I can't change DUPEFILTER_CLASS to the Scrapy default one; it raises an error.
Is there any other way to disable the duplicate check?
Or any ideas that could help speed up the process?
Thanks
@Germey Any ideas?
@milkeasd
Could you provide the related code files?
The way I see it, letting developers customize their communication rules and adding a disable option for DUPEFILTER_CLASS could be two great features.
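As a side note on reducing communication: if I remember correctly, scrapy-redis reads a REDIS_START_URLS_BATCH_SIZE setting that controls how many start URLs are popped from Redis per round trip (it falls back to CONCURRENT_REQUESTS when unset). A minimal sketch; the value is illustrative, not a recommendation:

```python
# settings.py
# Fetch more start URLs from Redis per round trip to reduce master/slave chatter.
REDIS_START_URLS_BATCH_SIZE = 256
```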
@milkeasd
To disable DUPEFILTER_CLASS, try this: https://stackoverflow.com/questions/23131283/how-to-force-scrapy-to-crawl-duplicate-url
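The linked answer amounts to passing dont_filter=True on each request, which bypasses the dupefilter entirely:

```python
# A minimal sketch: bypass duplicate filtering for a single request.
yield scrapy.Request(url, callback=self.parse, dont_filter=True)
```

If you want to keep the scrapy-redis scheduler but skip the Redis round trip for deduplication, a no-op subclass of its dupefilter should also work. This is a sketch under that assumption, not a tested fix; NoDupeFilter and myproject are hypothetical names:

```python
# dupefilters.py
from scrapy_redis.dupefilter import RFPDupeFilter

class NoDupeFilter(RFPDupeFilter):
    """Dupefilter that never reports a duplicate, so no SADD is sent to Redis."""

    def request_seen(self, request):
        # Always treat the request as unseen; nothing is written to Redis.
        return False
```

```python
# settings.py
DUPEFILTER_CLASS = "myproject.dupefilters.NoDupeFilter"
```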