rmax / scrapy-redis

Redis-based components for Scrapy.

Home Page: http://scrapy-redis.readthedocs.io

Is there a way to stop the spider's duplicate check with Redis?

milkeasd opened this issue · comments

My spider is extremely slow when run with scrapy-redis, because there is a big delay between the slave and the master. I want to reduce the communication so that the slave only contacts Redis to fetch start_urls, either periodically or once all current start_urls are done. Is there any way to do so?

Moreover, I want to disable the duplicate check to reduce the number of connections.

However, I can't change DUPEFILTER_CLASS back to Scrapy's default one; doing so raises an error.
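(A likely reason for that error is that scrapy-redis's scheduler constructs the dupefilter itself with Redis-specific arguments, which Scrapy's default dupefilter does not accept. A minimal, untested sketch of a workaround is to subclass scrapy-redis's own dupefilter and make `request_seen` a no-op; the module path `myproject.no_dupefilter` below is hypothetical:

```python
# no_dupefilter.py -- hedged sketch, not an official scrapy-redis option.
# Subclassing scrapy_redis's RFPDupeFilter keeps the constructor interface
# the scrapy_redis scheduler expects, while request_seen() skips the
# Redis round-trip entirely, effectively disabling deduplication.
from scrapy_redis.dupefilter import RFPDupeFilter


class NoOpDupeFilter(RFPDupeFilter):
    """Treat every request as unseen; never query Redis for dedup."""

    def request_seen(self, request):
        return False
```

Then point the setting at it in settings.py: `DUPEFILTER_CLASS = 'myproject.no_dupefilter.NoOpDupeFilter'`.)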

Is there any other way to disable the duplicate check?

Or does anyone have ideas that could help speed up the process?

Thanks
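(Regarding the speed-up question above, a few Scrapy/scrapy-redis settings reduce how often a slave talks to Redis. The values below are illustrative assumptions, not a recommendation for any specific setup, and `REDIS_START_URLS_BATCH_SIZE` may not exist in older scrapy-redis versions:

```python
# settings.py -- hedged sketch of knobs that may cut master/slave chatter.

# Standard scrapy-redis wiring (from the project README).
SCHEDULER = "scrapy_redis.scheduler.Scheduler"
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# Keep the Redis queue between runs so slaves only fetch new work.
SCHEDULER_PERSIST = True

# Fetch start URLs in batches rather than one by one; when unset,
# scrapy-redis falls back to CONCURRENT_REQUESTS (version permitting).
REDIS_START_URLS_BATCH_SIZE = 64

# More in-flight requests per slave amortizes each Redis round-trip.
CONCURRENT_REQUESTS = 32
```
)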

@Germey Any ideas?

@milkeasd
Could you provide related code files?

The way I see it, letting developers customize their communication rules and adding a disable option for DUPEFILTER_CLASS could be two great features.
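(For reference, the communication in question is the start-URL handoff: the master pushes URLs into a Redis key that the slaves' RedisSpider polls. A minimal sketch of the master side, assuming the default key pattern `<spider_name>:start_urls`, a local Redis, and a spider named `myspider`:

```python
# push_start_urls.py -- hedged sketch of the master side, using redis-py.
# Pushes a seed URL onto the list the slaves' RedisSpider reads from.
import redis

r = redis.StrictRedis(host="localhost", port=6379, db=0)
r.lpush("myspider:start_urls", "http://example.com/page/1")
```
)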

@milkeasd could you please provide your code, or put together some sample code?