rmax / scrapy-redis

Redis-based components for Scrapy.

Home page: http://scrapy-redis.readthedocs.io

Scrapy-redis usage-related issue

Hao1617 opened this issue · comments

commented

Description

Let me describe my Python program. First, I fetch the data I need to crawl from MySQL and turn each record into a URL:

[screenshot of the URL-building code]

The URLs are submitted to redis and wait to be crawled. If a crawl fails, the URL is resubmitted:

[screenshot of the resubmission code]

Successfully crawled data is stored in MySQL through the pipeline. The MySQL-to-redis feed runs once a day, and I can confirm that it only runs once a day.

After running, the amount of data stored in redis is abnormal: there is more data than expected. So far I have found that it is caused by resubmitting the URL whenever a crawl errors. How can I solve this?
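For context, here is a minimal sketch of what such a daily MySQL-to-redis feed typically looks like. The table name, spider name, URL pattern, and connection details below are placeholders, not taken from the screenshots; by default, scrapy-redis spiders pop their start URLs from the `<spider name>:start_urls` redis list.

```python
import pymysql
import redis

def feed_urls_to_redis():
    # Hypothetical connection details and schema; adapt to your own setup.
    db = pymysql.connect(host="localhost", user="root",
                         password="secret", database="crawl")
    r = redis.Redis(host="localhost", port=6379)
    try:
        with db.cursor() as cur:
            cur.execute("SELECT item_id FROM pending_items")
            for (item_id,) in cur.fetchall():
                # Build the crawl URL from the MySQL record.
                url = f"https://example.com/item/{item_id}"
                # scrapy-redis reads start URLs from this list by default.
                r.lpush("myspider:start_urls", url)
    finally:
        db.close()

if __name__ == "__main__":
    # Run once a day, e.g. from cron, as described above.
    feed_urls_to_redis()
```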

commented

[screenshot]

Pass the retry count in the request meta so that it doesn't retry indefinitely.
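Concretely, that advice might look like the sketch below: carry a counter in `request.meta` and only resubmit while it is under a cap. The `MAX_RETRIES` value, the `retry_times` key, and the `on_error` errback are illustrative names, not the code from the screenshots.

```python
import scrapy

MAX_RETRIES = 3  # illustrative cap, not a value from the thread

class MySpider(scrapy.Spider):  # a RedisSpider in the scrapy-redis setup
    name = "myspider"

    def start_requests(self):
        # With scrapy-redis the URLs come from redis instead of this method.
        yield scrapy.Request("https://example.com/item/1",
                             callback=self.parse, errback=self.on_error)

    def parse(self, response):
        ...  # normal parsing and item yielding

    def on_error(self, failure):
        # failure.request is set for download-level errors; for an HttpError
        # the original request hangs off the response instead.
        request = getattr(failure, "request", None)
        if request is None:
            request = failure.value.response.request
        retries = request.meta.get("retry_times", 0)
        if retries < MAX_RETRIES:
            # Resubmit once, carrying the incremented counter, instead of
            # resubmitting unconditionally.
            yield request.replace(
                meta={**request.meta, "retry_times": retries + 1},
                dont_filter=True,  # let the retry copy past the dupefilter
            )
        else:
            self.logger.warning("giving up on %s after %d retries",
                                request.url, retries)
```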

commented

> Pass the retry count in the request meta so that it doesn't retry indefinitely.

ok

commented

> Pass the retry count in the request meta so that it doesn't retry indefinitely.

Does that mean that if a request fails I don't need to resubmit it manually, and the framework will retry it automatically?

commented

> Pass the retry count in the request meta so that it doesn't retry indefinitely.

As shown:

[screenshot]

This part can be deleted.

commented

> Pass the retry count in the request meta so that it doesn't retry indefinitely.

Under what circumstances will it retry?
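For reference (this is stock Scrapy behaviour, not something spelled out in the thread): the built-in RetryMiddleware retries a request on connection-level failures such as timeouts and DNS errors, and on the HTTP status codes in RETRY_HTTP_CODES, up to RETRY_TIMES additional attempts. A settings sketch, with values matching recent Scrapy defaults:

```python
# settings.py — retry knobs; these values match recent Scrapy defaults.
RETRY_ENABLED = True
RETRY_TIMES = 2  # extra attempts after the first request
RETRY_HTTP_CODES = [500, 502, 503, 504, 522, 524, 408, 429]
```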

commented

This was a problem with my program logic. Please close this issue.