rmax / scrapy-redis

Redis-based components for Scrapy.

Home page: http://scrapy-redis.readthedocs.io

Scrapy-redis usage-related issue

Hao1617 opened this issue · comments

commented

Description

Let me describe my Python program. First, I fetch the data I need to crawl from MySQL and turn each record into a URL:

[screenshot of the URL-building code]

The URLs are submitted to redis and wait to be crawled. If a crawl fails, the URL is resubmitted:

[screenshot of the resubmission code]

Successfully crawled data is stored in MySQL through the pipeline. The MySQL-to-redis feed runs once a day, and I can confirm that it only runs once a day.

After running, the amount of data stored in redis is abnormal: there is more data than expected. So far I have found that it is caused by resubmitting the URL whenever a crawl errors. How can I solve this?
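For context, here is a minimal sketch of what such a daily MySQL-to-redis feed typically looks like. The table name, spider name, URL pattern, and connection details below are placeholders, not taken from the screenshots; by default, scrapy-redis spiders pop their start URLs from the `<spider name>:start_urls` redis list.

```python
import pymysql
import redis

def feed_urls_to_redis():
    # Hypothetical connection details and schema; adapt to your own setup.
    db = pymysql.connect(host="localhost", user="root",
                         password="secret", database="crawl")
    r = redis.Redis(host="localhost", port=6379)
    try:
        with db.cursor() as cur:
            cur.execute("SELECT item_id FROM pending_items")
            for (item_id,) in cur.fetchall():
                # Build the crawl URL from the MySQL record.
                url = f"https://example.com/item/{item_id}"
                # scrapy-redis reads start URLs from this list by default.
                r.lpush("myspider:start_urls", url)
    finally:
        db.close()

if __name__ == "__main__":
    # Run once a day, e.g. from cron, as described above.
    feed_urls_to_redis()
```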

commented

[screenshot]

Pass the retry count in the request meta so that it doesn't retry indefinitely.
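Concretely, that advice might look like the sketch below: carry a counter in `request.meta` and only resubmit while it is under a cap. The `MAX_RETRIES` value, the `retry_times` key, and the `on_error` errback are illustrative names, not the code from the screenshots.

```python
import scrapy

MAX_RETRIES = 3  # illustrative cap, not a value from the thread

class MySpider(scrapy.Spider):  # a RedisSpider in the scrapy-redis setup
    name = "myspider"

    def start_requests(self):
        # With scrapy-redis the URLs come from redis instead of this method.
        yield scrapy.Request("https://example.com/item/1",
                             callback=self.parse, errback=self.on_error)

    def parse(self, response):
        ...  # normal parsing and item yielding

    def on_error(self, failure):
        # failure.request is set for download-level errors; for an HttpError
        # the original request hangs off the response instead.
        request = getattr(failure, "request", None)
        if request is None:
            request = failure.value.response.request
        retries = request.meta.get("retry_times", 0)
        if retries < MAX_RETRIES:
            # Resubmit once, carrying the incremented counter, instead of
            # resubmitting unconditionally.
            yield request.replace(
                meta={**request.meta, "retry_times": retries + 1},
                dont_filter=True,  # let the retry copy past the dupefilter
            )
        else:
            self.logger.warning("giving up on %s after %d retries",
                                request.url, retries)
```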

commented

> Pass the retry count in the request meta so that it doesn't retry indefinitely.

ok

commented

> Pass the retry count in the request meta so that it doesn't retry indefinitely.

Does that mean that if a request fails I don't need to resubmit it manually, and the framework will retry it automatically?

commented

> Pass the retry count in the request meta so that it doesn't retry indefinitely.

As shown:

[screenshot]

This part can be deleted.

commented

> Pass the retry count in the request meta so that it doesn't retry indefinitely.

Under what circumstances will it retry?
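For reference (this is stock Scrapy behaviour, not something spelled out in the thread): the built-in RetryMiddleware retries a request on connection-level failures such as timeouts and DNS errors, and on the HTTP status codes in RETRY_HTTP_CODES, up to RETRY_TIMES additional attempts. A settings sketch, with values matching recent Scrapy defaults:

```python
# settings.py — retry knobs; these values match recent Scrapy defaults.
RETRY_ENABLED = True
RETRY_TIMES = 2  # extra attempts after the first request
RETRY_HTTP_CODES = [500, 502, 503, 504, 522, 524, 408, 429]
```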

commented

This was a problem with my program logic. Please close this issue.