scrapinghub / portia

Visual scraping for Scrapy

Set download delay

FolkSong opened this issue

Is there a way to set a delay between page downloads? When I run it, it's just hammering the site and getting tons of 429 errors (too many requests).

I found a solution after a lot of trial and error. I had to create a local_slybot_settings.py file containing "AUTOTHROTTLE_ENABLED = True" and copy it into /app/slyd/slybot/slybot inside the VM. I did this from the CLI terminal (launched from Docker) with the following command, where Transfer is a directory on my regular Windows hard drive in which I created the file:

cp /app/data/projects/Transfer/local_slybot_settings.py /app/slyd/slybot/slybot/

It runs fine after doing that, but I don't understand how it ever worked for others without this. Seems like it should be a default setting.
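
For anyone trying the same fix, a slightly fuller local_slybot_settings.py might look like the sketch below. Only AUTOTHROTTLE_ENABLED comes from the fix described above. The other names are standard Scrapy settings that I'm assuming slybot picks up the same way, and the values are untested starting points to tune for the target site.

```python
# local_slybot_settings.py
# Sketch only: AUTOTHROTTLE_ENABLED is the setting from the fix above.
# Every other setting and value here is an assumption, not a tested config.

# Let Scrapy's AutoThrottle extension adjust the delay based on response latency.
AUTOTHROTTLE_ENABLED = True

# Assumed values: initial delay and upper bound for AutoThrottle, in seconds.
AUTOTHROTTLE_START_DELAY = 5.0
AUTOTHROTTLE_MAX_DELAY = 60.0

# Assumed value: base delay in seconds between requests to the same site.
DOWNLOAD_DELAY = 2.0

# Assumed value: limit parallel requests per domain to reduce 429 responses.
CONCURRENT_REQUESTS_PER_DOMAIN = 1
```

With AutoThrottle enabled, Scrapy should treat DOWNLOAD_DELAY as a floor and never throttle below it, so setting both gives a fixed minimum delay plus automatic backoff when the site slows down.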