medialab / sandcrawler

sandcrawler.js - the server-side scraping companion.

Home Page: http://medialab.github.io/sandcrawler/


Can I scrape up to 50,000 pages in a reasonable time?

scroobius-pip opened this issue

Is this library suitable for scraping data from a large number of pages?

Hello @scroobius-pip. This library is indeed suitable for scraping a large number of pages. However, what counts as a "reasonable time"? When scraping, the bottleneck is usually the sites you are hitting rather than your own computing power.
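
To put rough numbers on that, here is a back-of-envelope sketch in plain JavaScript. It does not use sandcrawler's API: `estimateCrawlSeconds` is a hypothetical helper, and the 2 seconds per page and the concurrency of 10 are assumed values for illustration, not measurements. It also assumes the target site tolerates that many parallel requests, which is often not the case.

```javascript
// Hypothetical helper, not part of sandcrawler: estimates total crawl time
// when pages are fetched in batches of `concurrency` parallel requests and
// each batch takes roughly `secondsPerPage` (network latency plus any
// politeness delay the target site requires).
function estimateCrawlSeconds(pageCount, secondsPerPage, concurrency) {
  if (pageCount < 0 || secondsPerPage <= 0 || concurrency < 1) {
    throw new RangeError('pageCount must be >= 0, secondsPerPage > 0, concurrency >= 1');
  }
  const batches = Math.ceil(pageCount / concurrency);
  return batches * secondsPerPage;
}

const pages = 50000;

// Assumed: 2 s per page, one request at a time.
const sequential = estimateCrawlSeconds(pages, 2, 1);
console.log('sequential: ' + (sequential / 3600).toFixed(1) + ' hours');

// Assumed: 2 s per page, 10 requests in flight at once.
const parallel = estimateCrawlSeconds(pages, 2, 10);
console.log('10 in parallel: ' + (parallel / 3600).toFixed(1) + ' hours');
```

Under these assumptions the sequential crawl takes 100,000 seconds (a bit over a day) and the parallel one 10,000 seconds (under three hours). The total is driven almost entirely by how quickly the remote site answers and how many simultaneous requests it accepts, not by local CPU.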