ivanprytula / itsy-bitsy-spider

Web crawler and scraper

[WIP] Itsy bitsy spider

Setup

  1. Create a virtual environment: python -m venv .venv
  2. Activate it: . .venv/bin/activate
  3. Install the dependencies: pip install -r requirements.txt
  4. Explore the directories, which contain attempts at solving the problem (test task)
  5. In general, every script is run as python <filename>.py --category="<category>" --location="<location>"
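
The --category and --location flags from step 5 could be parsed roughly as follows. This is a minimal sketch using the standard library's argparse, not the repository's actual code; the function name parse_args and the help strings are assumptions made for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse the two flags the setup notes mention (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Scrape business listings (illustrative sketch)."
    )
    # Both flags are treated as required here; the real scripts may differ.
    parser.add_argument("--category", required=True,
                        help='business category, e.g. "Contractors"')
    parser.add_argument("--location", required=True,
                        help='search location, e.g. "San Francisco, CA"')
    # argv=None makes argparse fall back to sys.argv[1:]
    return parser.parse_args(argv)


# Parse an explicit argument list so the sketch runs without a command line
args = parse_args(["--category=Contractors", "--location=San Francisco, CA"])
print(args.category, "|", args.location)
```

Passing an explicit list keeps the sketch testable; a real script would call parse_args() with no arguments and read the values from the command line.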

Usage

# With persistence enabled (JOBDIR), the crawl can continue from where it left off and collect only new businesses
scrapy crawl yelp -s JOBDIR=spiders/yelp -a category=Contractors -a location='San Francisco, CA' -o business_data.json

# The spider can then be stopped safely at any time (by pressing Ctrl-C or sending a signal) and resumed later by issuing the same command
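
Because resuming relies on re-issuing exactly the same command, it can help to build that command in one place. The sketch below assembles the argument list shown above with the standard library only; build_crawl_command and its default values are hypothetical names chosen for illustration, and the code does not start a crawl by itself.

```python
import shlex


def build_crawl_command(category, location,
                        jobdir="spiders/yelp",
                        output="business_data.json"):
    """Return the crawl command as an argument list (hypothetical helper)."""
    return [
        "scrapy", "crawl", "yelp",
        "-s", f"JOBDIR={jobdir}",       # directory that holds the persisted crawl state
        "-a", f"category={category}",   # spider argument
        "-a", f"location={location}",   # spider argument; may contain spaces and commas
        "-o", output,                   # output feed file
    ]


cmd = build_crawl_command("Contractors", "San Francisco, CA")
# shlex.join (Python 3.8+) quotes arguments so the line can be pasted into a shell
print(shlex.join(cmd))
```

The list form can be handed to subprocess.run(cmd) without extra quoting, since each argument is already a separate element; running the same list again after an interruption corresponds to "issuing the same command".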

About

License: MIT License

Languages: Python 100.0%