koverholt / scrapy-site-downloader

Template project for downloading a site with Scrapy

scrapy-site-downloader

Overview

Template project for downloading a site with Scrapy. It crawls a given website and saves the HTML pages it finds, limited to the domain and URL filters you configure.
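To make the role of the domain and URL filters concrete, here is a minimal standard-library sketch of the kind of decision a crawler makes for each link it discovers. It is not the code in this repository: the names `ALLOWED_DOMAIN`, `ALLOW_PATTERNS`, `DENY_PATTERNS`, and `should_download` are hypothetical, and the values are placeholders. In a Scrapy project, this filtering is normally delegated to the spider's own settings rather than written by hand.

```python
import re
from urllib.parse import urlparse

# Hypothetical values for illustration only; the real settings for this
# template live in crawler/spiders/site.py and may be named differently.
ALLOWED_DOMAIN = "example.com"
ALLOW_PATTERNS = [r"/docs/"]   # only follow URLs matching one of these
DENY_PATTERNS = [r"\.pdf$"]    # never follow URLs matching one of these


def should_download(url: str) -> bool:
    """Return True if the URL is inside the allowed domain and passes the filters."""
    host = urlparse(url).hostname or ""
    # Accept the domain itself and any of its subdomains.
    in_domain = host == ALLOWED_DOMAIN or host.endswith("." + ALLOWED_DOMAIN)
    if not in_domain:
        return False
    # Deny patterns take precedence over allow patterns.
    if any(re.search(pattern, url) for pattern in DENY_PATTERNS):
        return False
    return any(re.search(pattern, url) for pattern in ALLOW_PATTERNS)


if __name__ == "__main__":
    for candidate in (
        "https://example.com/docs/intro.html",
        "https://example.com/docs/manual.pdf",
        "https://other.org/docs/intro.html",
    ):
        print(candidate, should_download(candidate))
```

Giving deny patterns precedence is a design choice in this sketch: it lets a broad allow pattern be narrowed without rewriting it.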

Steps to run

  1. Clone this repository and cd into it
  2. Install the dependencies using the following command:
    pip install -r requirements.txt
    
  3. Configure the crawler/spiders/site.py file for the site you want to crawl
  4. Start the downloader using the following command (be sure to run this from the repository root!):
    scrapy crawl site
    
  5. Refer to the Scrapy documentation for best practices and other configuration options
  6. When the crawler finishes, the HTML files will be located in the /html directory
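As a rough illustration of the last step, the sketch below shows one way a page URL could be mapped to a file under an output directory. This is an assumption for illustration, not the naming scheme this template actually uses: `OUTPUT_DIR` and `local_path_for` are hypothetical names, and the rule of adding `index.html` or `.html` is just one reasonable convention.

```python
from pathlib import Path
from urllib.parse import urlparse

# Assumed output directory, relative to the repository root.
OUTPUT_DIR = Path("html")


def local_path_for(url: str) -> Path:
    """Map a page URL to a file path under OUTPUT_DIR (hypothetical scheme)."""
    url_path = urlparse(url).path
    # Drop empty segments and any "." / ".." so the result stays inside OUTPUT_DIR.
    parts = [p for p in url_path.split("/") if p and p not in (".", "..")]
    if not parts or url_path.endswith("/"):
        # Directory-style URLs get an index file.
        parts.append("index.html")
    elif "." not in parts[-1]:
        # Extensionless pages get an .html suffix.
        parts[-1] += ".html"
    return OUTPUT_DIR.joinpath(*parts)


if __name__ == "__main__":
    for page in (
        "https://example.com/",
        "https://example.com/docs/",
        "https://example.com/about",
    ):
        print(page, "->", local_path_for(page).as_posix())
```

Mirroring the URL path on disk keeps saved pages easy to find, at the cost of deep directory trees on large sites.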

License: Apache License 2.0

Languages

Language: Python 100.0%