scrapy-site-downloader

Overview

Template project for downloading a site with Scrapy. It crawls a given website and saves its pages as HTML files, limited to the configured domain and URL filters.
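The domain and URL filtering can be sketched in plain Python to show which pages end up downloaded. This is a hypothetical illustration, not the template's code: `ALLOWED_DOMAIN`, `ALLOW_PATTERNS`, `DENY_PATTERNS` and `should_download` are made-up names. The logic approximates how Scrapy's `allowed_domains` spider attribute (which also accepts subdomains) and the `allow`/`deny` regex arguments of `LinkExtractor` combine.

```python
import re
from urllib.parse import urlparse

# Hypothetical configuration values; the real ones belong in crawler/spiders/site.py.
ALLOWED_DOMAIN = "example.com"
ALLOW_PATTERNS = [r"/docs/"]            # URL must match at least one (if any are given)
DENY_PATTERNS = [r"\.pdf$", r"/login"]  # URL must match none of these


def should_download(url: str) -> bool:
    """Return True if a URL is on the allowed domain and passes the URL filters."""
    host = urlparse(url).hostname or ""
    # Accept the domain itself and any of its subdomains.
    on_domain = host == ALLOWED_DOMAIN or host.endswith("." + ALLOWED_DOMAIN)
    if not on_domain:
        return False
    # Deny patterns take precedence over allow patterns.
    if any(re.search(p, url) for p in DENY_PATTERNS):
        return False
    # An empty allow list means "allow everything on the domain".
    return not ALLOW_PATTERNS or any(re.search(p, url) for p in ALLOW_PATTERNS)
```

For example, `should_download("https://example.com/docs/manual.pdf")` returns `False` because the deny pattern wins, while `https://www.example.com/docs/` is accepted as a subdomain match.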

1. Clone this repository and `cd` into it.
2. Install the dependencies using the following command:
   ```
   pip install -r requirements.txt
   ```
3. Configure the `crawler/spiders/site.py` file for the site you want to crawl.
4. Start the downloader using the following command (be sure to run this from the repository root!):
   ```
   scrapy crawl site
   ```
5. Refer to the Scrapy documentation for best practices and other configuration options.
6. When the crawler finishes, the HTML files will be located in the `/html` directory.
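Scrapy reads project-wide options from the project's settings module, assumed here to be `crawler/settings.py` (the standard location for a project whose spiders live in `crawler/spiders/`). The excerpt below uses real Scrapy setting names, but the values are illustrative choices for a polite crawl, not the template's own defaults.

```python
# crawler/settings.py (assumed location): illustrative excerpt, not the template's defaults.

# Respect the target site's robots.txt rules.
ROBOTSTXT_OBEY = True

# Cap parallel requests to the target site and wait between requests.
CONCURRENT_REQUESTS_PER_DOMAIN = 4
DOWNLOAD_DELAY = 0.5  # seconds

# Let AutoThrottle adjust the delay based on observed response latency.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1.0  # initial delay, in seconds
AUTOTHROTTLE_MAX_DELAY = 10.0   # upper bound under high latency, in seconds

# Cache responses on disk so repeated runs during development reuse them
# instead of refetching every page.
HTTPCACHE_ENABLED = True
```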
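If you want the saved files to mirror the site's URL structure rather than sit in one flat folder, a helper along these lines could compute each output path. `url_to_path` is a hypothetical function, not part of the template or of Scrapy, and it assumes the `html` output directory is relative to where the crawler runs.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def url_to_path(url: str, root: str = "html") -> Path:
    """Map a page URL to a file path under `root` that mirrors the URL's path."""
    raw = urlparse(url).path or "/"  # query strings and fragments are ignored
    path = PurePosixPath(raw)
    if raw.endswith("/"):
        # Directory-style URLs are saved as index.html inside that directory.
        path = path / "index.html"
    elif not path.suffix:
        # Extensionless pages get an .html suffix so they open as HTML locally.
        path = path.with_suffix(".html")
    # Drop the leading "/" and any ".." segments so a crafted URL cannot escape root.
    parts = [p for p in path.parts[1:] if p != ".."]
    return Path(root, *parts)
```

For example, `url_to_path("https://example.com/docs/")` returns the path `html/docs/index.html`. Because query strings are ignored, URLs that differ only in their query map to the same file.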

License

Apache License 2.0