This repository contains the Dockerfile and docker-compose configuration for setting up and deploying a Scrapy platform.
Use the following command to build the Docker image:
$ docker build -t scrapydweb:latest https://github.com/bardusco/scrapydweb-docker.git#master
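If you prefer building from a local checkout instead of the remote Git context, the standard equivalent is:

# Clone the repository and build the image from the local directory
$ git clone https://github.com/bardusco/scrapydweb-docker.git
$ cd scrapydweb-docker
$ docker build -t scrapydweb:latest .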
To run the Docker containers, use the commands below.
Run the background daemon for Scrapyd:
$ docker-compose up -d scrapyd
Run the background daemon for Scrapydweb:
$ docker-compose up -d scrapydweb
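To confirm that both services came up, list the containers and probe them. The ports below are the upstream defaults (6800 for Scrapyd, 5000 for Scrapydweb) and are an assumption here; adjust them if the compose file maps them differently:

# List the running services defined in docker-compose.yml
$ docker-compose ps
# Probe the assumed default ports
$ curl http://localhost:6800/    # Scrapyd web UI
$ curl http://localhost:5000/    # Scrapydweb UI (basic auth may apply)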
The following environment variables are supported:
USERNAME: Basic auth username, default: admin
PASSWORD: Basic auth password, default: admin
SCRAPYD_SERVERS: Scrapyd servers
Create a .env file in the working directory with the following variables:
SCRAPYD_SERVERS=172.17.0.1:6800
USERNAME=my_name
PASSWORD=my_very_secure_password
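docker-compose reads .env from the working directory automatically; assuming the compose file references these variables, you can confirm they are picked up by rendering the resolved configuration:

# Show the compose file with environment variables substituted in
$ docker-compose config | grep -E 'USERNAME|PASSWORD|SCRAPYD_SERVERS'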
Projects folder: /var/lib/docker/volumes/scrapydweb-docker_gerapy-data/_data/projects
Put your Scrapy projects here.
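For example, copying an existing project into the volume could look like this (the source path is illustrative, and root privileges are usually needed to write under /var/lib/docker):

# Copy a Scrapy project into the shared projects volume listed above
$ sudo cp -r ~/projects/myproject \
    /var/lib/docker/volumes/scrapydweb-docker_gerapy-data/_data/projects/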
Scrapy is an open-source and collaborative framework for extracting the data you need from websites in a fast, simple, yet extensible way.
Scrapyd is a service for running Scrapy spiders. It allows you to deploy your Scrapy projects and control their spiders using an HTTP JSON API.
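For instance, with the Scrapyd container listening on its default port 6800, its documented JSON endpoints can be exercised with curl (the project and spider names below are placeholders):

# Check daemon status
$ curl http://localhost:6800/daemonstatus.json
# Schedule a spider run for a deployed project
$ curl http://localhost:6800/schedule.json -d project=myproject -d spider=myspider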
Scrapyd-client is a client for Scrapyd, providing the scrapyd-deploy utility that enables you to deploy your project to a Scrapyd server.
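A typical deployment, assuming a deploy target named "local" is configured in the project's scrapy.cfg, looks like this:

# scrapy.cfg in the project root would contain something like:
#   [deploy:local]
#   url = http://localhost:6800/
$ scrapyd-deploy local -p myproject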
Additional packages incorporated into the project include scrapy-splash for Scrapy+JavaScript integration using Splash, scrapyrt for easily adding an HTTP API to your existing Scrapy project, spidermon as a framework to build monitors for Scrapy spiders, and scrapy-poet as the web-poet Page Object pattern implementation for Scrapy.
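As an illustration of scrapyrt, a spider can be triggered over plain HTTP; the port and endpoint below are scrapyrt's documented defaults, and "myspider" is a placeholder:

# Run a spider via scrapyrt's HTTP API (default port 9080)
$ curl "http://localhost:9080/crawl.json?spider_name=myspider&url=https://example.com"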
The image is based on vimagick/scrapyd, with the latest stable versions of these seven Python packages installed:
scrapy==2.8.0
scrapyd==1.4.1
scrapyd-client==1.2.3
scrapy-splash (latest)
scrapyrt==0.13
spidermon==1.17.1
scrapy-poet==0.10.1
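You can verify what actually ships in the image from a running container (the service name "scrapyd" matches the compose commands above):

# Inspect installed package versions inside the running container
$ docker-compose exec scrapyd pip freeze | grep -iE 'scrapy|spidermon'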