bardusco / scrapydweb-docker

scrapydweb, dockerfile

Home Page:https://github.com/my8100/scrapydweb

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

README - Scrapy Platform Project

This repository contains the Dockerfile and docker-compose for setting up and deploying Scrapy Platform

Docker Build

Use the following command to build the Docker image:

docker build -t scrapydweb:latest https://github.com/bardusco/scrapydweb-docker.git\#master

Docker Run

To run the Docker container, use the commands below:

Run the background-daemon for Scrapyd

$ docker-compose up -d scrapyd

Run the background-daemon for Scrapydweb

$ docker-compose up -d scrapydweb

Environment Variables

  • USERNAME: Basic auth username, default: admin
  • PASSWORD: Basic auth password, default: admin
  • SCRAPYD_SERVERS: Scrapyd servers

Create a .env file in the working directory with the following variables:

SCRAPYD_SERVERS=172.17.0.1:6800
USERNAME=my_name
PASSWORD=my_very_secure_password

Gerapy

Projects folder: /var/lib/docker/volumes/scrapydweb-docker_gerapy-data/_data/projects Put your scrapy projects here

About the Project

Scrapy is an open-source and collaborative framework for extracting the data you need from websites in a fast, simple, yet extensible way.

Scrapyd is a service for running Scrapy spiders. It allows you to deploy your Scrapy projects and control their spiders using a HTTP JSON API.

Scrapyd-client is a client for Scrapyd, providing the scrapyd-deploy utility that enables you to deploy your project to a Scrapyd server.

Additional packages incorporated into the project include scrapy-splash for Scrapy+JavaScript integration using Splash, scrapyrt for easily adding HTTP API to your existing Scrapy project, spidermon as a framework to build monitors for Scrapy spiders, and scrapy-poet as the web-poet Page Object pattern implementation for Scrapy.

The image is based on vimagick/scrapyd, with the seven latest stable python packages installed:

  • scrapy==2.8.0
  • scrapyd==1.4.1
  • scrapyd-client==v1.2.3
  • scrapy-splash==latest
  • scrapyrt==v0.13
  • spidermon==1.17.1
  • scrapy-poet==0.10.1

About

scrapydweb, dockerfile

https://github.com/my8100/scrapydweb


Languages

Language:Python 86.8%Language:Makefile 7.4%Language:Shell 4.0%Language:Dockerfile 1.9%