RamonRomeroQro / Python-Distributed-Scrapping

Distributed Web Scraping

https://github.com/RamonRomeroQro/Python-Distributed-Scrapping


Copyright © 2019 Ramon Romero (@RamonRomeroQro)

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program.  If not, see <https://www.gnu.org/licenses/>.

Setup

  • Create venv
$ python3 -m venv venv
  • Activate venv
$ source venv/bin/activate
  • Install requirements
$ pip install -r requirements.txt
  • Master node : src/crawler/master.py
  • Slave instance : src/crawler/slave.py [name]
  • Parameters : src/settings.json
  • Database : src/db/init.sh
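
As a rough illustration of how the master and slave entry points listed above might be started together, the sketch below builds one command for the master and one per named slave. The script paths come from the list; launching them with the current Python interpreter, the helper name build_commands, and the slave names "worker-1" and "worker-2" are assumptions made for this example only. It does not cover src/db/init.sh or src/settings.json, which may need to be prepared first.

```python
import shlex
import sys

# Paths taken from the setup list. Running them with the current Python
# interpreter is an assumption; the README does not state the exact command.
MASTER_SCRIPT = "src/crawler/master.py"
SLAVE_SCRIPT = "src/crawler/slave.py"


def build_commands(slave_names):
    """Return the master command followed by one command per named slave."""
    commands = [[sys.executable, MASTER_SCRIPT]]
    for name in slave_names:
        # Each slave script receives its [name] as the only argument.
        commands.append([sys.executable, SLAVE_SCRIPT, name])
    return commands


if __name__ == "__main__":
    # "worker-1" and "worker-2" are hypothetical slave names.
    for command in build_commands(["worker-1", "worker-2"]):
        # Print only; pass each list to subprocess.Popen to actually launch it.
        print(shlex.join(command))
```

Printing the commands rather than running them keeps the sketch free of side effects; each printed line can be pasted into a separate terminal with the venv activated.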

Demo

Test services

  • service : test/src/init.py [PORT]
  • generate : test/src/generate.py [PORT]

Dataset retrieved from:

https://www.kaggle.com/ikarus777/best-artworks-of-all-time
