daniele-sartiano / penelope-bot

A distributed web crawler based on containers

penelope-bot

A distributed, asynchronous C++ web crawler based on Docker, nats.io and ScyllaDB.

Modules

  • Downloader
  • Parser
  • Data Manager
  • Common

Build and Run

make
docker-compose up --scale downloader=3

Downloader

Compile the downloader locally

dependencies

libprotobuf-dev protobuf-compiler libprotoc-dev
https://github.com/protobuf-c/protobuf-c

commands

cd downloader && mkdir -p build && cd build && cmake .. && make && cd ../..
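
The same out-of-source build can be wrapped in a small shell function so that it stops at the first failing step and leaves the caller's working directory untouched. This is only a sketch: `build_module` is a hypothetical helper name, not something shipped in the repository, and it assumes `cmake` and `make` are on the PATH.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): run an out-of-source
# CMake build inside <module_dir>/build and stop at the first failure.
build_module() {
    module_dir="$1"
    # The subshell keeps the directory changes local to this function,
    # so no trailing "cd ../.." is needed.
    (
        cd "$module_dir" &&
        mkdir -p build &&
        cd build &&
        cmake .. &&
        make
    )
}
```

From the repository root it would be called as `build_module downloader`. The function returns a non-zero status if any step fails, for example when the directory does not exist.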

Data Manager

dependencies

libuv1-dev

About

License: GNU General Public License v3.0

Languages: C++ 68.4%, CMake 21.1%, Python 5.9%, Makefile 2.6%, Dockerfile 2.1%