isspek / abyznewslinks.com_crawler

Crawls news agencies from abyznewslinks.com

Abyz Crawler

This is a crawler for the website Abyz News Links (abyznewslinks.com). It extracts news agencies, along with their metadata, for each country.
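The crawler's output can be pictured as news-agency records grouped by country. The sketch below is a hypothetical model for illustration only: the `NewsAgency` class, its fields, and the `group_by_country` helper are assumptions and do not reflect the repository's actual Scrapy item definitions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class NewsAgency:
    """Hypothetical shape of one crawled record; the real item fields may differ."""

    name: str
    url: str
    country: str
    # Free-form metadata as (key, value) pairs, e.g. (("language", "de"),).
    metadata: Tuple[Tuple[str, str], ...] = ()


def group_by_country(agencies: Iterable[NewsAgency]) -> Dict[str, List[NewsAgency]]:
    """Group records by country, keeping first-seen country order and record order."""
    grouped: Dict[str, List[NewsAgency]] = defaultdict(list)
    for agency in agencies:
        grouped[agency.country].append(agency)
    return dict(grouped)
```

Grouping after the fact keeps the record type flat, which fits a document store such as MongoDB where each agency can be stored as its own document with a `country` field.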

This README is intended for developers.

Note that some files in this repository are encrypted via vauly. See the project's setup file for instructions on setting up a local development environment for this project, either with or without vauly.

Run Crawler

Most commands need to be run from within the virtual environment, which can be activated via:

source venv/bin/activate
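Since most commands depend on the virtual environment, a script can fail fast when it is started with the wrong interpreter. This is a minimal sketch relying on the standard `venv` behaviour where `sys.prefix` points at the environment while `sys.base_prefix` points at the base installation; the helper names are hypothetical and not part of this repository.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment.

    Inside a venv, sys.prefix is the environment directory while
    sys.base_prefix is the base Python installation, so they differ.
    This also holds when venv/bin/python is invoked without activation.
    """
    return sys.prefix != sys.base_prefix


def require_virtualenv() -> None:
    """Exit with a hint if the script is not running inside a virtual environment."""
    if not in_virtualenv():
        sys.exit("Run inside the virtual environment first: source venv/bin/activate")
```

Calling `require_virtualenv()` at the top of a helper script gives a clear message instead of an `ImportError` for Scrapy later on.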

First, set the following environment variables, which specify the URI and the name of the MongoDB database in which the news agencies should be persisted:

  • MONGO_URI
  • MONGO_DB
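Because the crawl needs both variables, it can help to validate them before any request is made. The sketch below is an illustrative helper, not code from this repository; the function name and the example values (`mongodb://localhost:27017`, database `abyz`) are assumptions.

```python
import os
from typing import Mapping, Tuple


def mongo_settings_from_env(environ: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (MONGO_URI, MONGO_DB), raising a clear error if either is unset or empty."""
    missing = [name for name in ("MONGO_URI", "MONGO_DB") if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    return environ["MONGO_URI"], environ["MONGO_DB"]
```

Accepting the environment as a parameter keeps the helper testable with a plain dict, while the default reads the real process environment.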

Run the following command to crawl news agencies and store them inside the local database:

scrapy crawl news-agencies
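When the crawl is triggered from another Python script or a scheduler, the same command can be launched as a child process with the MongoDB settings injected into its environment. This is a sketch under assumptions: `build_crawl_invocation` and `run_crawl` are hypothetical names, and the script must be started from the project directory with the virtual environment's `scrapy` on `PATH`.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional, Tuple

SPIDER_NAME = "news-agencies"  # spider name from the command above


def build_crawl_invocation(
    mongo_uri: str,
    mongo_db: str,
    base_env: Optional[Mapping[str, str]] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Return the scrapy command and an environment that includes the Mongo settings."""
    # Copy the base environment so the caller's mapping is never mutated.
    env = dict(os.environ if base_env is None else base_env)
    env["MONGO_URI"] = mongo_uri
    env["MONGO_DB"] = mongo_db
    return ["scrapy", "crawl", SPIDER_NAME], env


def run_crawl(mongo_uri: str, mongo_db: str) -> None:
    """Run the crawl; subprocess raises CalledProcessError on a non-zero exit code."""
    cmd, env = build_crawl_invocation(mongo_uri, mongo_db)
    subprocess.run(cmd, env=env, check=True)
```

Splitting command construction from execution lets the invocation be checked without actually starting a crawl.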

Run MongoDB

Start the MongoDB container via:

docker-compose up -d
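`docker-compose up -d` returns as soon as the container is started, which can be before the database accepts connections. A simple wait loop using only the standard library is sketched below; the helper name is hypothetical, and the host and port to pass in (for example `localhost` and MongoDB's default `27017`) depend on the repository's compose file, which is not shown here.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout elapses.

    Returns True as soon as a connection is accepted and False on timeout.
    An accepted connection only shows that something listens on the port;
    with Docker's port proxy this can happen before mongod is fully ready.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 27017)` could run right before `scrapy crawl news-agencies` in a start-up script.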

About

License: GNU General Public License v3.0


Languages

  • Python 98.0%
  • JavaScript 1.3%
  • Jinja 0.8%