Público News Scraper

This tool extracts news articles from the Portuguese newspaper Público. The scraper combines the article metadata returned by the Público API with the article texts, which it obtains by scraping each article's URL.
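This README does not document the API endpoint, the metadata field names, or the page layout. The following is therefore only a sketch of the join step under stated assumptions. It assumes each metadata record is a dict with a "url" key and that the article body lives in <p> elements. The helpers extract_text and join_metadata_with_text are hypothetical and built on Python's standard html.parser. Fetching is passed in as a function, so the join can be shown without network access.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text of every <p> element in a page (assumed article layout)."""

    def __init__(self):
        super().__init__()
        self._in_p = False      # True while inside a <p> element
        self._current = []      # text fragments of the paragraph being read
        self.paragraphs = []    # finished, non-empty paragraphs

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._current = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            text = "".join(self._current).strip()
            if text:
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._current.append(data)


def extract_text(html):
    """Return the page's paragraph texts joined by newlines (hypothetical helper)."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.paragraphs)


def join_metadata_with_text(records, fetch_html):
    """Attach the scraped article body to each metadata record.

    records: list of dicts from the API, each assumed to have a "url" key.
    fetch_html: callable mapping a URL to its HTML string.
    """
    joined = []
    for record in records:
        article = dict(record)  # copy so the caller's metadata is not mutated
        article["text"] = extract_text(fetch_html(record["url"]))
        joined.append(article)
    return joined
```

In real use, fetch_html could wrap requests.get(url).text. In a test, the .get method of a dict mapping URLs to HTML works as a stub.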

Setup & Usage Instructions

Below are the steps to set up your environment and run the scraper on your machine.

Environment Setup

Create and activate a Python virtual environment.

virtualenv venv --python=python3.8
source venv/bin/activate

Install the project dependencies.

pip install -r requirements.txt

Scraping Data

To scrape news articles for a specific date range, use the following command:

python scrape.py
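This README does not say how scrape.py receives the date range. A sketch, assuming the scraper requests articles one day at a time, is a hypothetical helper that enumerates every day of an inclusive range. It uses only the standard datetime module.

```python
from datetime import date, timedelta


def daterange(start: date, end: date):
    """Yield each calendar day from start to end, both inclusive (hypothetical helper)."""
    if end < start:
        raise ValueError("end must not be before start")
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)
```

For example, daterange(date(2020, 2, 27), date(2020, 3, 1)) yields four days, including 29 February, because 2020 is a leap year. Stepping with timedelta avoids handling month lengths and leap years by hand.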

About

Scrape news from Jornal Público.

MIT License
