This tool extracts news articles from the Portuguese newspaper Público. The scraper joins the news metadata returned by the Público API with the article texts, which it obtains by scraping each news URL.
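The join described above can be sketched as follows. This is a minimal illustration, not the tool's actual implementation. It makes several assumptions: each metadata record is a dict with the article link under a `"url"` key, and the article body sits in `<p>` tags. The Público API's real field names and page layout are not described here. The names `ParagraphTextExtractor`, `extract_text` and `join_metadata_with_text` are hypothetical. Only the standard library is used, and the page fetcher is passed in so the sketch runs without network access.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, Iterable, List


class ParagraphTextExtractor(HTMLParser):
    """Hypothetical heuristic: collect the text found inside <p> tags."""

    def __init__(self) -> None:
        super().__init__()
        self._in_p = False
        self._buffer: List[str] = []
        self.paragraphs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            text = "".join(self._buffer).strip()
            if text:  # skip empty paragraphs
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        # Text inside inline tags such as <b> is kept, because _in_p stays True.
        if self._in_p:
            self._buffer.append(data)


def extract_text(html: str) -> str:
    """Return the article paragraphs of an HTML page, one per line."""
    parser = ParagraphTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.paragraphs)


def join_metadata_with_text(
    articles: Iterable[Dict],
    fetch_html: Callable[[str], str],
    url_key: str = "url",  # assumed metadata field holding the article link
) -> List[Dict]:
    """Copy each metadata record and add a "text" field scraped from its URL."""
    joined = []
    for meta in articles:
        record = dict(meta)  # copy so the caller's records are not mutated
        record["text"] = extract_text(fetch_html(meta[url_key]))
        joined.append(record)
    return joined
```

In real use, `fetch_html` could be something like `lambda u: urllib.request.urlopen(u).read().decode("utf-8")`. Injecting it keeps the join logic testable with an in-memory dict of pages instead of live HTTP requests.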
Below are the steps to set up your environment and run the scraper on your machine.
- Create and activate a Python virtual environment.
virtualenv venv --python=python3.8
source venv/bin/activate
- Install the project dependencies.
pip install -r requirements.txt
To scrape news articles for a specific date range, use the following command:
python scrape.py
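The command above takes no arguments, so this README does not show how the date range is chosen. One plausible way for a scraper to cover a range is to query the API one calendar day at a time. That is an assumption, not something the source confirms. The hypothetical helper `iter_days` below sketches that splitting with only the standard library. Both endpoints are inclusive.

```python
from datetime import date, timedelta
from typing import Iterator


def iter_days(start: date, end: date) -> Iterator[date]:
    """Yield each calendar day from start to end, both inclusive (hypothetical helper)."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)
```

For example, `list(iter_days(date(2020, 2, 27), date(2020, 3, 1)))` yields four dates, including the leap day 2020-02-29, because `timedelta` arithmetic follows the real calendar.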