rodp63 / yams

A simple Python scraper to collect data from the most popular Peruvian news sites

Yet Another Media Scraper

Code style: black | Imports: isort

$ yams
Usage: yams [OPTIONS] COMMAND [ARGS]...

Options:
  --version   Show the version and exit.
  -h, --help  Show this message and exit.

Commands:
  info   Display useful information
  start  Start a crawling process

Installation

$ python setup.py install

Examples

$ yams start newspaper --help 
Usage: yams start newspaper [OPTIONS] SOURCE

Options:
  -k, --keyword TEXT      Set one keyword for post retrieval  [required]
  -o, --output FILE       Save output to json FILE instead of stdout
  -s, --since [%Y-%m-%d]  Set the lower date of the posts to retrieve
  -t, --to [%Y-%m-%d]     Set the upper date of the posts to retrieve
  --exact-match           Look for the exact match of the keywords
  -h, --help              Show this message and exit.
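
The `--since` and `--to` options take dates in `%Y-%m-%d` form. As a minimal sketch, the helper below builds such a pair of strings for a window ending on a given day. The function name `date_window` and its parameters are hypothetical and not part of yams; only the date format comes from the help text above.

```python
from datetime import date, timedelta

# Date format shown in the help text for --since / --to.
DATE_FORMAT = "%Y-%m-%d"


def date_window(days, today=None):
    """Return (since, to) strings covering the `days` days before `today`.

    Hypothetical helper for illustration; yams itself does not provide it.
    """
    end = today or date.today()
    start = end - timedelta(days=days)
    return start.strftime(DATE_FORMAT), end.strftime(DATE_FORMAT)


since, to = date_window(30, today=date(2023, 6, 30))
print(since, to)
```

The two strings can then be passed to `-s` and `-t` respectively.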

# Get all posts containing the word 'peru' from the last month (the default date range).
$ yams start newspaper elcomercio -k peru
# Get all posts containing the exact word 'congreso' between two dates and save them to a JSON file.
$ yams start newspaper elcomercio -k congreso -s '2023-01-01' -t '2023-06-30' -o data --exact-match
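
For scripted runs, the same invocations can be assembled from Python. The sketch below is an assumption-laden wrapper, not part of yams: `build_command` and `run` are hypothetical names, and the only things taken from the project are the subcommand and the flags listed in the help text above. It assumes the `yams` executable is on `PATH`.

```python
import subprocess


def build_command(source, keyword, since=None, to=None, output=None, exact_match=False):
    """Assemble an argument list for `yams start newspaper` (hypothetical wrapper)."""
    cmd = ["yams", "start", "newspaper", source, "-k", keyword]
    if since:
        cmd += ["-s", since]  # lower date bound, %Y-%m-%d
    if to:
        cmd += ["-t", to]  # upper date bound, %Y-%m-%d
    if output:
        cmd += ["-o", output]  # write JSON to this file instead of stdout
    if exact_match:
        cmd.append("--exact-match")
    return cmd


def run(cmd):
    """Run the command and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


cmd = build_command(
    "elcomercio", "congreso",
    since="2023-01-01", to="2023-06-30",
    output="data", exact_match=True,
)
print(" ".join(cmd))
```

Passing the arguments as a list rather than a shell string avoids quoting problems if a keyword contains spaces. Calling `run(cmd)` would execute the crawl, which is why the sketch only prints the command.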

Deployment

$ make build-image
$ make upload-image

Formatting

$ make lint

About

License: MIT License


Languages

Python 96.2%, Dockerfile 2.1%, Makefile 1.7%