DecafSunrise / Newspaper

Newspaper

The least uniquely named News Article retrieval script

What is it?

This repo holds ETL/ELT code that scrapes news sites, applies some light transforms to the articles, and saves the results to disk (and eventually to a database).

How do I use it?

For now, just run the newspaper_pull.py script. To change where your files are saved, edit the saveDir line; to scrape different news sites, update sites.py. The default list is fairly long (~65 sites) and takes 10-15 minutes to run.
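
The sketch below illustrates the general idea of a configurable save directory plus a list of sites. It is not the code in newspaper_pull.py: the names save_dir, sites, site_slug, and save_articles, the example URLs, and the one-JSON-file-per-site layout are all assumptions made for illustration, and the scraping step itself is left out.

```python
import json
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical stand-ins for the saveDir setting and the list kept in sites.py.
save_dir = Path("news_output")
sites = ["https://www.bbc.com/news", "https://www.reuters.com"]


def site_slug(url):
    """Turn a site URL into a filesystem-friendly name, e.g. 'bbc_com'."""
    host = urlparse(url).netloc
    if host.startswith("www."):
        host = host[len("www."):]
    return host.replace(".", "_")


def save_articles(articles, site_url, out_dir):
    """Write one site's article records to <out_dir>/<slug>.json."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    path = out_dir / f"{site_slug(site_url)}.json"
    with path.open("w", encoding="utf-8") as fh:
        json.dump(articles, fh, ensure_ascii=False, indent=2)
    return path
```

With a layout like this, pointing the output somewhere else means changing only save_dir, and adding or removing a site means editing only the sites list.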

Languages

Jupyter Notebook 94.7%, Python 5.3%