A program for downloading online articles and saving them in a SQLite database.
- You need Python 3.x and Beautiful Soup installed:
pip install beautifulsoup4
- Clone the repository
mkdir git
cd git
git clone https://github.com/th0rben/news-scraper.git
- To send e-mails, create a file named login_data.py in the src folder. It should look like this (example for Gmail):
sender = "sender@gmail.com"
recipient = "recipient"
password = "password"
subject = "subject"
server = "smtp.gmail.com"
port = 465
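As a rough illustration of how these settings can be used to send mail over SSL (the function names below are illustrative, not the project's actual API), a minimal sketch with Python's standard smtplib:

```python
import smtplib
import ssl
from email.message import EmailMessage


def build_message(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Wrap the article text in a plain-text e-mail."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_article_mail(body: str) -> None:
    """Deliver the message over SMTP with SSL (port 465 for Gmail)."""
    # Imported here so the settings file is only required when actually sending.
    import login_data

    context = ssl.create_default_context()
    with smtplib.SMTP_SSL(login_data.server, login_data.port, context=context) as conn:
        conn.login(login_data.sender, login_data.password)
        conn.send_message(
            build_message(
                login_data.sender, login_data.recipient, login_data.subject, body
            )
        )
```

Note that Gmail requires an app password (or similarly authorized credentials) for SMTP logins.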
Execute the main.py file
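main.py stores the downloaded articles in the SQLite database. A minimal sketch of that step with the standard sqlite3 module (the table name and columns are assumptions, not the project's actual schema):

```python
import sqlite3

# The real program would open a database file on disk; :memory: keeps the
# sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS articles (url TEXT PRIMARY KEY, title TEXT, fetched TEXT)"
)
# The URL as primary key plus INSERT OR IGNORE skips articles already saved.
conn.execute(
    "INSERT OR IGNORE INTO articles VALUES (?, ?, datetime('now'))",
    ("https://example.com/story", "Example headline"),
)
conn.commit()
count = conn.execute("SELECT COUNT(*) FROM articles").fetchone()[0]
```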
To run the scraper automatically, execute the following (adds a cron job to the crontab; the example entry below runs every Monday at 18:00):
cd where/you/saved/it/news-scraper
sudo chmod +x setup.sh
./setup.sh
sudo crontab -e
0 18 * * 1 /home/pi/git/news-scraper/cron/cron.sh
If you want to change the frequency or time, edit cronjob.txt.
For more information see: https://en.wikipedia.org/wiki/Cron
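For example, to run the scraper every day at 12:00, the crontab entry would be (path as installed above):

```
0 12 * * * /home/pi/git/news-scraper/cron/cron.sh
```

The five fields are minute, hour, day of month, month, and day of week; `*` means "every".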
There are no tests yet.
If you stumble over any mistakes, it would be great if you mention them.
- Beautiful Soup - Python library for pulling data out of HTML and XML files
- Eclipse - IDE
- PyDev - Python IDE for Eclipse
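As a quick illustration of how Beautiful Soup pulls data out of HTML (the markup and the `headline` class below are made up, not bild.de's actual page structure):

```python
from bs4 import BeautifulSoup

html = """
<html><body>
  <article><h2 class="headline">First story</h2></article>
  <article><h2 class="headline">Second story</h2></article>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
# Collect the text of every <h2 class="headline"> element, in document order.
headlines = [h.get_text(strip=True) for h in soup.find_all("h2", class_="headline")]
# → ['First story', 'Second story']
```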
[27.07.2018] - Scrape articles from bild.de
This project is still in beta.
- th0rben - Initial work - th0rben
This project is licensed under the GPL License - see the LICENSE.md file for details
- This project is inspired by the Spiegel Mining project by D. Kriesel