emilianscheel / Tagesschau-data-fetching

Stockpiles all Tagesschau articles using the Tagesschau API. The data is to be analyzed later.

Tagesschau-data-fetching

Useful snippets

# get the number of top-level entries in the JSON file
jq length database.json
# get the number of files in the current directory (recursive)
find . -type f | wc -l
# get the total size of the current directory
du -hs
# show file metadata, including the last-modified time
stat database.json
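
The same checks can be sketched in Python, which may be convenient inside the analytics scripts. This is a minimal sketch, not part of the repository: the function name `summarize` and its return keys are hypothetical, and it assumes `database.json` holds a JSON array or object at the top level. The size it reports is the sum of file sizes, so it can differ from the disk usage that `du -hs` prints.

```python
import json
import os
from datetime import datetime


def summarize(database_path="database.json", directory="."):
    """Collect the figures the shell snippets print (hypothetical helper)."""
    # Like `jq length`: number of top-level entries in the JSON file.
    with open(database_path, encoding="utf-8") as handle:
        entries = len(json.load(handle))

    # Like `find . -type f | wc -l`: count regular files recursively.
    file_count = 0
    total_bytes = 0
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            if os.path.islink(path):
                continue  # `find -type f` does not match symlinks
            file_count += 1
            # Sum of apparent file sizes; `du` reports disk usage instead.
            total_bytes += os.path.getsize(path)

    # Like the "Modify" line of `stat`: last modification time.
    modified = datetime.fromtimestamp(os.path.getmtime(database_path))

    return {
        "entries": entries,
        "files": file_count,
        "bytes": total_bytes,
        "modified": modified,
    }
```

Calling `summarize()` from the repository directory returns a dictionary with the four values.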

Depends on

  • python3
  • urllib3
  • json
  • os
  • datetime
  • glob
  • BeautifulSoup
  • re

For analytics

  • numpy
  • pandas
  • matplotlib
  • seaborn
  • networkx
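
Before setting up the service, it can help to check which of these modules are importable with the `python3` that will run the script. The following is a rough sketch under two assumptions: the list names above map to import names as written, except BeautifulSoup, which is imported as `bs4`, and the helper name `missing_modules` is hypothetical. `json`, `os`, `datetime`, `glob` and `re` ship with Python, so only the remaining packages need installing.

```python
import importlib.util

# Import names for the fetching script; BeautifulSoup is imported as "bs4".
FETCHING = ["urllib3", "json", "os", "datetime", "glob", "bs4", "re"]
# Import names for the analytics part.
ANALYTICS = ["numpy", "pandas", "matplotlib", "seaborn", "networkx"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

`missing_modules(FETCHING + ANALYTICS)` returns an empty list when everything is installed.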

Setup (Linux)

mkdir -p ~/apps
cd ~/apps
git clone https://github.com/emilianscheel/tagesschau-data-fetching
# Create system service
sudo nano /etc/systemd/system/tagesschau-data-fetching.service
  1. Replace <user> with your username
  2. Paste the configuration into the file ending in .service
[Unit]
Description=Tagesschau data fetching
After=multi-user.target
Wants=tagesschau-data-fetching.timer

[Service]
Type=oneshot
User=<user>
WorkingDirectory=/home/<user>/apps/tagesschau-data-fetching/
ExecStart=/usr/bin/python3 main.py

[Install]
WantedBy=multi-user.target
# Create system timer
sudo nano /etc/systemd/system/tagesschau-data-fetching.timer
  1. Paste the configuration into the file ending in .timer
[Unit]
Description=Fetches Tagesschau.de for data and saves it
Requires=tagesschau-data-fetching.service

[Timer]
Unit=tagesschau-data-fetching.service
OnCalendar=*:0/11

[Install]
WantedBy=timers.target
# enable and start the service, then view its status
sudo systemctl enable tagesschau-data-fetching.service
sudo systemctl start tagesschau-data-fetching.service
sudo systemctl status tagesschau-data-fetching.service

# enable and start the timer, then view its status
sudo systemctl enable tagesschau-data-fetching.timer
sudo systemctl start tagesschau-data-fetching.timer
sudo systemctl status tagesschau-data-fetching.timer

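With this configuration, the timer starts the service at minutes 0, 11, 22, 33, 44 and 55 of every hour, so roughly every eleven minutes, with a five-minute gap across the hour boundary. The service runs the main.py script, which fetches the Tagesschau API.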
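
To see what `OnCalendar=*:0/11` means in practice, the minute field can be expanded by hand. The sketch below assumes the usual systemd reading of `start/step` (the start value plus every multiple of the step, within the hour); the function names are hypothetical and only illustrate the arithmetic.

```python
def firing_minutes(start=0, step=11):
    """Minutes of the hour matched by a minute spec of the form start/step."""
    return list(range(start, 60, step))


def gaps(minutes):
    """Minutes between consecutive runs, including the wrap into the next hour."""
    wrapped = minutes + [minutes[0] + 60]
    return [later - earlier for earlier, later in zip(wrapped, wrapped[1:])]


print(firing_minutes())        # [0, 11, 22, 33, 44, 55]
print(gaps(firing_minutes()))  # [11, 11, 11, 11, 11, 5]
```

If a strict eleven-minute interval is wanted instead, a monotonic timer setting such as `OnUnitActiveSec=11min` would be one option to consider.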
