defgsus / teletext-archive

archive of German teletext websites

archive of German online teletexts

Or videotext, as we used to call it.

DEPRECATED: Collecting raw HTML files every 30 minutes is just too much:

  • for GitHub: the repo size is 800 MB after only 3 weeks
  • for parsing: running BeautifulSoup over all files in every commit takes 6 hours on a single thread
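A rough back-of-the-envelope check of those numbers, assuming exactly one snapshot commit every 30 minutes for 21 days and pretending the 800 MB is spread evenly over all commits (in reality git compresses and deduplicates, so this is only an average):

```python
# Rough growth estimate from the figures above.
# Assumptions: one snapshot every 30 minutes, 3 weeks = 21 days,
# and the 800 MB repo size spread evenly across all commits.
SNAPSHOTS_PER_DAY = 24 * 60 // 30          # 48 snapshots per day
DAYS = 21
REPO_SIZE_MB = 800

commits = SNAPSHOTS_PER_DAY * DAYS          # 1008 commits
mb_per_commit = REPO_SIZE_MB / commits      # about 0.79 MB per commit
mb_per_day = REPO_SIZE_MB / DAYS            # about 38 MB per day
gb_per_year = mb_per_day * 365 / 1000       # linear extrapolation to a year

print(f"{commits} commits, {mb_per_commit:.2f} MB/commit, "
      f"{mb_per_day:.1f} MB/day, ~{gb_per_year:.1f} GB/year")
# → 1008 commits, 0.79 MB/commit, 38.1 MB/day, ~13.9 GB/year
```

Extrapolated linearly, a single year of raw snapshots would land at roughly 14 GB.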

A slimmer version runs at teletext-archive-unicode.

Everything below is historical.

------8<------8<------8<------8<------8<------

This repo exists mainly because it's possible to scrape these online teletexts with GitHub Actions. And, you know, interesting stuff might emerge from looking back at them later.

The data is collected raw in docs/snapshots. Each commit adds, overwrites or removes the individual files of each teletext page.
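A minimal sketch of that add/overwrite/remove cycle for one station, assuming each page is stored as `docs/snapshots/<station>/<page>.html`. The file layout, the `write_snapshot` name and its signature are hypothetical, not taken from the repo; committing the directory with git afterwards records exactly these three kinds of change.

```python
import tempfile
from pathlib import Path


def write_snapshot(station_dir: Path, pages: dict[str, str]) -> dict[str, list[str]]:
    """Mirror one scrape of a station into station_dir (hypothetical helper).

    pages maps a page id (e.g. "100") to the raw content fetched for it.
    Files are named "<page id>.html" (an assumed layout). Pages that were
    not fetched this round are deleted, so the next commit records them
    as removed.
    """
    station_dir.mkdir(parents=True, exist_ok=True)
    changes: dict[str, list[str]] = {"added": [], "overwritten": [], "removed": []}

    for page_id, content in sorted(pages.items()):
        path = station_dir / f"{page_id}.html"
        if not path.exists():
            changes["added"].append(page_id)
        elif path.read_text(encoding="utf-8") != content:
            changes["overwritten"].append(page_id)
        else:
            continue  # unchanged: leave the file alone so git sees no diff
        path.write_text(content, encoding="utf-8")

    # Remove files for pages that no longer exist on the station's site.
    for path in sorted(station_dir.glob("*.html")):
        if path.stem not in pages:
            path.unlink()
            changes["removed"].append(path.stem)

    return changes


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "docs" / "snapshots" / "ard"
        print(write_snapshot(root, {"100": "<p>news</p>", "101": "<p>more</p>"}))
        # → {'added': ['100', '101'], 'overwritten': [], 'removed': []}
        print(write_snapshot(root, {"100": "<p>update</p>"}))
        # → {'added': [], 'overwritten': ['100'], 'removed': ['101']}
```

Skipping unchanged pages keeps each commit limited to pages whose content actually changed since the previous scrape.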

scraped stations:

station  scraped since  format              link
3sat     2022-01-28     html with font-map  https://blog.3sat.de/ttx/
ARD      2022-01-28     html                https://www.ard-text.de/
NDR      2022-01-27     html                https://www.ndr.de/fernsehen/videotext/index.html
n-tv     2022-01-28     json                https://www.n-tv.de/mediathek/teletext/
SR       2022-01-28     html                https://www.saartext.de/
WDR      2022-01-28     html                https://www1.wdr.de/wdrtext/index.html
ZDF      2022-01-27     html                https://teletext.zdf.de/teletext/zdf/
ZDFinfo  2022-01-27     html                https://teletext.zdf.de/teletext/zdfinfo/
ZDFneo   2022-01-27     html                https://teletext.zdf.de/teletext/zdfneo/
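Most stations in the table deliver plain HTML. As an illustration of what a first parsing pass over such a page might look like, here is a sketch using Python's built-in `html.parser` in place of BeautifulSoup (swapped in so the example has no dependencies). The class and function names are hypothetical, the choice of `<br>`, `<div>`, `<p>` and `<tr>` as row boundaries is an assumption, the 3sat font-map and the n-tv JSON feed are ignored, and the stations' real markup was not checked.

```python
from html.parser import HTMLParser


class TeletextTextExtractor(HTMLParser):
    """Collect the visible text of an HTML teletext page, row by row.

    Hypothetical helper: assumes <br>, <div>, <p> and <tr> boundaries
    roughly correspond to teletext rows.
    """

    SKIP = {"script", "style"}
    ROW_TAGS = {"br", "div", "p", "tr"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # turn &amp; etc. into text
        self._skip_depth = 0
        self._rows: list[str] = [""]

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.ROW_TAGS:
            self._rows.append("")  # start a new row

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._rows[-1] += data

    def rows(self) -> list[str]:
        # Keep leading spaces (layout), strip trailing ones, drop empty rows.
        return [r.rstrip() for r in self._rows if r.strip()]


def page_text(html: str) -> list[str]:
    parser = TeletextTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows()


print(page_text("<div>100 ARD-Text</div><script>x=1</script><div>News &amp; Weather</div>"))
# → ['100 ARD-Text', 'News & Weather']
```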

related stuff

Oh boy, look what else exists on the web:

TODO

beyond the borders

Languages

Language: Python 100.0%