pchr8 / up_crawler

Script that downloads articles from Ukrainska Pravda

Fix broken download of older articles

pchr8 opened this issue

Some of the very old articles have a different page structure, e.g. no author names, and this breaks the parsing.
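As a rough illustration of what a more tolerant parser could look like, here is a minimal sketch that treats the author as optional instead of required. It uses only the standard library, and the class names `post_title` and `post_author` are hypothetical placeholders, not the site's confirmed markup or the crawler's actual code.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import Optional


@dataclass
class Article:
    title: Optional[str] = None
    # Empty list means "no author block on the page", which is not an error.
    authors: list[str] = field(default_factory=list)


class ArticleParser(HTMLParser):
    """Collects title and author text from elements with assumed class names."""

    # Hypothetical class names; the real page structure may differ.
    TITLE_CLASS = "post_title"
    AUTHOR_CLASS = "post_author"

    def __init__(self) -> None:
        super().__init__()
        self.article = Article()
        self._capture: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.TITLE_CLASS in classes:
            self._capture = "title"
        elif self.AUTHOR_CLASS in classes:
            self._capture = "author"

    def handle_endtag(self, tag):
        # Simplification: any closing tag stops the current capture.
        self._capture = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._capture is None:
            return
        if self._capture == "title":
            self.article.title = text
        else:
            self.article.authors.append(text)


def parse_article(html: str) -> Article:
    """Parse one page; missing fields stay at their defaults instead of raising."""
    parser = ArticleParser()
    parser.feed(html)
    return parser.article
```

With this shape, a page without an author block yields `authors == []`, and the caller can decide whether to store the article anyway or log it for inspection.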

I remember running into this, but I can no longer reproduce it quickly, even with articles that are 20+ years old. I'm still certain the problem exists.

One way to reproduce it would be to download the oldest year for which sitemaps are present and see whether it breaks.
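A possible starting point for that reproduction, sketched under assumptions: it takes a sitemap index in the standard sitemaps.org format and assumes each sitemap URL contains a four-digit year. Both the helper name `oldest_year_sitemaps` and the URL layout are hypothetical and would need checking against the real sitemaps.

```python
import re
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
# Assumption: the year appears in the URL as a standalone 19xx/20xx number.
YEAR_RE = re.compile(r"(?<!\d)(19|20)\d{2}(?!\d)")


def oldest_year_sitemaps(index_xml: str) -> tuple[int, list[str]]:
    """Return the oldest year found and the sitemap URLs belonging to it."""
    root = ET.fromstring(index_xml)
    by_year: dict[int, list[str]] = {}
    for loc in root.findall("sm:sitemap/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        match = YEAR_RE.search(url)
        if match:
            by_year.setdefault(int(match.group(0)), []).append(url)
    if not by_year:
        raise ValueError("no sitemap URL contains a year")
    oldest = min(by_year)
    return oldest, by_year[oldest]
```

The article URLs from the returned sitemaps could then be fed to the crawler one by one, recording which pages raise during parsing.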