kiwix / overview

Home Page: https://kiwix.org

Scraping of German Wiktionary not working correctly (missing photos & translations)

The German Wiktionary is not being scraped properly: the resulting .zim file is missing many photos and translations that are present in the online version.

For example, the .zim version from February 2022 shows the headword "schütten" without any photo:

image

However, the online version shows a photo that was added to the Wiki on November 3, 2021:

image

image

Therefore, the February 2022 .zim does not include a photo that had been online since November 2021. I checked the file in both the Kiwix app on Android and GoldenDict, and the image is missing in each. This problem is relatively common in the German Wiktionary .zim, and the same happens with the translations box.

It seems that the code to scrape the German Wiktionary needs to be updated/improved.

Thanks for your hard work! :D

@kelson42 I am not sure how to report this bug properly to Wikimedia Headquarters. I have doubts about the proper category and the technical details (I am not a programmer). Could you please report the bug to Wikimedia with the technical information?

I collected several links to entries whose images are present on the German Wiktionary but missing from the .zim file:
https://de.wiktionary.org/wiki/Fensterbrett
https://de.wiktionary.org/wiki/schütten
https://de.wiktionary.org/wiki/riechen
https://de.wiktionary.org/wiki/parken
https://de.wiktionary.org/wiki/ernten

This entry has 3 photos on Wiktionary, 2 of which are missing from the .zim:
https://de.wiktionary.org/wiki/balancieren
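
To make reports like the ones above easier to reproduce, here is a minimal Python sketch that lists which images appear in the online HTML of an entry but not in the HTML of the same entry taken from the .zim. It is only an illustration, not part of the Kiwix scraper. The names `ImageCollector`, `normalize`, and `missing_images` are hypothetical, and so is the assumption that both copies can be matched by file name once a thumbnail size prefix such as `220px-` is stripped. Fetching the two HTML documents is left out.

```python
import re
from html.parser import HTMLParser
from posixpath import basename
from urllib.parse import unquote, urlsplit


def normalize(src):
    """Reduce an image URL to a bare file name for comparison."""
    # Keep only the last path segment and drop a thumbnail size prefix
    # such as "220px-" (assumption: both copies follow this naming).
    name = unquote(basename(urlsplit(src).path))
    return re.sub(r"^\d+px-", "", name)


class ImageCollector(HTMLParser):
    """Collect the normalized file names of all <img> tags in a document."""

    def __init__(self):
        super().__init__()
        self.names = set()

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            self.names.add(normalize(src))


def missing_images(online_html, zim_html):
    """Return file names found in the online HTML but not in the .zim HTML."""
    online, offline = ImageCollector(), ImageCollector()
    online.feed(online_html)
    offline.feed(zim_html)
    return sorted(online.names - offline.names)
```

Calling `missing_images(online_html, zim_html)` with the two HTML strings of one entry returns a sorted list of file names that could be attached to a bug report. An empty list would mean that no image is missing under the file-name assumption above.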