holyspiritomb / ebook_dictionary_creator

Code to create a database with cleaned up Wiktionary data (usable for all kinds of applications) and then to create ebook dictionaries based on this data. (Currently creates a Spanish Kindle dictionary, with more to come.)

Ebook dictionary creator

This project has two goals:

  1. The first is to create a performant database of words and their definitions, including all inflections and the proper linkages between them. This lets you look up the definition of a word regardless of its case or tense. I will soon add an example showing how to use it as a fast dictionary with only one SQL query.
  2. The second is to create high-quality dictionaries from this data that are compatible with recent ebook readers. It uses an algorithm to work around bugs in the Kindle lookup engine that prevent inflections from being found; even official dictionaries suffer from them. Look in the releases section for the dictionaries that are already available.

I will also soon release the dictionaries as a Tabfile so that you can convert them to other formats as well.
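
Until the project's own example is available, here is a minimal sketch of what a single-query lookup could look like. The schema is an assumption made for illustration only: the table names `word` and `inflection`, their columns, and the helper functions `build_demo_db` and `lookup` are hypothetical and are not taken from this repository's actual database layout.

```python
import sqlite3

# Hypothetical schema: every inflected form (including the lemma itself)
# points to the entry that holds the definition.
SCHEMA = """
CREATE TABLE word (
    id INTEGER PRIMARY KEY,
    lemma TEXT NOT NULL,
    definition TEXT NOT NULL
);
CREATE TABLE inflection (
    form TEXT NOT NULL,
    word_id INTEGER NOT NULL REFERENCES word(id)
);
CREATE INDEX idx_inflection_form ON inflection(form);
"""

# The single query: resolve any form to its lemma and definition.
LOOKUP_SQL = """
SELECT DISTINCT w.lemma, w.definition
FROM inflection AS i
JOIN word AS w ON w.id = i.word_id
WHERE i.form = ?
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one sample entry (illustrative data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO word (id, lemma, definition) VALUES (?, ?, ?)",
        (1, "hablar", "to speak"),
    )
    conn.executemany(
        "INSERT INTO inflection (form, word_id) VALUES (?, ?)",
        [("hablar", 1), ("hablo", 1), ("hablaste", 1)],
    )
    conn.commit()
    return conn


def lookup(conn: sqlite3.Connection, form: str) -> list:
    """Return (lemma, definition) rows for a word form, or an empty list."""
    return conn.execute(LOOKUP_SQL, (form,)).fetchall()


if __name__ == "__main__":
    demo = build_demo_db()
    print(lookup(demo, "hablaste"))
```

The index on `inflection.form` is what keeps the lookup fast in this sketch; the real database may organize its linkages differently.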

Contributions are always welcome, as is feedback about words that are not displayed correctly or definitions that are unhelpful (I tried to remove all of these or link them to their parent definitions). You can also tell me if the dictionary for your language is missing essential information.

Available languages (see in releases):

  • Spanish
  • German
  • Swedish

Languages to be released (I still have to fix some bugs):

  • Italian (missing linkages)
  • French (missing linkages)
  • Portuguese (missing linkages)
  • Catalan (will have to use inflection tables)
  • Polish (will have to use inflection tables)
  • Finnish (will have to use inflection tables)
  • Latin (for some reason buggy)

Acknowledgements

This project would not have been possible without the https://kaikki.org/ data provided by Tatu Ylonen, the OpenRussian data, and the PyGlossary library, which is used to create the Kindle dictionary.

Similar projects

https://github.com/nyg/wiktionary-to-kindle parses the Wiktionary itself.

https://github.com/efskap/kindlewick apparently works very similarly and also supports inflections. As of now it only supports smaller languages, though (it has been tested for Finnish).

https://github.com/BoboTiG/ebook-reader-dict is a program that also parses the Wiktionary dump itself and outputs Kobo-compatible files.

About

License: GNU General Public License v3.0


Languages

Python 99.4%, CSS 0.6%