Add lemmatization
jacopofar opened this issue · comments
The script that generates the cards only uses word frequency, which isolates prepositions, articles and other common words. It would be very useful if it also applied lemmatization for some languages, so that the user can look up the base form of a word and find its conjugation.
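A minimal sketch of the idea, assuming a lookup table from inflected form to lemma. The `LEMMAS` table and the `lemma_frequencies` function are hypothetical names used only for illustration, not part of the existing script; a real table would have to come from a per-language lexical resource.

```python
from collections import Counter

# Hypothetical form -> lemma table, hard-coded here for illustration only.
LEMMAS = {"went": "go", "goes": "go", "going": "go", "cats": "cat"}


def lemma_frequencies(tokens, lemmas=LEMMAS):
    """Count tokens by lemma and remember which inflected forms were seen."""
    counts = Counter()
    forms = {}
    for token in tokens:
        word = token.lower()
        # Words missing from the table are treated as their own lemma.
        lemma = lemmas.get(word, word)
        counts[lemma] += 1
        forms.setdefault(lemma, set()).add(word)
    return counts, forms
```

Grouping by lemma means "went" and "goes" add to the frequency of "go" instead of each producing a separate card, while the set of seen forms stays available to show on the card.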
The import script should be changed so that it merges the newly generated data into the current content of the DB instead of replacing it, so that the user's card status is preserved.
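One possible shape for such a merge, sketched with SQLite. The `cards` table, its `word`, `lemma` and `status` columns and the `merge_cards` function are assumptions made for this example; the project's actual schema and database may differ.

```python
import sqlite3


def merge_cards(conn, new_cards):
    """Merge (word, lemma) pairs into a hypothetical `cards` table.

    Existing rows keep their `status`; only the generated data is refreshed.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cards ("
        "word TEXT PRIMARY KEY, lemma TEXT, status TEXT NOT NULL DEFAULT 'new')"
    )
    for word, lemma in new_cards:
        # Add the card only if it is not there yet, with the default status.
        conn.execute(
            "INSERT OR IGNORE INTO cards (word, lemma) VALUES (?, ?)",
            (word, lemma),
        )
        # Refresh the generated column without touching the user's status.
        conn.execute("UPDATE cards SET lemma = ? WHERE word = ?", (lemma, word))
    conn.commit()
```

Running the import twice is then safe: cards the user has already studied keep their status, and new words are added with the default one.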
Worth noting: https://github.com/tatuylonen/wiktextract
Done using the Wiktionary extractor.