Language files for WordDumb.
Wiktionary data come from kaikki.org and Dbnary; the Chinese and French Wiktionary data are created with the Wiktextract tool. Word difficulty data sources are listed in each language's subfolder.
- Python
- wget: download files
- lemminflect: inflect English words
- Open Chinese Convert: convert Chinese characters
- wordfreq: get word frequency data
- wiktextract-lemmatization: remove stress
- perl, sed: remove invalid text
- lbunzip2 or bunzip2
- pigz or gzip
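
One way to confirm that the command-line tools listed above are on the PATH is a small shell check. This is only a sketch: the helper name `first_available` is hypothetical and not part of this repository, and it assumes the Python interpreter is invoked as `python`, as in the install commands. The Python packages in the list are not checked here.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): print the first
# command among the arguments that is found on PATH; fail if none is.
first_available() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            return 0
        fi
    done
    return 1
}

# Each entry lists alternatives; finding one of them is enough.
for group in "python" "wget" "perl" "sed" "lbunzip2 bunzip2" "pigz gzip"; do
    # $group is intentionally unquoted so the alternatives split into words.
    if found=$(first_available $group); then
        echo "ok: $found"
    else
        echo "missing: $group"
    fi
done
```

Each line of output reports either the tool that was found or the group of alternatives that is missing, so a `missing:` line points at what still needs to be installed.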
$ python -m venv .venv
$ source .venv/bin/activate.fish
$ python -m pip install .
$ proficiency en
This work is licensed under GPL version 3 or later.