Provide Jupyter notebook for corpus generation
proppy opened this issue · comments
Johan Euphrosine commented
We should provide a notebook that documents corpus generation and lets developers easily create alternative corpora.
This should cover:
- word extraction and filtering
- indexing
- text embedding model generation
- upload to the database
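
As a rough starting point for such a notebook, the extraction, indexing, and upload steps could be sketched as below. This is only an illustration under assumptions: the function names (`extract_words`, `build_index`, `upload`), the filtering thresholds, and the use of an in-memory SQLite table called `corpus` are all hypothetical, since this issue does not specify the project's actual tokenizer, schema, or database. The embedding model step is left out because the model is not named here.

```python
import re
import sqlite3
from collections import Counter


def extract_words(text, min_len=2, min_count=1):
    """Extract lowercase words and keep those passing simple filters.

    The regex and thresholds are placeholder assumptions, not the
    project's real filtering rules.
    """
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(tokens)
    return sorted(
        w for w, c in counts.items() if len(w) >= min_len and c >= min_count
    )


def build_index(words):
    """Map each word to a stable integer id (its position in the sorted list)."""
    return {w: i for i, w in enumerate(words)}


def upload(index, conn):
    """Store the index in a hypothetical `corpus` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS corpus (id INTEGER PRIMARY KEY, word TEXT UNIQUE)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO corpus (id, word) VALUES (?, ?)",
        [(i, w) for w, i in index.items()],
    )
    conn.commit()


if __name__ == "__main__":
    words = extract_words("The peach boy and the dog. A dog!")
    index = build_index(words)
    conn = sqlite3.connect(":memory:")  # stand-in for the real database
    upload(index, conn)
    print(conn.execute("SELECT id, word FROM corpus ORDER BY id").fetchall())
```

The regex splits on non-letter characters, which only works for languages written with spaces between words. A Japanese corpus would need a morphological analyzer in place of `extract_words`.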
Johan Euphrosine commented
Interesting alternative corpora are available at https://www.ninjal.ac.jp/english/database/.
In particular http://chaki-data.ninjal.ac.jp/momotaro/momotaro-2015-11-10/ looks very interesting :)