How to use it for languages other than English?
AASHISHAG opened this issue · comments
Hi, Thank you for making the code open-source.
Can we use the code base for languages other than English, for example German? I see you have used spaCy-en. Would installing spaCy-german help?
Heya,
Yes, you should be able to adapt the code for German. You'd mainly need different resources:
- A lot of native German text to build a German KenLM language model
- A German Hunspell dictionary equivalent to the one for English
- A German lemmatiser (e.g. spaCy) and a German inflection database.
Of these, the German inflection database is probably the hardest to obtain (although I've never checked). If you look at resources/agid-2016.01.19/infl.txt, you'd need an equivalent file of German lemmas and their inflected forms.
Sadly, it won't be as easy as just installing spaCy-de.
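To make the inflection database requirement a bit more concrete, here is a minimal sketch of the kind of lemma-to-forms lookup such a file would need to support. It assumes a simplified line format of `lemma POS: form | form, form`; that format, the function name `parse_inflections` and the German sample lines are illustrative assumptions, so check the real infl.txt for any extra markers before relying on it.

```python
from collections import defaultdict


def parse_inflections(lines):
    """Map (lemma, pos) -> list of inflected forms.

    Assumes each line looks like 'gehen V: ging | gegangen | geht'
    (a simplified, hypothetical format; a real file may differ).
    """
    table = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        if not line or ":" not in line:
            continue  # skip blank lines and lines without a forms section
        head, forms = line.split(":", 1)
        parts = head.split()
        if len(parts) != 2:
            continue  # expect exactly 'lemma POS' before the colon
        lemma, pos = parts
        # '|' separates inflection slots, ',' separates variants in a slot
        for group in forms.split("|"):
            for form in group.split(","):
                form = form.strip()
                if form and form not in table[(lemma, pos)]:
                    table[(lemma, pos)].append(form)
    return dict(table)


# Hypothetical German entries, for illustration only
sample = [
    "gehen V: ging | gegangen | gehend | gehe, gehst, geht",
    "Haus N: Häuser",
]
table = parse_inflections(sample)
```

Keying on `(lemma, pos)` keeps a noun and a verb with the same spelling apart, which matters once the lemmatiser hands you a lemma together with its part of speech.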
Thank you for the quick response.
I will get back to you when I have gathered all the resources. Thank you. 👍