chrisjbryant / lmgec-lite

A language model-based approach to Grammatical Error Correction for English that uses minimal annotated data.

How to use it for languages other than English?

AASHISHAG opened this issue

Hi, thank you for making the code open source.

Can we use the code base for languages other than English, for example German? I see you have used spaCy-en. Would installing spaCy-german help?

Heya,

Yes, you should be able to adapt the code for German. You'd mainly need different resources:

  1. A lot of native German text to build a German KenLM language model
  2. A German Hunspell dictionary equivalent to the one for English
  3. A German lemmatiser (e.g. spaCy) and a German inflection database.

Of these, the German inflection database is probably the hardest to obtain (although I've never checked). If you look at resources/agid-2016.01.19/infl.txt, you'd need an equivalent file that lists German lemmas and their inflected forms.
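As a rough illustration of what such a German file could feed into, here is a minimal sketch that reads lemma entries into a lookup table. It assumes a simplified, hypothetical line format of `lemma POS: form1 | form2, variant2`; the real infl.txt layout should be checked before relying on this, and the function name `parse_inflections` and the sample entries are made up for the example rather than taken from the repository.

```python
from collections import defaultdict


def parse_inflections(lines):
    """Map (lemma, pos) -> list of inflected forms.

    Assumed (hypothetical) line format:
        lemma POS: form1 | form2, variant2 | form3
    where "|" separates inflection slots and "," separates variants.
    """
    table = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        if not line or ":" not in line:
            continue  # skip blank or malformed lines
        head, _, tail = line.partition(":")
        parts = head.split()
        if len(parts) != 2:
            continue  # expect exactly "lemma POS" before the colon
        lemma, pos = parts
        for slot in tail.split("|"):
            for form in slot.split(","):
                form = form.strip()
                # keep first-seen order and drop duplicates
                if form and form not in table[(lemma, pos)]:
                    table[(lemma, pos)].append(form)
    return dict(table)


# Hypothetical German entries, written by hand for illustration only.
sample = [
    "gehen V: ging | gegangen | geht | gehe",
    "Kind N: Kinder | Kindes, Kinds",
]
inflections = parse_inflections(sample)
```

A table like this would let candidate corrections be generated by swapping a word for the other forms of its lemma, which is why the lemmatiser and the inflection list need to agree on lemmas and part-of-speech labels.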

It won't be as easy as just using spaCy-de sadly.

Thank you for the quick response.

I will get back when I am able to gather all the resources. Thank you. 👍