Is it possible to train multiple languages in one model file?
lomograb opened this issue
Yes, but each script should be on a separate line in the training data.
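One way to prepare such a training set is to keep the ground-truth lines for each script in their own directory and then merge and shuffle them so the model sees all scripts interleaved during training. The sketch below assumes an OCRopus-style layout (`*.png` line images paired with `*.gt.txt` transcriptions); that layout and the function name are illustrative assumptions, not something clstm mandates.

```python
# Hedged sketch: merge per-script training lines into one shuffled set.
# Assumes OCRopus-style pairs: line.png + line.gt.txt in each script dir.
import random
from pathlib import Path

def build_training_list(script_dirs, seed=0):
    """Collect line-image paths from each script's directory, keeping only
    images that have a matching .gt.txt transcription, then shuffle so no
    single script dominates any stretch of training."""
    lines = []
    for d in script_dirs:
        lines.extend(p for p in Path(d).glob("*.png")
                     if p.with_suffix(".gt.txt").exists())
    random.Random(seed).shuffle(lines)
    return lines
```

The resulting list can then be fed to whatever training loop you use; the shuffling matters because feeding one script's lines in a long unbroken run tends to make the model drift toward that script.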
For mixed scripts in the same line see this paper:
https://www.researchgate.net/publication/280777013_A_Sequence_Learning_Approach_for_Multiple_Script_Identification
Does CLSTM support this (mixed scripts in the same line)?
It's not supported out-of-the-box, but you can implement what's described in that paper with clstm.
Thank you @amitdo for the reply, and for this great project too. Okay, going to close this issue.
As a note, there is a model at kraken-models that does script identification exactly as described in the article (arrived at independently). It is able to differentiate between Arabic, Syriac, Cyrillic, Greek, Latin, and Fraktur.