Is it possible to train multiple languages in one model file?
lomograb opened this issue
Yes, but each script should be on a separate line in the training data.
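One way to prepare such a training set is to keep the ground-truth lines for each script in their own directory and then merge and shuffle them so the model sees all scripts interleaved during training. The sketch below assumes an OCRopus-style layout (`*.png` line images paired with `*.gt.txt` transcriptions); that layout and the function name are illustrative assumptions, not something clstm mandates.

```python
# Hedged sketch: merge per-script training lines into one shuffled set.
# Assumes OCRopus-style pairs: line.png + line.gt.txt in each script dir.
import random
from pathlib import Path

def build_training_list(script_dirs, seed=0):
    """Collect line-image paths from each script's directory, keeping only
    images that have a matching .gt.txt transcription, then shuffle so no
    single script dominates any stretch of training."""
    lines = []
    for d in script_dirs:
        lines.extend(p for p in Path(d).glob("*.png")
                     if p.with_suffix(".gt.txt").exists())
    random.Random(seed).shuffle(lines)
    return lines
```

The resulting list can then be fed to whatever training loop you use; the shuffling matters because feeding one script's lines in a long unbroken run tends to make the model drift toward that script.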
For mixed scripts in the same line see this paper:
https://www.researchgate.net/publication/280777013_A_Sequence_Learning_Approach_for_Multiple_Script_Identification
Does CLSTM support this (mixed scripts in the same line)?
It's not supported out-of-the-box, but you can implement what's described in that paper with clstm.
Thank you @amitdo for the reply, and for this great project too. Okay, going to close this issue.
As a note, there is a model at kraken-models that does script identification exactly as described in the article (arrived at independently). It is able to differentiate between Arabic, Syriac, Cyrillic, Greek, Latin, and Fraktur.