How can I train this model with Arabic or Urdu characters?

Question

How can I train this model with Arabic or Urdu characters?

ghulammustufa31 opened this issue 5 years ago

My labels contain Arabic/Urdu text.
For example "اسلام آباد : چیئرمین رضابانی کی زیر صدارت سینیٹ کا اجلاس"

What changes are required to train the model given non-English labels?

Edouard Belval · Answer 1 · Fri Feb 15 2019 21:45:11 GMT+0800 (China Standard Time)

According to Britannica, Arabic has 28 letters, which should make it a better fit for the CRNN architecture than a script with thousands of distinct characters like Chinese. I think you can expect somewhat workable results by simply replacing the character values in CRNN/config.py. Since Arabic is read right to left, you might run into some issues, but you'll have to try it to be sure.
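As a rough illustration of the two points above (swapping the character set and dealing with right-to-left text), here is a minimal sketch. The names CHAR_VECTOR and NUM_CLASSES are assumptions about what CRNN/config.py contains, so check the actual file before copying anything. The helper functions are hypothetical and not part of the repository. Reversing the label is only one naive way to line up the text with the left-to-right order in which the network scans the image; it will not handle labels that mix Arabic with digits or Latin text.

```python
# Sketch only: CHAR_VECTOR / NUM_CLASSES are assumed config names,
# and the helpers below are hypothetical, not part of the CRNN repo.

def build_char_vector(labels):
    """Collect every distinct character found in the labels, sorted for a stable order."""
    return "".join(sorted({ch for label in labels for ch in label}))


def encode_label(label, char_vector, reverse_rtl=True):
    """Map a label to class indices.

    Unicode stores Arabic/Urdu in logical (reading) order, but the network
    scans image columns left to right, so the string is optionally reversed
    to approximate the visual order.
    """
    text = label[::-1] if reverse_rtl else label
    return [char_vector.index(ch) for ch in text]


def decode_label(indices, char_vector, reverse_rtl=True):
    """Inverse of encode_label: turn class indices back into a readable string."""
    text = "".join(char_vector[i] for i in indices)
    return text[::-1] if reverse_rtl else text


# Sample labels taken from the question; in practice use the full training set.
labels = ["اسلام آباد", "سینیٹ کا اجلاس"]

CHAR_VECTOR = build_char_vector(labels)
NUM_CLASSES = len(CHAR_VECTOR) + 1  # one extra class for the CTC blank

encoded = encode_label(labels[0], CHAR_VECTOR)
print(NUM_CLASSES, encoded)
print(decode_label(encoded, CHAR_VECTOR) == labels[0])
```

Building the character set from the labels themselves avoids missing characters such as the space, Urdu-specific letters, or punctuation that a hand-written 28-letter alphabet would leave out.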

For Urdu, the same process can be applied, but some characters seem to be very wide. Since CRNN is not attention-based, this could make it very hard for training to converge.