kbatsuren / wiktra

Wiktra - a Python tool based on Wiktionary transliteration modules for 514 languages and their 102 different scripts (orthographies)

Transliteration from Arabic not working for continuous text

twardoch opened this issue

Transliteration from Arabic is not working for continuous text. It works only for single, space-separated characters. The Arabic Wiktionary module is a bit complex, so we need to investigate and add some special processing.

Should I implement preprocessing and postprocessing functions in this case? That would mean tokenizing the continuous text in preprocessing and concatenating the transliteration results in postprocessing.
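To make the tokenize-then-concatenate idea concrete, here is a minimal sketch. It is not Wiktra's actual API: `transliterate_text` and the `translit_token` callable are hypothetical names, and the regex only covers the main Arabic and Arabic Supplement Unicode blocks, which is an assumption about what counts as "Arabic script" here.

```python
import re
from typing import Callable

# Assumption: a "token" is a maximal run of characters from the Arabic
# (U+0600-U+06FF) and Arabic Supplement (U+0750-U+077F) blocks.
ARABIC_RUN = re.compile(r"([\u0600-\u06FF\u0750-\u077F]+)")


def transliterate_text(text: str, translit_token: Callable[[str], str]) -> str:
    """Split text into Arabic-script runs, transliterate each run, and rejoin.

    translit_token is a hypothetical stand-in for whatever per-token
    transliteration call actually works (for example a wrapper around
    the ar-translit module).
    """
    # Preprocessing: with a capturing group, re.split keeps the matched
    # runs in the result, at the odd indices.
    parts = ARABIC_RUN.split(text)
    # Postprocessing: transliterate the Arabic runs, pass everything else
    # (spaces, punctuation, Latin text) through unchanged, then concatenate.
    return "".join(
        translit_token(part) if i % 2 == 1 else part
        for i, part in enumerate(parts)
    )


if __name__ == "__main__":
    # Dummy transliterator, used only to show where the token boundaries fall.
    sample = "\u0643\u062a\u0627\u0628 \u062c\u062f\u064a\u062f."
    print(transliterate_text(sample, lambda tok: f"<{len(tok)}>"))  # prints "<4> <4>."
```

Passing the transliterator in as a callable keeps the splitting logic independent of which Arabic-script module ends up being used, but it would only hide the underlying problem rather than explain it.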

I think it’d be best to find out WHY it’s happening. There are multiple modules:

ar-translit has an unusual tr function: function export.tr(text, lang, sc, omit_i3raab, gray_i3raab, force_translit).

I could try to find out how to deal with this, or you might :)

We ought

I would add that when the language is set to fas (Persian), even single letters are not transliterated.

Yeah, there are a few different Arabic-script transliterators, and the whole notion of Arabic needs some special handling in our Python code.