Transliteration from Arabic not working for continuous text
twardoch opened this issue · comments
Transliteration from Arabic does not work for continuous text; it only works for single, space-separated characters. The Arabic Wiktionary module is a bit complex, so we need to investigate and add some special processing.
Should I implement preprocessing and postprocessing functions in this case? That is, tokenize the continuous text in preprocessing and concatenate the transliteration results in postprocessing.
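A rough sketch of what that preprocessing/postprocessing pair could look like, to make the proposal concrete. Everything here is hypothetical: `preprocess`, `postprocess`, `transliterate_text` and the `translit_token` callback are illustrative names, not existing functions in the project, and the character range covers only the basic Arabic block (U+0600 to U+06FF), which is an assumption.

```python
import re
from typing import Callable, List

# Runs of Arabic-script code points. Only the basic Arabic block is covered
# here (an assumption); supplements and presentation forms are ignored.
_ARABIC_RUN = re.compile(r"([\u0600-\u06FF]+)")


def preprocess(text: str) -> List[str]:
    """Split text into alternating Arabic and non-Arabic runs, in order."""
    # re.split with a capturing group keeps the matched runs in the result.
    return [tok for tok in _ARABIC_RUN.split(text) if tok]


def postprocess(pieces: List[str]) -> str:
    """Concatenate the per-token results back into one string."""
    return "".join(pieces)


def transliterate_text(text: str, translit_token: Callable[[str], str]) -> str:
    """Apply translit_token (hypothetical callback) to each Arabic run only."""
    pieces = [
        translit_token(tok) if _ARABIC_RUN.fullmatch(tok) else tok
        for tok in preprocess(text)
    ]
    return postprocess(pieces)
```

Note that this splits on runs rather than single characters, since transliterating letter by letter would likely lose the context that vowel marks and neighbouring letters provide. It also only works around the symptom and does not explain why continuous text fails.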
I think it’d be best to find out WHY it’s happening. There are multiple modules:
- https://en.wiktionary.org/wiki/Module:ar-translit
- https://en.wiktionary.org/wiki/Module:ks-Arab-translit
- https://en.wiktionary.org/wiki/Module:fa-translit
- https://en.wiktionary.org/wiki/Module:pa-Arab-translit
ar-translit has an unusual tr function: function export.tr(text, lang, sc, omit_i3raab, gray_i3raab, force_translit).
I could try to find out how to deal with this, or you might :)
We ought
I would add that when the language is set as fas (Persian), even single letters are not transliterated.
Yeah, there are a few different Arabic-script transliterators and the whole notion of Arabic needs some special handling in our Py code.
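One possible shape for that special handling is an explicit table from language code to transliteration module, sketched below. The module names are the ones linked above. The keys are an assumption: they are ISO 639-3 codes, following the fas example, and `TRANSLIT_MODULES` and `module_for` are hypothetical names, not part of the existing code.

```python
from typing import Optional

# Hypothetical lookup table: ISO 639-3 code -> Wiktionary module name.
# Module names are taken from the list in this thread; the choice of
# ISO 639-3 keys is an assumption about how languages are identified.
TRANSLIT_MODULES = {
    "ara": "ar-translit",       # Arabic
    "kas": "ks-Arab-translit",  # Kashmiri in Arabic script
    "fas": "fa-translit",       # Persian
    "pan": "pa-Arab-translit",  # Punjabi in Arabic script
}


def module_for(lang: str) -> Optional[str]:
    """Return the module name for a language code, or None if unknown."""
    return TRANSLIT_MODULES.get(lang.lower())
```

Kashmiri and Punjabi are also written in other scripts, so a real table would probably need to key on the script as well as the language.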