Some translations are not possible
leandroalbero opened this issue · comments
Issue description
Running the latest image easynmt/api:2.0-cpu with the model set to m2m_100_418M and English as the target language fails for some translations. Here are some examples:
- 'imagina a mi'
- 'imagina un sol'
- 'imagina a un vikingo'
In these cases, explicitly setting source_lang to 'es' fixed the issue, so the problem may lie in the language detection step, or there may be no translation direction from the detected language to English.
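As a rough sketch of the workaround, the snippet below builds a request URL that pins the source language instead of relying on auto-detection. The base URL (host and port), the /translate path, and the target_lang and text parameter names are assumptions about a locally running container; only source_lang is named in this issue. Adjust them to match your deployment.

```python
from urllib.parse import urlencode

# Assumed address of a locally running easynmt/api container; adjust as needed.
BASE_URL = "http://localhost:24080/translate"


def build_translate_url(text, target_lang, source_lang=None):
    """Build a GET URL for translation, optionally pinning the source language."""
    params = [("target_lang", target_lang), ("text", text)]
    if source_lang is not None:
        # Passing source_lang skips automatic language detection.
        params.append(("source_lang", source_lang))
    return f"{BASE_URL}?{urlencode(params)}"


url = build_translate_url("imagina un sol", "en", source_lang="es")
print(url)
```

The resulting URL can be fetched with any HTTP client. Omitting source_lang reproduces the auto-detection path that fails above.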
Docker logs output:
[2023-09-28 08:38:08 +0000] [60] [INFO] Waiting for application startup.
[2023-09-28 08:38:08 +0000] [60] [INFO] Application startup complete.
Exception: 'jbo'
The text of the exception varies with every prompt; I guess it is the code of the detected language.
Updating the model used by fasttext for language identification helps solve the issue, at least for the translations that failed in my tests.
https://fasttext.cc/docs/en/language-identification.html
This repo is using lid.176.ftz; switching to lid.176.bin helps because it is slightly more accurate.
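For reference, here is a minimal sketch of how a fastText language-identification model can be queried. The helper name detect_language is hypothetical and is not taken from this repo; it only assumes an object with a fastText-style predict(text, k) method that returns labels of the form __label__xx together with their probabilities.

```python
def detect_language(model, text):
    """Return (language_code, confidence) from a fastText-style model."""
    # fastText's predict() processes one line at a time, so flatten newlines first.
    labels, probs = model.predict(text.replace("\n", " "), k=1)
    # Labels look like "__label__es"; strip the prefix to get the bare code.
    return labels[0].replace("__label__", ""), float(probs[0])


# Usage with the real library (needs `pip install fasttext` and the
# model file downloaded from the fastText page linked above):
#
#   import fasttext
#   model = fasttext.load_model("lid.176.bin")  # or "lid.176.ftz"
#   print(detect_language(model, "imagina un sol"))
```

Running the same short inputs through both model files is a quick way to check whether the larger model changes the detected code for a given prompt.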
Lines to change are here:
Lines 415 to 430 in 7c11ae8
Yet there are still some translations that fail; enabling a fallback to a slower model in those cases might help.
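One way such a fallback could look is sketched below. This is only an illustration, not the repo's code: detect_with_fallback, the two detector callables, the supported-language set, and the 0.5 confidence threshold are all hypothetical. Each detector is assumed to return a (language_code, confidence) pair.

```python
def detect_with_fallback(text, fast_detect, slow_detect, supported, min_conf=0.5):
    """Try the fast detector first; use the slow one if its answer is unusable."""
    lang, conf = fast_detect(text)
    if lang in supported and conf >= min_conf:
        return lang
    # The fast result is unsupported or low-confidence: ask the slower detector.
    lang, _ = slow_detect(text)
    if lang not in supported:
        raise ValueError(f"No translation direction for detected language {lang!r}")
    return lang


# Stand-in detectors for illustration only; real ones would wrap actual models.
SUPPORTED = {"en", "es", "fr", "de"}  # illustrative subset, not the model's real list


def fast_stub(text):
    return ("jbo", 0.31)


def slow_stub(text):
    return ("es", 0.97)


print(detect_with_fallback("imagina un sol", fast_stub, slow_stub, SUPPORTED))  # prints: es
```

Raising a descriptive error when both detectors fail would also make the log output easier to read than a bare language code.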