argosopentech / argos-translate

Open-source offline translation library written in Python

Home Page: https://www.argosopentech.com

Stanza version >=1.1.1 breaks a few languages

yudelevi opened this issue · comments

I spent quite a bit of time figuring out what was wrong.
I wanted to upgrade Stanza to the latest release, 1.8.1, but by default it overwrites the resources.json file inside the packages.
Languages I couldn't get working with Stanza 1.8.1: az, bn, eo, ms, sq, tl, zt, tr

I managed to keep the files from being overwritten by specifying download_method=None and allow_unknown_language=False, but the resource file format changed significantly between versions 1.1.1 and 1.8.1.

I ended up downgrading until I hit a version that worked, which took me all the way back to 1.1.1. I only ran into this after upgrading Argos Translate to 1.9.2, where the Stanza requirement is >=1.2.1.

With stanza==1.1.1 (ignore download_method=None; it doesn't exist before 1.4.0):

>>> import stanza
>>> p=stanza.Pipeline(
...             lang="az",
...             dir="/home/dyudelevich/.local/share/argos-translate/packages/translate-az_en-1_5/stanza",
...             processors="tokenize",
...             use_gpu=True,
...             logging_level="DEBUG",
...             download_method=None
...         )
2024-03-14 04:25:27 DEBUG: Loading resource file...
2024-03-14 04:25:27 DEBUG: Processing parameter "processors"...
2024-03-14 04:25:27 DEBUG: Found tokenize: imst.
2024-03-14 04:25:27 INFO: Loading these models for language: az (Turkish):
=======================
| Processor | Package |
-----------------------
| tokenize  | imst    |
=======================

2024-03-14 04:25:27 INFO: Use device: gpu
2024-03-14 04:25:27 INFO: Loading: tokenize
2024-03-14 04:25:27 DEBUG: With settings: 
2024-03-14 04:25:27 DEBUG: {'model_path': '/home/dyudelevich/.local/share/argos-translate/packages/translate-az_en-1_5/stanza/az/tokenize/imst.pt', 'lang': 'az', 'mode': 'predict'}
2024-03-14 04:25:27 INFO: Done loading processors!

With stanza==1.2:

2024-03-14 04:26:24 INFO: Use device: gpu
2024-03-14 04:26:24 INFO: Loading: tokenize
2024-03-14 04:26:24 DEBUG: With settings: 
2024-03-14 04:26:24 DEBUG: {'model_path': '/home/dyudelevich/.local/share/argos-translate/packages/translate-az_en-1_5/stanza/az/tokenize/imst.pt', 'lang': 'az', 'mode': 'predict'}
2024-03-14 04:26:25 INFO: Loading: mwt
2024-03-14 04:26:25 DEBUG: With settings: 
2024-03-14 04:26:25 DEBUG: {'model_path': '/home/dyudelevich/.local/share/argos-translate/packages/translate-az_en-1_5/stanza/az/mwt/imst.pt', 'lang': 'az', 'mode': 'predict'}
2024-03-14 04:26:25 ERROR: Cannot load model from /home/dyudelevich/.local/share/argos-translate/packages/translate-az_en-1_5/stanza/az/mwt/imst.pt
Traceback (most recent call last):
  File "/home/dyudelevich/.local/lib/python3.10/site-packages/stanza/pipeline/core.py", line 128, in __init__
    self.processors[processor_name] = NAME_TO_PROCESSOR_CLASS[processor_name](config=curr_processor_config,
  File "/home/dyudelevich/.local/lib/python3.10/site-packages/stanza/pipeline/processor.py", line 155, in __init__
    self._set_up_model(config, use_gpu)
  File "/home/dyudelevich/.local/lib/python3.10/site-packages/stanza/pipeline/mwt_processor.py", line 21, in _set_up_model
    self._trainer = Trainer(model_file=config['model_path'], use_cuda=use_gpu)
  File "/home/dyudelevich/.local/lib/python3.10/site-packages/stanza/models/mwt/trainer.py", line 36, in __init__
    self.load(model_file, use_cuda)
  File "/home/dyudelevich/.local/lib/python3.10/site-packages/stanza/models/mwt/trainer.py", line 141, in load
    checkpoint = torch.load(filename, lambda storage, loc: storage)
  File "/usr/lib/python3/dist-packages/torch/serialization.py", line 791, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/usr/lib/python3/dist-packages/torch/serialization.py", line 271, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/usr/lib/python3/dist-packages/torch/serialization.py", line 252, in __init__
    super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: '/home/dyudelevich/.local/share/argos-translate/packages/translate-az_en-1_5/stanza/az/mwt/imst.pt'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/dyudelevich/.local/lib/python3.10/site-packages/stanza/pipeline/core.py", line 155, in __init__
    raise FileNotFoundError('Could not find model file %s, although there are other models downloaded for language %s.  Perhaps you need to download a specific model.  Try: stanza.download(lang="%s",package=None,processors={"%s":"%s"})' % (model_path, lang, lang, processor_name, model_name)) from e
FileNotFoundError: Could not find model file /home/dyudelevich/.local/share/argos-translate/packages/translate-az_en-1_5/stanza/az/mwt/imst.pt, although there are other models downloaded for language az.  Perhaps you need to download a specific model.  Try: stanza.download(lang="az",package=None,processors={"mwt":"imst"})
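
The traceback shows Stanza 1.2 looking for an mwt model next to the tokenize model. A quick way to see which processor models a package actually ships is to list its stanza directory. This is only a sketch: the helper name packaged_stanza_models is made up for illustration, and it assumes the <stanza dir>/<lang>/<processor>/<model>.pt layout seen in the paths above.

```python
from pathlib import Path


def packaged_stanza_models(stanza_dir, lang):
    """Map each processor directory under <stanza_dir>/<lang> to its .pt model names.

    Hypothetical helper; assumes the <lang>/<processor>/<model>.pt layout
    visible in the log paths of this report.
    """
    lang_dir = Path(stanza_dir) / lang
    models = {}
    if not lang_dir.is_dir():
        # No directory for this language at all.
        return models
    for processor_dir in sorted(lang_dir.iterdir()):
        if processor_dir.is_dir():
            # Model name is the file name without the .pt suffix, e.g. "imst".
            models[processor_dir.name] = sorted(
                p.stem for p in processor_dir.glob("*.pt")
            )
    return models
```

Calling it with the package's stanza directory and "az" would show whether an mwt entry exists at all. If only a tokenize entry comes back, the FileNotFoundError above is expected for any Stanza version that adds mwt on its own.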

Thanks for the detailed report. I just tested Stanza version 1.8.1 for Albanian myself and it is broken.

Warning: Ignoring XDG_SESSION_TYPE=wayland on Gnome. Use QT_QPA_PLATFORM=wayland to run on Wayland anyway.
2024-03-17 07:26:29 WARNING: Language en package default expects mwt, which has been added
2024-03-17 07:26:33 WARNING: Language en package default expects mwt, which has been added
('No translation available for this language pair',)
2024-03-17 07:27:18 WARNING: Language en package default expects mwt, which has been added
('No translation available for this language pair',)
2024-03-17 07:27:42 WARNING: Unsupported language: sq.
Traceback (most recent call last):
  File "/home/pj/Downloads/env/lib/python3.10/site-packages/argostranslategui/gui.py", line 39, in run
    translated_text = self.translation_function()
  File "/home/pj/Downloads/env/lib/python3.10/site-packages/argostranslate/translate.py", line 63, in translate
    return self.hypotheses(input_text, num_hypotheses=1)[0].value
  File "/home/pj/Downloads/env/lib/python3.10/site-packages/argostranslate/translate.py", line 296, in hypotheses
    translated_paragraph = self.underlying.hypotheses(
  File "/home/pj/Downloads/env/lib/python3.10/site-packages/argostranslate/translate.py", line 173, in hypotheses
    apply_packaged_translation(
  File "/home/pj/Downloads/env/lib/python3.10/site-packages/argostranslate/translate.py", line 418, in apply_packaged_translation
    stanza_pipeline = stanza.Pipeline(
  File "/home/pj/Downloads/env/lib/python3.10/site-packages/stanza/pipeline/core.py", line 264, in __init__
    raise ValueError(f'No processors to load for language {lang}.  Language {lang} is currently unsupported')
ValueError: No processors to load for language sq.  Language sq is currently unsupported
Aborted (core dumped)

I just released Argos Translate v1.9.3 which pins the Stanza version at 1.1.1.
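
For anyone installing Stanza separately from Argos Translate, a small check like the following can confirm the environment matches the pin. It is a sketch, not part of Argos Translate: the function names and the KNOWN_GOOD_STANZA constant are hypothetical, and the only fact taken from this thread is that 1.1.1 is the pinned version.

```python
from importlib import metadata

# Version pinned by Argos Translate v1.9.3 according to this thread.
KNOWN_GOOD_STANZA = "1.1.1"


def stanza_matches_pin(installed, pinned=KNOWN_GOOD_STANZA):
    """Return True when the installed version string equals the pinned one."""
    return installed.strip() == pinned


def installed_stanza_version():
    """Return the installed stanza version string, or None if stanza is absent."""
    try:
        return metadata.version("stanza")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_stanza_version()
    if version is None:
        print("stanza is not installed")
    elif stanza_matches_pin(version):
        print(f"stanza {version} matches the pin")
    else:
        print(f"stanza {version} differs from the pinned {KNOWN_GOOD_STANZA}")
```

An exact string comparison is used on purpose, since the pin in this thread is a single version rather than a range.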