ufal / whisper_streaming

Whisper realtime streaming for long speech-to-text transcription and translation

BUG ImportError('libmosestokenizer-dev.so: cannot open shared object file: No such file or directory')

ivopivo opened this issue

Hi, after a prolonged build of the required libraries, I was finally able to run faster-whisper on the GPU, but now I get this error about opus-fast-mosestokenizer.

When running this command:

python3 whisper_online.py out.wav --language en --model small --min-chunk-size 1 > out.txt

I get an error about libmosestokenizer-dev.so:

Audio duration is: 35.55 seconds
Loading Whisper small model for en... done. It took 5.62 seconds.
Traceback (most recent call last):
  File "/home/ivo/.local/lib/python3.8/site-packages/mosestokenizer/__init__.py", line 16, in <module>
    from mosestokenizer.lib import _mosestokenizer
ImportError: libmosestokenizer-dev.so: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "whisper_online.py", line 495, in <module>
    online = OnlineASRProcessor(asr,create_tokenizer(tgt_language))
  File "whisper_online.py", line 427, in create_tokenizer
    from mosestokenizer import MosesTokenizer
  File "/home/ivo/.local/lib/python3.8/site-packages/mosestokenizer/__init__.py", line 20, in <module>
    raise RuntimeError(_msg)
RuntimeError: Failed to import mosestokenizer c++ library
Full error log: ImportError('libmosestokenizer-dev.so: cannot open shared object file: No such file or directory')

I installed opus-fast-mosestokenizer with pip:

pip install opus-fast-mosestokenizer
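The RuntimeError above is raised because the Python package cannot load its compiled C++ library. A quick way to check whether the dynamic loader can find that library at all, independently of whisper_streaming, is to try loading it directly with ctypes. This is a minimal diagnostic sketch: the helper name can_load is hypothetical, and the library name is copied from the error message.

```python
import ctypes


def can_load(libname: str) -> bool:
    """Return True if the dynamic loader can open `libname`, else False.

    Hypothetical diagnostic helper: ctypes.CDLL raises OSError when the
    shared object cannot be found or loaded (e.g. it is not in any
    directory the loader searches).
    """
    try:
        ctypes.CDLL(libname)
    except OSError as exc:
        print(f"cannot load {libname}: {exc}")
        return False
    return True


if __name__ == "__main__":
    # Library name taken from the ImportError in the traceback.
    can_load("libmosestokenizer-dev.so")
```

If this returns False, the library either was never built or sits in a directory the loader does not search; on Linux, adding that directory to LD_LIBRARY_PATH is a common workaround, though whether it applies to this particular install is an assumption.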

Edit:

I tried replacing the tokenizer with sacremoses, changing the import line to from sacremoses import MosesTokenizer. That got the transcription running and showing words; it even processed the first sentence with the time/delay report, but then, understandably, it failed because the tokenizer couldn't perform split(). My point is that the rest of the infrastructure seems to work; only opus-fast-mosestokenizer can't be made to work.

Hi,

  1. It seems to be an installation issue with opus-fast-mosestokenizer on your side. Are you sure it's installed properly?

  2. https://github.com/hplt-project/sacremoses does not provide sentence segmentation, so this one can't work.

But yes, opus-fast-mosestokenizer can be replaced by any other sentence segmentation library. See #13 -- you can propose another one.
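In whisper_online.py, create_tokenizer returns the object that is passed to OnlineASRProcessor (see the traceback). Judging from the failure with sacremoses, that object needs a split() method that turns text into a list of sentences. The sketch below assumes exactly that interface: it tries MosesTokenizer from opus-fast-mosestokenizer and, if the C++ library cannot be loaded, falls back to a naive regex splitter. The class and function names are hypothetical, and breaking on ".", "!" and "?" is a crude stand-in for a real segmentation library (it will, for example, split after abbreviations such as "Dr.").

```python
import re
from typing import List


class RegexSentenceSplitter:
    """Hypothetical fallback exposing a MosesTokenizer-like split()."""

    # Sentence boundary: whitespace preceded by '.', '!' or '?'.
    _boundary = re.compile(r"(?<=[.!?])\s+")

    def split(self, text: str) -> List[str]:
        # Drop empty pieces, e.g. when text is empty or only whitespace.
        return [s for s in self._boundary.split(text.strip()) if s]


def make_sentence_splitter(lang: str):
    """Prefer opus-fast-mosestokenizer; fall back to the regex splitter.

    mosestokenizer raises RuntimeError when its C++ library fails to load
    (as in the traceback), and ImportError if the package is missing.
    Assumption: MosesTokenizer(lang) provides a split() method.
    """
    try:
        from mosestokenizer import MosesTokenizer
        return MosesTokenizer(lang)
    except (ImportError, RuntimeError):
        return RegexSentenceSplitter()


if __name__ == "__main__":
    splitter = make_sentence_splitter("en")
    print(splitter.split("Hello world. How are you? Fine!"))
```

Any library proposed in #13 could be wrapped the same way, as long as its wrapper returns a list of sentence strings from split().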

Hi, I'm closing this because of inactivity. Feel free to reopen.

Thanks for the reply, @Gldkslfmsd.
You are right that the problem lies in opus-fast-mosestokenizer and not in whisper_streaming. However, the opus-fast-mosestokenizer project has no issues section, so I posted my issue here.
I installed opus-fast-mosestokenizer from pip and it reported a successful installation, but using it throws this error.
I also tried building it from source, with no improvement.
The system I am using is:
Jetson Orin Nano
Ubuntu 20.04
Cuda 11.4

It took some effort to build PyTorch and CTranslate2 for this system, but both of them work in this environment. The problematic one is opus-fast-mosestokenizer.
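Since importing the package is exactly what fails, one way to see which compiled files the pip install or source build actually placed in site-packages is to locate the package without importing it. A minimal standard-library sketch (the helper name is hypothetical): if libmosestokenizer-dev.so is not among the results, the native library was not installed next to the Python bindings, and the loader would have to find it elsewhere, for example in a system library directory.

```python
import importlib.util
from pathlib import Path
from typing import List


def find_package_shared_objects(package: str) -> List[Path]:
    """List .so files inside an installed package without importing it.

    Hypothetical diagnostic helper. importlib.util.find_spec locates a
    top-level package without executing its __init__.py, so it works even
    when `import package` itself raises.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return []  # not installed, or a plain module rather than a package
    found: List[Path] = []
    for location in spec.submodule_search_locations:
        found.extend(sorted(Path(location).rglob("*.so")))
    return found


if __name__ == "__main__":
    for path in find_package_shared_objects("mosestokenizer"):
        print(path)
```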

I can't reopen the issue as I am not a collaborator. I just wanted to clarify the situation and the context.