abdeladim-s / easymms

A simple Python package to easily use Meta's Massively Multilingual Speech (MMS) project

Home Page: https://abdeladim-s.github.io/easymms/


Module not found: No module named 'examples.speech_recognition'

andergisomon opened this issue

commented

I was trying out the Colab notebook: I selected the l1107 model, changed the language to dtp, and got this error when running the ASR inference:

I have not tried this with the smaller model and lang='eng'.

ModuleNotFoundError                       Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/easymms/models/asr.py in <module>
     28 try:
---> 29     from fairseq.examples.speech_recognition.new.infer import hydra_main
     30 except ImportError:

/content/fairseq/examples/speech_recognition/__init__.py in <module>
----> 1 from . import criterions, models, tasks  # noqa

/content/fairseq/examples/speech_recognition/criterions/__init__.py in <module>
     14         criterion_name = file[: file.find(".py")]
---> 15         importlib.import_module(
     16             "examples.speech_recognition.criterions." + criterion_name

/usr/lib/python3.10/importlib/__init__.py in import_module(name, package)
    125             level += 1
--> 126     return _bootstrap._gcd_import(name[level:], package, level)
    127

ModuleNotFoundError: No module named 'examples.speech_recognition'

During handling of the above exception, another exception occurred:

ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-6-161905b8ad81> in <cell line: 1>()
----> 1 from easymms.models.asr import ASRModel
      2
      3 asr = ASRModel(model=f'./models/{model}.pt')
      4
      5 transcriptions = asr.transcribe(files, lang='dtp', align=False)

/usr/local/lib/python3.10/dist-packages/easymms/models/asr.py in <module>
     29     from fairseq.examples.speech_recognition.new.infer import hydra_main
     30 except ImportError:
---> 31     from examples.speech_recognition.new.infer import hydra_main
     32
     33

ModuleNotFoundError: No module named 'examples.speech_recognition'
commented

Error on line 29 of asr.py:

Import "fairseq.examples.speech_recognition.new.infer" could not be resolvedPyright(reportMissingImports)

@andergisomon, did you get this error on Colab, or are you running it locally?

commented

@andergisomon, did you get this error on Colab, or are you running it locally?

I have yet to try it locally, but I ran it on Colab.

Oh, was it on Colab? I tested it recently and didn't get any errors.
I will try to rerun it and fix it if needed.

@andergisomon, I have made some changes to the notebook to handle that error differently.
Could you please give it a try now?
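
For reference, a minimal sketch of the kind of workaround this involves, assuming fairseq was cloned to /content/fairseq as in the traceback above (the exact notebook changes may differ):

```python
import sys

# The fallback import in asr.py, `from examples.speech_recognition.new.infer
# import hydra_main`, only resolves if the fairseq repo root is on sys.path,
# so that `examples` is importable as a top-level package.
sys.path.insert(0, '/content/fairseq')  # path taken from the traceback above

from easymms.models.asr import ASRModel  # should now import without the error
```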

commented

@abdeladim-s Hello. I just tried the Colab notebook and I'm surprised: it took 9 minutes (and 17 GB of RAM) to transcribe a 2-second audio sample of my own voice, but the transcription was 100% accurate. Earlier I tried ASR via Hugging Face Transformers; while it took less than 8 GB of RAM and ran faster, the transcription was completely garbled.

There has to be something I'm missing about running inference on the ASR model, but the current docs just don't go into enough detail.

commented

Is there a way to speed up the inference through EasyMMS, such as using the GPU runtime as opposed to the CPU?

Hi @andergisomon,

  • Yes, you can speed up the inference by using the GPU instead of the CPU. To do this, select the GPU runtime and pass device='cuda' to the transcribe function (see the sketch below).
  • You can also speed up the inference by choosing a smaller model, although unfortunately that depends on the target language you are using.
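
A minimal sketch of the GPU option, based on the notebook cell in the traceback above (the checkpoint path and file name here are placeholders):

```python
from easymms.models.asr import ASRModel

# Placeholder path: use whichever checkpoint you downloaded in the notebook
asr = ASRModel(model='./models/mms1b_l1107.pt')

files = ['sample.wav']  # hypothetical audio file

# On a GPU runtime, device='cuda' runs inference on the GPU instead of the CPU
transcriptions = asr.transcribe(files, lang='dtp', device='cuda', align=False)
```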

That being said, transcribing a 2-second audio clip in 9 minutes seems weird! I tried it with even a 30 s clip and it took just a few minutes.
If you can share the audio, I can test it as well and let you know if I have any ideas.

commented

@abdeladim-s

What's weird is that I tried it again with 16 seconds of audio and it took 11 minutes with the Colab notebook you sent, only 3 minutes more than the 2-second sample. I did use the l1107 model on the free CPU runtime with the language code changed; I didn't change anything else.

@andergisomon
I think the bottleneck is loading the model into memory. The l1107 model is about 14 GB, so you will need at least that amount of memory before doing any operation.
If you run the inference on that 16 s audio right after the 2 s one, the second inference won't take much time because the model is already loaded into memory. That's why I made the files variable a list: so we can run the inference on all the files before releasing the model.
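
A short sketch of that pattern, with hypothetical file names and a placeholder checkpoint path:

```python
from easymms.models.asr import ASRModel

# Loading the ~14 GB l1107 checkpoint dominates the runtime
asr = ASRModel(model='./models/mms1b_l1107.pt')  # placeholder path

# Pass all files in one call so the load cost is paid once,
# before the model is released
files = ['sample_2s.wav', 'sample_16s.wav']  # hypothetical file names
transcriptions = asr.transcribe(files, lang='dtp', align=False)
for t in transcriptions:
    print(t)
```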