ufal / whisper_streaming

Whisper realtime streaming for long speech-to-text transcription and translation


Import error when using `whisper_online` as module

Luca-Pozzi opened this issue · comments

commented

Hi!

First of all, thank you very much for this repo, impressive work!

Issue

I am trying to use Whisper for ASR on a robot, so I want to use `whisper_online.py` as a module. In my main application, I tried
something like

import whisper
import whisper_timestamped
import whisper_online

...

online = whisper_online.WhisperTimestampedASR(lan=kwargs.get('lan'),
                                              modelsize=kwargs.get('model'),
                                              cache_dir=kwargs.get('model_cache_dir'),
                                              model_dir=kwargs.get('model_dir')
                                              )

while True:
    a = ...  # receive new audio chunk (and e.g. wait for min_chunk_size seconds first, ...)
    online.insert_audio_chunk(a)
    o = online.process_iter()
    print(o)  # do something with current partial output

but I get a `ModuleNotFoundError: No module named 'whisper'`. I am sure that whisper is installed, since running `whisper_online.py` as main works smoothly. Moreover, I have run `import whisper` in a Python shell, and it throws no error.

Solution(s)

  • Placing the import statements at the top of the whisper_online script works, but then the possibility to import only the necessary backend is lost.
  • Alternatively, I have tried to add an import_backend method in ASRBase to import the required libraries. E.g. for WhisperTimestampedASR it would be (see the wiring sketch after the snippet below):
def import_backend(self):
    global whisper, whisper_timestamped
    import whisper
    import whisper_timestamped
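
For context, a minimal sketch of how such a hook could be wired into ASRBase. The constructor body below is only illustrative (it assumes the base class stores the language and loads the model in __init__), not the repository's actual code:

class ASRBase:

    def __init__(self, lan, modelsize=None, cache_dir=None, model_dir=None):
        self.original_language = lan
        # Pull in the backend-specific libraries before loading the model.
        self.import_backend()
        self.model = self.load_model(modelsize, cache_dir, model_dir)

    def import_backend(self):
        # Overridden by each backend, e.g. WhisperTimestampedASR imports
        # whisper and whisper_timestamped here.
        raise NotImplementedError("must be implemented in the child class")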

Hi, thanks for the feedback.

I primarily use the faster-whisper backend, and I stopped testing and debugging the whisper backend, so I think I missed this. I think that a similar approach to this one could help:

from faster_whisper import WhisperModel

In other words: can you try adding `import whisper` into load_model of WhisperTimestampedASR? And then let's remove the comments "if used, requires import ..."; they are wrong.
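
For reference, a simplified sketch of that pattern, i.e. deferring the backend import into load_model so the library is only required when that backend is actually used (the real FasterWhisperASR.load_model also handles model_dir and the device/compute-type selection, which is omitted here):

class FasterWhisperASR(ASRBase):

    def load_model(self, modelsize=None, cache_dir=None, model_dir=None):
        # Deferred import: faster_whisper is only needed if this backend is instantiated.
        from faster_whisper import WhisperModel
        return WhisperModel(modelsize, download_root=cache_dir)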

commented

Thank you for the prompt answer!

class WhisperTimestampedASR(ASRBase):
    """Uses whisper_timestamped library as the backend. Initially, we tested the code on this backend. It worked, but slower than faster-whisper.
    On the other hand, the installation for GPU could be easier.
    If used, requires imports:
        import whisper
        import whisper_timestamped
    """

    def load_model(self, modelsize=None, cache_dir=None, model_dir=None):
        if model_dir is not None:
            print("ignoring model_dir, not implemented", file=sys.stderr)
        return whisper.load_model(modelsize, download_root=cache_dir)

    def transcribe(self, audio, init_prompt=""):
        result = whisper_timestamped.transcribe_timestamped(self.model, audio, language=self.original_language, initial_prompt=init_prompt, verbose=None, condition_on_previous_text=True)
        return result

Indeed adding import whisper in load_model and import whisper_timestamped in transcribe works!

I have proposed a solution in PR #10, though with a slightly different approach. If needed (and useful), I can modify it as you suggested.

Indeed adding import whisper in load_model

yes

and import whisper_timestamped in transcribe works!

Let's not import within transcribe. This function is called many times, so it would be inefficient. Rather, do it in load_model; that's fine because it's usually called only once per process.
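
A minimal sketch of that placement, with both deferred imports in load_model and none in transcribe. It mirrors the class quoted above and assumes `import sys` at the module level of whisper_online.py; it is the suggested shape, not a verbatim patch:

class WhisperTimestampedASR(ASRBase):

    def load_model(self, modelsize=None, cache_dir=None, model_dir=None):
        # Deferred imports: executed once, when the model is loaded,
        # and declared global so transcribe can use them too.
        global whisper, whisper_timestamped
        import whisper
        import whisper_timestamped
        if model_dir is not None:
            print("ignoring model_dir, not implemented", file=sys.stderr)
        return whisper.load_model(modelsize, download_root=cache_dir)

    def transcribe(self, audio, init_prompt=""):
        # No import here: this method runs on every audio chunk.
        return whisper_timestamped.transcribe_timestamped(
            self.model, audio, language=self.original_language,
            initial_prompt=init_prompt, verbose=None,
            condition_on_previous_text=True)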

And yes, I noticed your PR and gave feedback -- based on reading the code, not testing it now. I'm busy.

commented

Let's not import within transcribe. This function is called many times, so it would be inefficient. Rather, do it in load_model; that's fine because it's usually called only once per process.

Ok, thank you for the advice.

And yes, I noticed your PR and gave feedback -- based on reading the code, not testing it now. I'm busy.

Thank you very much for your time!