juanmc2005 / diart

A python package to build AI-powered real-time audio applications

Home Page: https://diart.readthedocs.io


ImportError: cannot import name 'OnlineSpeakerDiarization' from 'diart'

ameer-kanaan opened this issue · comments

commented

I am trying to run your tutorial on transcription coloring, but I am getting the above error.

The library itself runs fine with `diart.stream microphone`.

Running on Windows 11 with Python 3.11.5, using your .yml environment.

OnlineSpeakerDiarization was removed; please use SpeakerDiarization and SpeakerDiarizationConfig instead.

I just updated the gist accordingly
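
For anyone landing here, a minimal sketch of the rename (the v0.7 names in the comment are my recollection of the old API):

```python
# Before (diart < 0.8):
#   from diart import OnlineSpeakerDiarization, PipelineConfig
#   pipeline = OnlineSpeakerDiarization(PipelineConfig())

# After (diart >= 0.8), the pipeline and its config are named:
from diart import SpeakerDiarization, SpeakerDiarizationConfig

config = SpeakerDiarizationConfig()  # default hyperparameters
pipeline = SpeakerDiarization(config)
```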

commented

Thanks a lot.
It is running, but it is not transcribing; it just keeps "listening". Any workaround?

@ameer-kanaan some people have reported this but it could be due to many things, so I can't help without more information.

I suggest you debug line by line to find out what's going on.

You can also take a look at:

Some common problems are Whisper being too slow due to RAM and/or CPU requirements, and sample rate mismatch
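
One way to do that line-by-line check is to tap the microphone source before it reaches the pipeline. A rough sketch, assuming the v0.7-style constructor MicrophoneAudioSource(sample_rate) and the rx observable it exposes as `stream` (both are assumptions here):

```python
# Rough sketch: inspect raw microphone chunks before they reach the
# pipeline, to rule out capture or sample-rate problems.
import numpy as np
from diart.sources import MicrophoneAudioSource

sample_rate = 16000
mic = MicrophoneAudioSource(sample_rate)

# Print chunk shape and RMS energy: a constant near-zero RMS means no
# audio is being captured, which would explain the endless "listening".
mic.stream.subscribe(
    on_next=lambda chunk: print(chunk.shape, float(np.sqrt((chunk ** 2).mean()))),
    on_error=lambda e: print("source error:", e),
)

mic.read()  # blocks and starts emitting chunks
```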

commented

Just tried running it on my friend's 2018 MacBook Pro too; he gets the same issue. It keeps "listening" but doesn't output anything.

Additionally, we tried it on two other Windows devices, one of which has an excellent GPU and CPU, but there it didn't even listen: it just printed the warnings and exited. We tried to make use of the issues section, but to no avail.

We are all getting the same warnings:

```
UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
  torchaudio.set_audio_backend("soundfile")
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
Model was trained with pyannote.audio 0.0.1, yours is 3.1.0. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.1.1+cpu. Bad things might happen unless you revert torch to 1.x.
Model was trained with pyannote.audio 0.0.1, yours is 3.1.0. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.8.1+cu102, yours is 2.1.1+cpu. Bad things might happen unless you revert torch to 1.x.
Model was trained with pyannote.audio 0.0.1, yours is 3.1.0. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.8.1+cu102, yours is 2.1.1+cpu. Bad things might happen unless you revert torch to 1.x.
```

@ameer-kanaan the warnings are normal, you can simply ignore them.

Could you try downgrading diart to v0.8 and v0.7 and see if the issue persists? (keep in mind you'll have to change the class names again)
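
For example: `pip install "diart==0.8.0"`, and if it still hangs, `pip install "diart==0.7.0"` (the exact version pins are an assumption; adjust to whatever releases you have available).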

Are you trying to run the pipeline on your microphone or on a specific file?

Also, please try debugging line by line to see what the audio chunks are, what the model outputs are, etc. This should give you an idea of what is happening.

commented

It started working with 0.7. We are trying to use the microphone.

For the previous issue with 0.8, we did try to run basic debugging line by line, but we didn't catch any errors.

@ameer-kanaan OK, I'll try to take a look at this issue in the coming weeks to see if something in v0.8 broke the Whisper code.

Update (copied from my comment on the gist):

I tried it out using diart 0.9, both from the mic and from an audio file, with and without GPU. Each time I was able to see colored transcriptions. However, what may be happening is that the chunk processing is too slow (due to hardware) and hence interrupts the recording of the microphone (although it should be asynchronous with MicrophoneAudioSource).
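
One way to sanity-check the "too slow" hypothesis is to time a single pipeline step against the chunk duration. A rough sketch, assuming the v0.8+ API where the pipeline is callable on a list of pyannote SlidingWindowFeature chunks (note that building the pipeline downloads the pyannote models, which may require a HuggingFace token):

```python
# Rough real-time-factor check: time one pipeline step on a dummy chunk.
import time
import numpy as np
from pyannote.core import SlidingWindow, SlidingWindowFeature
from diart import SpeakerDiarization, SpeakerDiarizationConfig

config = SpeakerDiarizationConfig()
pipeline = SpeakerDiarization(config)

# Build one chunk of random audio with the pipeline's expected duration.
samples = int(config.duration * config.sample_rate)
resolution = 1.0 / config.sample_rate
window = SlidingWindow(start=0.0, duration=resolution, step=resolution)
chunk = SlidingWindowFeature(np.random.randn(samples, 1).astype(np.float32), window)

start = time.perf_counter()
pipeline([chunk])
elapsed = time.perf_counter() - start

# If the real-time factor is >= 1, processing can't keep up with the mic.
print(f"{elapsed:.2f}s to process {config.duration:.1f}s of audio "
      f"(real-time factor {elapsed / config.duration:.2f})")
```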

If you can get real-time diarization with diart alone (quick test: run `diart.stream microphone` and see what you get), then I suggest changing line 151 to `source = WebSocketAudioSource(config.sample_rate)` and running the script; then, from another terminal, run `diart.client microphone --host 127.0.0.1 --port 7007 --sample-rate 16000 --step 0.5`.

This basically reads from the microphone and sends chunks to the pipeline through a websocket server; you should then see the colored captions in the pipeline script's output.
This guarantees that the mic streaming and the pipeline run in different processes, avoiding the interference problem I mentioned.
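
Concretely, the server-side change looks something like this (host and port are diart's defaults; the exact WebSocketAudioSource signature is my assumption from v0.9):

```python
from diart.sources import WebSocketAudioSource

# Receive audio over a websocket instead of opening the microphone in
# this process; `config` is the SpeakerDiarizationConfig built earlier
# in the gist script.
source = WebSocketAudioSource(config.sample_rate, host="127.0.0.1", port=7007)
```

The microphone is then captured by the separate `diart.client` process shown above, so a slow pipeline step can no longer stall audio capture.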

Let me know if that works out!