juanmc2005 / diart

A python package to build AI-powered real-time audio applications

Home Page: https://diart.readthedocs.io


ImportError: cannot import name 'OnlineSpeakerDiarization' from 'diart'

ameer-kanaan opened this issue · comments

commented

I am trying to run your tutorial on transcription coloring, but I am getting the above error.

The library itself runs fine with `diart.stream microphone`.

Running on Windows 11 with Python 3.11.5, using your .yml environment.

OnlineSpeakerDiarization was removed; please use SpeakerDiarization and SpeakerDiarizationConfig instead.

I just updated the gist accordingly
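
For anyone landing here, a minimal sketch of the rename (the v0.7 names in the comment are my recollection of the old API):

```python
# Before (diart < 0.8):
#   from diart import OnlineSpeakerDiarization, PipelineConfig
#   pipeline = OnlineSpeakerDiarization(PipelineConfig())

# After (diart >= 0.8), the pipeline and its config are named:
from diart import SpeakerDiarization, SpeakerDiarizationConfig

config = SpeakerDiarizationConfig()  # default hyperparameters
pipeline = SpeakerDiarization(config)
```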

commented

Thanks a lot.
It is running, but it is not transcribing; it just keeps "listening". Any workaround?

@ameer-kanaan some people have reported this but it could be due to many things, so I can't help without more information.

I suggest you debug line by line to find out what's going on.

You can also take a look at:

Some common problems are Whisper being too slow due to RAM and/or CPU requirements, and sample rate mismatch
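
One way to do that line-by-line check is to tap the microphone source before it reaches the pipeline. A rough sketch, assuming the v0.7-style constructor MicrophoneAudioSource(sample_rate) and the rx observable it exposes as `stream` (both are assumptions here):

```python
# Rough sketch: inspect raw microphone chunks before they reach the
# pipeline, to rule out capture or sample-rate problems.
import numpy as np
from diart.sources import MicrophoneAudioSource

sample_rate = 16000
mic = MicrophoneAudioSource(sample_rate)

# Print chunk shape and RMS energy: a constant near-zero RMS means no
# audio is being captured, which would explain the endless "listening".
mic.stream.subscribe(
    on_next=lambda chunk: print(chunk.shape, float(np.sqrt((chunk ** 2).mean()))),
    on_error=lambda e: print("source error:", e),
)

mic.read()  # blocks and starts emitting chunks
```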

commented

Just tried running it on my friend's 2018 MacBook Pro too; he gets the same issue. It keeps "listening" but doesn't output anything.

Additionally, we tried it on two other Windows devices, one of which has an excellent GPU and CPU, but there it didn't even listen: it just printed the warnings and exited. We tried to make use of the issues section, but to no avail.

We are all getting the same warnings:

```
UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
  torchaudio.set_audio_backend("soundfile")
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
Model was trained with pyannote.audio 0.0.1, yours is 3.1.0. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.1.1+cpu. Bad things might happen unless you revert torch to 1.x.
Model was trained with pyannote.audio 0.0.1, yours is 3.1.0. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.8.1+cu102, yours is 2.1.1+cpu. Bad things might happen unless you revert torch to 1.x.
Model was trained with pyannote.audio 0.0.1, yours is 3.1.0. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.8.1+cu102, yours is 2.1.1+cpu. Bad things might happen unless you revert torch to 1.x.
```

@ameer-kanaan the warnings are normal, you can simply ignore them.

Could you try downgrading diart to v0.8 and v0.7 and see if the issue persists? (keep in mind you'll have to change the class names again)
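
For example: `pip install "diart==0.8.0"`, and if it still hangs, `pip install "diart==0.7.0"` (the exact version pins are an assumption; adjust to whatever releases you have available).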

Are you trying to run the pipeline on your microphone or on a specific file?

Also, please try debugging line by line to see what the audio chunks are, what the model outputs are, etc. This should give you an idea of what is happening.

commented

It started working with 0.7. We are trying to use the microphone.

For the previous issue with 0.8, we did try to run basic debugging line by line, but we didn't catch any errors.

@ameer-kanaan OK, I'll try to take a look at this issue in the coming weeks to see if something in v0.8 broke the Whisper code.

Update (copied from my comment on the gist):

I tried it out using diart 0.9, both from the mic and from an audio file, with and without GPU. Each time I was able to see colored transcriptions. However, what may be happening is that the chunk processing is too slow (due to hardware) and hence interrupts the recording of the microphone (although it should be asynchronous with MicrophoneAudioSource).
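
One way to sanity-check the "too slow" hypothesis is to time a single pipeline step against the chunk duration. A rough sketch, assuming the v0.8+ API where the pipeline is callable on a list of pyannote SlidingWindowFeature chunks (note that building the pipeline downloads the pyannote models, which may require a HuggingFace token):

```python
# Rough real-time-factor check: time one pipeline step on a dummy chunk.
import time
import numpy as np
from pyannote.core import SlidingWindow, SlidingWindowFeature
from diart import SpeakerDiarization, SpeakerDiarizationConfig

config = SpeakerDiarizationConfig()
pipeline = SpeakerDiarization(config)

# Build one chunk of random audio with the pipeline's expected duration.
samples = int(config.duration * config.sample_rate)
resolution = 1.0 / config.sample_rate
window = SlidingWindow(start=0.0, duration=resolution, step=resolution)
chunk = SlidingWindowFeature(np.random.randn(samples, 1).astype(np.float32), window)

start = time.perf_counter()
pipeline([chunk])
elapsed = time.perf_counter() - start

# If the real-time factor is >= 1, processing can't keep up with the mic.
print(f"{elapsed:.2f}s to process {config.duration:.1f}s of audio "
      f"(real-time factor {elapsed / config.duration:.2f})")
```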

If you can get real-time diarization with diart alone (quick test: run `diart.stream microphone` and see what you get), then I suggest changing line 151 to `source = WebSocketAudioSource(config.sample_rate)` and running the script; then, from another terminal, run `diart.client microphone --host 127.0.0.1 --port 7007 --sample-rate 16000 --step 0.5`.

This basically reads from the microphone and sends chunks to the pipeline through a websocket server; you should then see the colored captions in the pipeline script's output.
This guarantees that the mic streaming and the pipeline run in different processes, avoiding the interference problem I mentioned.
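
Concretely, the server-side change looks something like this (host and port are diart's defaults; the exact WebSocketAudioSource signature is my assumption from v0.9):

```python
from diart.sources import WebSocketAudioSource

# Receive audio over a websocket instead of opening the microphone in
# this process; `config` is the SpeakerDiarizationConfig built earlier
# in the gist script.
source = WebSocketAudioSource(config.sample_rate, host="127.0.0.1", port=7007)
```

The microphone is then captured by the separate `diart.client` process shown above, so a slow pipeline step can no longer stall audio capture.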

Let me know if that works out!