Diart+Whisper for real time transcription not working as expected

Question

star10RD opened this issue a year ago · comments

After following your instructions in "Color Your Captions: Streamlining Live Transcriptions With “diart” and OpenAI’s Whisper", I tried to run the program but got the Invalid Sample Rate error from PortAudio.

Then I saw #152, reinstalled Diart from the fix/samplerate branch, and made the following changes to your code above:

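import torch
from diart import SpeakerDiarization, SpeakerDiarizationConfig
from diart.sources import MicrophoneAudioSource

# Diarization settings: 5s windows, advanced every 500ms
config = SpeakerDiarizationConfig(
    duration=5,
    step=0.5,
    latency="min",
    tau_active=0.5,
    rho_update=0.1,
    delta_new=0.57,
    device=torch.device("cuda")
)
dia = SpeakerDiarization(config)
# device=8 is the input device index on my machine
source = MicrophoneAudioSource(device=8)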

The sample rate error is gone. However, the transcription is pure gibberish and does not even resemble the spoken words.

Can you help me out? I can't figure out what I am doing wrong.

Full code used

Vedant Poddar · Answer 1 · Sun Jul 09 2023 20:59:19 GMT+0800 (China Standard Time)

Never mind, I made a small mistake here. I have got it working now:

  source.stream.pipe(
      # Format audio stream to sliding windows of 5s with a step of 500ms
-      dops.rearrange_audio_stream( config.duration, config.step, config.sample_rate )
+      dops.rearrange_audio_stream( config.duration, config.step, source.sample_rate )
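
One plausible reading of why this matters: the windowing step converts seconds to samples, so it has to use the rate the microphone actually delivers rather than the rate the model expects. The sketch below is plain arithmetic, not diart code. The helper window_sizes and both rates (16000 Hz for config.sample_rate, 44100 Hz for the device) are assumptions for illustration only.

```python
from typing import Tuple


def window_sizes(duration: float, step: float, sample_rate: int) -> Tuple[int, int]:
    """Return (samples per window, samples per step) for a sliding window."""
    return int(round(duration * sample_rate)), int(round(step * sample_rate))


duration, step = 5.0, 0.5
model_rate = 16000   # assumed value of config.sample_rate
device_rate = 44100  # assumed native rate of the microphone (source.sample_rate)

# Window sizes computed with the model rate vs. the device rate
wrong = window_sizes(duration, step, model_rate)
right = window_sizes(duration, step, device_rate)

# Seconds of real audio covered by a "5s" window cut with the wrong rate
actual_seconds = wrong[0] / device_rate

print(wrong, right, round(actual_seconds, 3))
```

Under these assumed rates, a window cut with the model rate would hold well under 2s of microphone audio instead of 5s, which would be consistent with the garbled transcription I was seeing.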

Juan Coria · Answer 2 · Mon Jul 10 2023 18:15:33 GMT+0800 (China Standard Time)

It's good news to see that the dynamic resampling is working :-)
These features will be included in the next release, happy to see they're useful before the official release!