juanmc2005 / diart

A Python package to build AI-powered real-time audio applications

Home Page: https://diart.readthedocs.io

Diart+Whisper for real time transcription not working as expected

star10RD opened this issue

After following your instructions in "Color Your Captions: Streamlining Live Transcriptions With “diart” and OpenAI’s Whisper", I tried to run the program but got the Invalid Sample Rate error from PortAudio.

Then I saw #152, reinstalled diart from the fix/samplerate branch, and made the following changes to your code above:

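import torch
from diart import SpeakerDiarization, SpeakerDiarizationConfig
from diart.sources import MicrophoneAudioSource

# Diarization settings; inference runs on the GPU
config = SpeakerDiarizationConfig(
    duration=5,
    step=0.5,
    latency="min",
    tau_active=0.5,
    rho_update=0.1,
    delta_new=0.57,
    device=torch.device("cuda"),
)
dia = SpeakerDiarization(config)
# device=8 selects the input device by its index
source = MicrophoneAudioSource(device=8)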

The sample rate error is gone. However, the transcription is pure gibberish and does not even resemble the spoken words.

Can you help me out? I can't figure out what I am doing wrong.

Full code used

Never mind, I made a small mistake here. I have got it working now:

  source.stream.pipe(
      # Format audio stream to sliding windows of 5s with a step of 500ms
-      dops.rearrange_audio_stream( config.duration, config.step, config.sample_rate )
+      dops.rearrange_audio_stream( config.duration, config.step, source.sample_rate )
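
For anyone who hits the same problem, here is a rough sketch of why the rate passed to rearrange_audio_stream matters. It is plain arithmetic, not diart code. The helper window_in_samples is hypothetical, and both rates are assumptions: 16000 Hz for config.sample_rate and 48000 Hz for the microphone. The issue does not state either value. If windows are measured in samples of the incoming stream, sizing them with the model rate makes each "5 s" window much shorter than intended, which would plausibly explain the garbled output.

```python
def window_in_samples(duration_s, step_s, sample_rate):
    """Convert a window length and step from seconds to sample counts."""
    window = int(round(duration_s * sample_rate))
    step = int(round(step_s * sample_rate))
    return window, step


MODEL_RATE = 16000  # assumed value of config.sample_rate
MIC_RATE = 48000    # hypothetical native rate of the microphone (source.sample_rate)

# Before the fix: windows sized with the model rate but cut from the mic stream
wrong_window, wrong_step = window_in_samples(5, 0.5, MODEL_RATE)

# After the fix: windows sized with the rate of the stream they are cut from
right_window, right_step = window_in_samples(5, 0.5, MIC_RATE)

# Seconds of microphone audio that each window really covers
wrong_seconds = wrong_window / MIC_RATE
right_seconds = right_window / MIC_RATE

print(f"before: {wrong_window} samples = {wrong_seconds:.2f} s of mic audio")
print(f"after:  {right_window} samples = {right_seconds:.2f} s of mic audio")
```

Under these assumed rates, the windows before the fix cover only about a third of the intended 5 seconds, while the windows after the fix cover the full duration.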

It's good to see that the dynamic resampling is working :-)
These features will be included in the next release. Happy to see they're already useful before the official release!