Diart + Whisper for real-time transcription not working as expected
star10RD opened this issue
After following your instructions in "Color Your Captions: Streamlining Live Transcriptions With “diart” and OpenAI’s Whisper", I tried to run the program but got the Invalid Sample Rate error from PortAudio.
Then I saw #152, reinstalled Diart from the fix/samplerate branch, and made the following changes to your code above:
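import torch
from diart import SpeakerDiarization, SpeakerDiarizationConfig
from diart.sources import MicrophoneAudioSource

config = SpeakerDiarizationConfig(
    duration=5,        # window length in seconds
    step=0.5,          # window shift in seconds
    latency="min",
    tau_active=0.5,
    rho_update=0.1,
    delta_new=0.57,
    device=torch.device("cuda"),
)
dia = SpeakerDiarization(config)
# Input device 8 is specific to my machine
source = MicrophoneAudioSource(device=8)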
The sample rate error is gone; however, the transcription is pure gibberish and does not even resemble the spoken words.
Can you help me out? I can't figure out what I am doing wrong.
Never mind, I made a small mistake here. I have it working now:
source.stream.pipe(
# Format audio stream to sliding windows of 5s with a step of 500ms
- dops.rearrange_audio_stream( config.duration, config.step, config.sample_rate )
+ dops.rearrange_audio_stream( config.duration, config.step, source.sample_rate )
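For anyone hitting the same thing, here is a rough sketch of why the wrong sample rate in that call could plausibly produce gibberish. It does not use diart at all; `window_params` is a hypothetical helper, and the 16 kHz model rate and 44.1 kHz microphone rate are assumed values for illustration, not figures taken from this setup.

```python
def window_params(duration_s, step_s, sample_rate):
    """Hypothetical helper: convert window length and step from seconds to samples."""
    window = int(round(duration_s * sample_rate))
    step = int(round(step_s * sample_rate))
    return window, step


# Assumed rates, for illustration only
model_rate = 16000  # rate the pipeline config might report
mic_rate = 44100    # native rate the microphone might deliver

# Windowing with the config rate while the audio arrives at the mic rate
wrong = window_params(5, 0.5, model_rate)
# Windowing with the rate the audio actually has
right = window_params(5, 0.5, mic_rate)

# Real duration covered by the wrongly sized window
wrong_seconds = wrong[0] / mic_rate

print("window/step with config rate:", wrong)
print("window/step with source rate:", right)
print("seconds actually covered by the wrong window:", round(wrong_seconds, 2))
```

Under these assumptions, each "5 s" window would hold under 2 s of real audio, so everything downstream would see chunks with the wrong duration and timing. Using `source.sample_rate` keeps the window size consistent with the audio the microphone actually produces.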
It's good news to see that the dynamic resampling is working :-)
These features will be included in the next release, happy to see they're useful before the official release!