lablab-ai / Whisper-transcription_and_diarization-speaker-identification-

How to use OpenAI's Whisper to transcribe and diarize audio files

Diarization is taking 90+ minutes, is that normal?

josh-may opened this issue

I went through the repo and got to this part:

DEMO_FILE = {'uri': 'blabal', 'audio': 'audio.wav'}
dz = pipeline(DEMO_FILE)  

with open("diarization.txt", "w") as text_file:
    text_file.write(str(dz))

But it has now been running for 65+ minutes, and this is for the 20-minute audio file mentioned in the repo.

How long should the diarization take?

Hi, yes, I'm having the same problem. The runtime is at 1 hr 4 min for a 20-minute file.

I was able to finish the diarization in 3 minutes using Google Colab with GPU execution for a 55-minute audio file.

I plugged into an NVIDIA M60 on Azure ML workspaces through VS Code, so I could still use GitHub Copilot and stay in my IDE, and it finished an hour-long podcast episode in 10 minutes.
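
To compare the reports in this thread on equal footing, one rough option is the real-time factor (processing time divided by audio length; below 1 means faster than real time). The sketch below only restates the numbers quoted above. The helper name `real_time_factor` is made up for illustration, the "hour episode" is taken as 60 minutes, and the CPU figure is a lower bound because that run had not finished.

```python
def real_time_factor(processing_min: float, audio_min: float) -> float:
    """Processing time divided by audio duration (both in minutes)."""
    return processing_min / audio_min


# Figures as reported in this thread; the CPU run was still in progress,
# so its real-time factor is only a lower bound.
reports = {
    "CPU, 20 min file (unfinished at 65 min)": (65, 20),
    "Colab GPU, 55 min file": (3, 55),
    "NVIDIA M60, ~60 min podcast": (10, 60),
}

for label, (processing, audio) in reports.items():
    print(f"{label}: RTF = {real_time_factor(processing, audio):.2f}")
```

These are anecdotal single runs on different files, so the numbers only indicate the order of magnitude of the CPU vs. GPU gap.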

Like most people here said, it depends on the length of the audio file, your hardware, and the size of the Whisper model you choose.

I have an RTX 3060. After downloading and installing CUDA, it finished the processing in around 17 minutes. Before that, it took forever and I gave up in the end.
If you have a CUDA-capable GPU, you can follow the guide below to install the CUDA version of PyTorch. It does make a lot of difference.
https://pytorch.org/get-started/locally/
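
A quick way to check whether the installed PyTorch build can actually see the GPU is sketched below. `torch.cuda.is_available()` is a standard PyTorch call; the helper name `pick_device` is made up for illustration, and the commented-out `pipeline.to(...)` line assumes a pyannote.audio version whose pipelines support `.to()`, which may not match the version this repo pins.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" if a CUDA-enabled PyTorch build sees a GPU, else "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed at all
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Diarization would run on: {device}")

# Assumption: with a pyannote.audio version that supports it, the pipeline
# from the snippet above could then be moved to the GPU before calling it:
#   import torch
#   pipeline.to(torch.device(device))
#   dz = pipeline(DEMO_FILE)
```

If this prints `cpu` on a machine with an NVIDIA card, the installed PyTorch is most likely a CPU-only build and needs to be reinstalled using the CUDA option from the page linked above.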