pipx install insanely-fast-whisper outdated version
LaansDole opened this issue
Do Le Long An commented
https://pypi.org/project/insanely-fast-whisper/
According to the insanely-fast-whisper package page on PyPI, the latest release is still version 0.0.13 from Dec 15, 2023.
As a result, when I run
pipx run insanely-fast-whisper --file-name
it still fetches the outdated version rather than the code in the current repository. Is this a known issue?
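One way to confirm which version is actually installed is to query the package metadata from Python. This is a minimal sketch, assuming the distribution name is `insanely-fast-whisper` (as on PyPI) and that the script runs in the same environment as the package; `installed_version` is a hypothetical helper name, not part of the project.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # 0.0.13 is the PyPI release mentioned above; a git install may report something else.
    print("insanely-fast-whisper:", installed_version("insanely-fast-whisper"))
```

Since `pipx run` executes the tool in its own isolated environment, this check is mainly useful for pip or notebook installs.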
Do Le Long An commented
I resolved this by running the following command in a notebook:
!pip install git+https://github.com/Vaibhavs10/insanely-fast-whisper.git
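Outside a notebook, the same install could be sketched from Python by calling pip through the current interpreter, so the package lands in the environment the script runs in. This is an illustration only, not the project's documented method; `build_install_command` and `install_from_git` are hypothetical helper names, and the repository URL is the one from the command above.

```python
import subprocess
import sys

REPO_URL = "git+https://github.com/Vaibhavs10/insanely-fast-whisper.git"


def build_install_command(spec=REPO_URL):
    """Build the pip command as an argument list (no shell involved)."""
    return [sys.executable, "-m", "pip", "install", spec]


def install_from_git(spec=REPO_URL):
    """Run pip and raise CalledProcessError if the install fails."""
    subprocess.run(build_install_command(spec), check=True)


# Uncomment to actually install (needs network access and git):
# install_from_git()
```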
For example, this is a Python script to run the command:
import subprocess

def run_cli():
    # Define the command as a string
    command = "insanely-fast-whisper -h"
    # Use subprocess to run the command
    process = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # Get the output and error messages, if any
    stdout, stderr = process.communicate()
    # Decode the output and error messages from bytes to string
    stdout = stdout.decode()
    stderr = stderr.decode()
    # Print the output and error messages
    print("Output:", stdout)
    if stderr:
        print("Error:", stderr)

# Call the function to run the CLI
run_cli()
Output: usage: insanely-fast-whisper [-h] --file-name FILE_NAME [--device-id DEVICE_ID]
[--transcript-path TRANSCRIPT_PATH] [--model-name MODEL_NAME]
[--task {transcribe,translate}] [--language LANGUAGE]
[--batch-size BATCH_SIZE] [--flash FLASH] [--timestamp {chunk,word}]
[--hf-token HF_TOKEN] [--diarization_model DIARIZATION_MODEL]
[--num-speakers NUM_SPEAKERS] [--min-speakers MIN_SPEAKERS]
[--max-speakers MAX_SPEAKERS]
Automatic Speech Recognition
options:
-h, --help show this help message and exit
--file-name FILE_NAME
Path or URL to the audio file to be transcribed.
--device-id DEVICE_ID
Device ID for your GPU. Just pass the device number when using CUDA, or
"mps" for Macs with Apple Silicon. (default: "0")
--transcript-path TRANSCRIPT_PATH
Path to save the transcription output. (default: output.json)
--model-name MODEL_NAME
Name of the pretrained model/ checkpoint to perform ASR. (default:
openai/whisper-large-v3)
--task {transcribe,translate}
Task to perform: transcribe or translate to another language. (default:
transcribe)
--language LANGUAGE Language of the input audio. (default: "None" (Whisper auto-detects the
language))
--batch-size BATCH_SIZE
Number of parallel batches you want to compute. Reduce if you face OOMs.
(default: 24)
--flash FLASH Use Flash Attention 2. Read the FAQs to see how to install FA2 correctly.
(default: False)
--timestamp {chunk,word}
Whisper supports both chunked as well as word level timestamps. (default:
chunk)
--hf-token HF_TOKEN Provide a hf.co/settings/token for Pyannote.audio to diarise the audio
clips
--diarization_model DIARIZATION_MODEL
Name of the pretrained model/ checkpoint to perform diarization. (default:
pyannote/speaker-diarization)
--num-speakers NUM_SPEAKERS
Specifies the exact number of speakers present in the audio file. Useful
when the exact number of participants in the conversation is known. Must
be at least 1. Cannot be used together with --min-speakers or --max-
speakers. (default: None)
--min-speakers MIN_SPEAKERS
Sets the minimum number of speakers that the system should consider during
diarization. Must be at least 1. Cannot be used together with --num-
speakers. Must be less than or equal to --max-speakers if both are
specified. (default: None)
--max-speakers MAX_SPEAKERS
Defines the maximum number of speakers that the system should consider in
diarization. Must be at least 1. Cannot be used together with --num-
speakers. Must be greater than or equal to --min-speakers if both are
specified. (default: None)
Error: 2024-04-22 08:51:36.159276: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-04-22 08:51:36.159329: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-04-22 08:51:36.255010: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-04-22 08:51:40.005314: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/usr/local/lib/python3.10/dist-packages/pyannote/audio/core/io.py:43: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
torchaudio.set_audio_backend("soundfile")
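The lines printed under "Error:" above are simply everything the process wrote to stderr. Here that appears to be TensorFlow and torchaudio warnings rather than a failure, since the help text was printed normally. Below is a sketch of a variant that uses the exit code to tell the two apart. It assumes a zero exit code means success, which is the usual convention but has not been verified against this CLI; `report` is a hypothetical helper name.

```python
import subprocess
import sys


def run_cli(args):
    """Run a command given as an argument list; return (exit code, stdout, stderr)."""
    result = subprocess.run(args, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


def report(args):
    """Print stdout, then stderr labelled according to the exit code."""
    code, out, err = run_cli(args)
    print("Output:", out)
    if err:
        # Label stderr by exit code so warnings are not mistaken for failures.
        label = "Error" if code != 0 else "Warnings"
        print(f"{label}:", err)
    return code


if __name__ == "__main__":
    # Harmless demo with the Python interpreter; swap in ["insanely-fast-whisper", "-h"].
    report([sys.executable, "--version"])
```

Passing the command as a list also avoids `shell=True`, so arguments such as file names with spaces need no extra quoting.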
Isaack Rasmussen commented
Thanks! This appears to have solved my issue with CUDA and Torch.
I think the Windows version installed on my computer was 0.0.8