Vaibhavs10 / insanely-fast-whisper


pipx install insanely-fast-whisper outdated version

LaansDole opened this issue · comments

https://pypi.org/project/insanely-fast-whisper/

According to the insanely-fast-whisper package page on PyPI, the latest published release is still version 0.0.13 from Dec 15, 2023.

As a result, when I run

pipx run insanely-fast-whisper --file-name

it still fetches the outdated version rather than the code in the current repository. I wonder if this is a known issue?
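To confirm which version actually got installed, the distribution metadata can be queried directly (a minimal sketch; it assumes the package is installed in the current environment and uses only the standard library):

```python
# Report the installed version of a distribution, or None if it is absent.
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print(installed_version("insanely-fast-whisper"))
```

Running this inside a `pipx run` session should print whatever version pipx resolved from PyPI.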

I have worked around this with the following command in a notebook:

!pip install git+https://github.com/Vaibhavs10/insanely-fast-whisper.git

For example, here is a Python script that runs the CLI and captures its output:

import subprocess

def run_cli():
    # Define the command as a string
    command = "insanely-fast-whisper -h"

    # Run the command and capture stdout/stderr as text
    process = subprocess.run(command, shell=True, capture_output=True, text=True)

    # Print the output and error messages
    print("Output:", process.stdout)
    if process.stderr:
        print("Error:", process.stderr)

# Call the function to run the CLI
run_cli()

Output: usage: insanely-fast-whisper [-h] --file-name FILE_NAME [--device-id DEVICE_ID]
                             [--transcript-path TRANSCRIPT_PATH] [--model-name MODEL_NAME]
                             [--task {transcribe,translate}] [--language LANGUAGE]
                             [--batch-size BATCH_SIZE] [--flash FLASH] [--timestamp {chunk,word}]
                             [--hf-token HF_TOKEN] [--diarization_model DIARIZATION_MODEL]
                             [--num-speakers NUM_SPEAKERS] [--min-speakers MIN_SPEAKERS]
                             [--max-speakers MAX_SPEAKERS]

Automatic Speech Recognition

options:
  -h, --help            show this help message and exit
  --file-name FILE_NAME
                        Path or URL to the audio file to be transcribed.
  --device-id DEVICE_ID
                        Device ID for your GPU. Just pass the device number when using CUDA, or
                        "mps" for Macs with Apple Silicon. (default: "0")
  --transcript-path TRANSCRIPT_PATH
                        Path to save the transcription output. (default: output.json)
  --model-name MODEL_NAME
                        Name of the pretrained model/ checkpoint to perform ASR. (default:
                        openai/whisper-large-v3)
  --task {transcribe,translate}
                        Task to perform: transcribe or translate to another language. (default:
                        transcribe)
  --language LANGUAGE   Language of the input audio. (default: "None" (Whisper auto-detects the
                        language))
  --batch-size BATCH_SIZE
                        Number of parallel batches you want to compute. Reduce if you face OOMs.
                        (default: 24)
  --flash FLASH         Use Flash Attention 2. Read the FAQs to see how to install FA2 correctly.
                        (default: False)
  --timestamp {chunk,word}
                        Whisper supports both chunked as well as word level timestamps. (default:
                        chunk)
  --hf-token HF_TOKEN   Provide a hf.co/settings/token for Pyannote.audio to diarise the audio
                        clips
  --diarization_model DIARIZATION_MODEL
                        Name of the pretrained model/ checkpoint to perform diarization. (default:
                        pyannote/speaker-diarization)
  --num-speakers NUM_SPEAKERS
                        Specifies the exact number of speakers present in the audio file. Useful
                        when the exact number of participants in the conversation is known. Must
                        be at least 1. Cannot be used together with --min-speakers or --max-
                        speakers. (default: None)
  --min-speakers MIN_SPEAKERS
                        Sets the minimum number of speakers that the system should consider during
                        diarization. Must be at least 1. Cannot be used together with --num-
                        speakers. Must be less than or equal to --max-speakers if both are
                        specified. (default: None)
  --max-speakers MAX_SPEAKERS
                        Defines the maximum number of speakers that the system should consider in
                        diarization. Must be at least 1. Cannot be used together with --num-
                        speakers. Must be greater than or equal to --min-speakers if both are
                        specified. (default: None)

Error: 2024-04-22 08:51:36.159276: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-04-22 08:51:36.159329: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-04-22 08:51:36.255010: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-04-22 08:51:40.005314: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/usr/local/lib/python3.10/dist-packages/pyannote/audio/core/io.py:43: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
  torchaudio.set_audio_backend("soundfile")
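Based on the help text above, a full invocation can also be assembled programmatically as an argument list before handing it to subprocess, which avoids shell-quoting issues (a sketch; audio.mp3 is a hypothetical input file, and the flag defaults are copied from the help output):

```python
# Build the CLI invocation as an argument list (not executed here).
import shlex

args = [
    "insanely-fast-whisper",
    "--file-name", "audio.mp3",          # hypothetical input file
    "--transcript-path", "output.json",  # default output path
    "--batch-size", "24",                # default batch size
    "--timestamp", "chunk",              # chunk-level timestamps
]

# shlex.join renders the list as a safely quoted shell command string.
print(shlex.join(args))
```

The resulting list can be passed to `subprocess.run(args, ...)` without `shell=True`.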

Thanks! This appears to have solved my issue with CUDA and Torch.

I think the version installed on my Windows machine was 0.0.8.
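As a side note on comparing version strings like 0.0.8 and 0.0.13: plain string comparison misorders them, so the numeric components should be compared instead (a minimal sketch; real tooling would use `packaging.version.Version`):

```python
# Compare dotted version strings by their numeric components, not as text.
def parse_version(v: str):
    return tuple(int(part) for part in v.split("."))


print(parse_version("0.0.8") < parse_version("0.0.13"))  # numeric compare: True
print("0.0.8" < "0.0.13")                                # string compare: False
```

So 0.0.13 on PyPI is indeed newer than the 0.0.8 install, even though it sorts earlier as a string.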