bnosac / audio.whisper

Transcribe audio files using the "Whisper" Automatic Speech Recognition model from R

URLs hard-coded in the model download package are incorrect

prolabrus opened this issue

Please update the links: all URLs hard-coded in the model download package are incorrect.

This came up in: https://stackoverflow.com/questions/76655175/r-installation-of-package-audio-whisper-0-2-1-tar-gz-has-a-non-zero-exit

It looks like when you use whisper("large"), it tries to pull the file from
https://huggingface.co/datasets/ggerganov/whisper.cpp/resolve/main/ggml-large.bin, but that URL seems to require a username/password. If you use https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-large.bin instead (without /datasets/), it seems you can download the file.

And when using whisper("large", repos="ggerganov"), it tries to pull the model from https://ggml.ggerganov.com/ggml-model-whisper-large.bin, but the file now seems to be at https://ggml.ggerganov.com/ggml-model-whisper-large-q5_0.bin (the file name now includes q5_0).
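Until the package is fixed, the manual workaround can be scripted. This is a minimal sketch, assuming the Hugging Face URL pattern without /datasets/ quoted above still holds. The helpers whisper_model_url() and download_whisper_model() are hypothetical and not part of audio.whisper; they only build the URL, fetch the file with base R's download.file(), and return a local path that can be passed to whisper(). Note that the files currently hosted there may be newer than what version 0.2.1 of the package expects.

```r
# Hypothetical helper (not part of audio.whisper): build the Hugging Face URL
# for a whisper.cpp ggml model, using the pattern WITHOUT "/datasets/".
whisper_model_url <- function(model) {
  sprintf("https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-%s.bin", model)
}

# Hypothetical helper: download the model file once and return its local path.
# If the file already exists in `dir`, no download is attempted.
download_whisper_model <- function(model, dir = tempdir()) {
  destfile <- file.path(dir, sprintf("ggml-%s.bin", model))
  if (!file.exists(destfile)) {
    # Model files range from tens of MB to several GB, so raise the default
    # 60-second download timeout and restore the old value afterwards.
    old <- options(timeout = max(3600, getOption("timeout")))
    on.exit(options(old), add = TRUE)
    # mode = "wb" keeps the binary file intact on Windows.
    utils::download.file(whisper_model_url(model), destfile = destfile, mode = "wb")
  }
  destfile
}

# Usage (requires network access and the audio.whisper package):
# path  <- download_whisper_model("small", dir = "inst/model-repository")
# model <- audio.whisper::whisper(path)
```

Keeping the URL construction separate from the download makes it easy to swap in another mirror if the hosting location changes again.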

@prolabrus You can download the models from https://huggingface.co/ggerganov/whisper.cpp/commit/80da2d8bfee42b0e836fc3a9890373e5defc00a6 (click Browse Files) and then pass the path of the file where you stored the model to the whisper function:

```r
library(audio.whisper)
model <- whisper("inst/model-repository/ggml-small.bin")
```

```
whisper_init_from_file: loading model from 'inst/model-repository/ggml-small.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 768
whisper_model_load: n_audio_head = 12
whisper_model_load: n_audio_layer = 12
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 768
whisper_model_load: n_text_head = 12
whisper_model_load: n_text_layer = 12
whisper_model_load: n_mels = 80
whisper_model_load: f16 = 1
whisper_model_load: type = 3
whisper_model_load: mem required = 603.00 MB (+ 16.00 MB per decoder)
whisper_model_load: kv self size = 15.75 MB
whisper_model_load: kv cross size = 52.73 MB
whisper_model_load: adding 1608 extra tokens
whisper_model_load: model ctx = 464.56 MB
```

@MrFlick I don't know whether the newer models are compatible with version 0.2.1 of this R package, which is built on the stable whisper.cpp release v1.2.1 (v1.4.2 is still in beta).
I'll upgrade the bindings once tinydiarize is incorporated into whisper.cpp.

I've updated the URLs for repos = "huggingface" and deprecated repos = "ggerganov", as some models are no longer available at that location.
This should fix this issue.