xenova / whisper-web

ML-powered speech recognition directly in your browser

Home Page: https://hf.co/spaces/Xenova/whisper-web

Large model

mootje2 opened this issue

Thanks for the nice effort with this app. I was wondering if I could use it with the large model, because for multilingual transcription the large model gives much better results than the one you are using. I have the large model on my Ubuntu server, and when I test it with Gradio it gives a much better transcription. How can I adjust the script to use the large model from my local server? Also, I saw a microphone option in your demo on Hugging Face, and I miss it here.
Thanks

The purpose of this project is to run Whisper directly in your browser instead of on a local server, so I won't be modifying it to support an external API. However, feel free to clone the repo yourself and separate the frontend from the backend if you wish to reuse the user interface.

Hi @xenova,
We added the model to the models list as 'Xenova/whisper-large': [1550]. I downloaded the model, but I get the error "RangeError: offset is out of bounds" during the transcription phase. I get the same error on devices with different amounts of RAM. How can I run the large model?

whisper-web\src\components\AudioManager.tsx

    const models = {
        // Original checkpoints
        'Xenova/whisper-tiny': [41, 152],
        'Xenova/whisper-base': [77, 291],
        'Xenova/whisper-small': [249],
        'Xenova/whisper-medium': [776],
        'Xenova/whisper-large-v2': [23776],
        'Xenova/whisper-large-v3': [17776],

        // Distil Whisper (English-only)
        'distil-whisper/distil-medium.en': [402],
        'distil-whisper/distil-large-v2': [767],
    };
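
For readers following along, here is a minimal sketch of how a map like the one above might be typed and read. It assumes that each array holds download sizes in MB, with the quantized variant first and the full-precision variant second when present. That reading is inferred from the list and is not confirmed in this thread. `describeModel` is a hypothetical helper, not part of whisper-web, and it does not address the RangeError itself.

```typescript
// Assumption: each entry lists download sizes in MB, quantized variant first,
// full-precision variant second when one is listed. Not confirmed by the thread.
type ModelSizes = readonly number[];

// A shortened copy of the list, for illustration only.
const models: Record<string, ModelSizes> = {
    'Xenova/whisper-tiny': [41, 152],
    'Xenova/whisper-medium': [776],
};

// Hypothetical helper: build a display label and fail early on unknown ids,
// so a typo in the model name surfaces before any download starts.
function describeModel(id: string, quantized: boolean): string {
    const sizes = models[id];
    if (sizes === undefined) {
        throw new Error(`Unknown model: ${id}`);
    }
    // Fall back to the only listed size when there is no second entry.
    const index = !quantized && sizes.length > 1 ? 1 : 0;
    return `${id} (${sizes[index]} MB)`;
}

console.log(describeModel('Xenova/whisper-tiny', true));
```

Checking the id up front only rules out a misspelled entry. An out-of-bounds error during transcription would still need to be traced separately.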