Gadersd / whisper-burn

A Rust implementation of OpenAI's Whisper model using the burn framework

bug: transcribing with medium model

b0xtch opened this issue · comments

OS: Mac Ventura

Seems like with the tiny model, transcription works, but when using the medium model you get a buffer size error. Perhaps we could do chunking.

     Running `target/release/whisper audio.wav medium`
thread 'main' panicked at 'wgpu error: Validation Error

Caused by:
    In Device::create_bind_group
    Buffer binding 0 range 212439040 exceeds `max_*_buffer_binding_size` limit 134217728

', /Users/botch/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-0.17.0/src/backend/direct.rs:3056:5
stack backtrace:
   0: rust_begin_unwind
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:578:5
   1: core::panicking::panic_fmt
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:67:14
   2: core::ops::function::Fn::call
   3: <wgpu::backend::direct::Context as wgpu::context::Context>::device_create_bind_group
   4: <T as wgpu::context::DynContext>::device_create_bind_group
   5: wgpu::Device::create_bind_group
   6: burn_wgpu::context::base::Context::execute
   7: burn_wgpu::kernel::index::select::select
   8: burn_tensor::tensor::ops::modules::base::ModuleOps::embedding
   9: whisper::model::Whisper<B>::forward_decoder
  10: whisper::main
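For context, the 212439040 bytes in the panic is exactly the size of the medium model's f32 token-embedding table, which the `embedding` op in the backtrace binds as a single storage buffer, and it exceeds wgpu's default 128 MiB binding limit. A rough sanity check (the vocab size and embedding widths are assumptions taken from Whisper's published architecture, not from this repo's code):

```rust
/// Size in bytes of an f32 embedding table with `vocab` rows of width `d_model`.
fn embedding_bytes(vocab: usize, d_model: usize) -> usize {
    vocab * d_model * std::mem::size_of::<f32>()
}

fn main() {
    // wgpu's default max_storage_buffer_binding_size is 128 MiB.
    let default_limit: usize = 134_217_728;

    // Assumed Whisper dimensions: multilingual vocab 51865,
    // d_model = 1024 (medium) vs 384 (tiny).
    let medium = embedding_bytes(51_865, 1_024); // 212_439_040 — matches the panic
    let tiny = embedding_bytes(51_865, 384);     // well under the limit

    println!("medium embedding: {medium} bytes (limit {default_limit})");
    println!("tiny embedding:   {tiny} bytes");
    assert!(medium > default_limit);
    assert!(tiny < default_limit);
}
```

So the medium model trips the limit on any input, independent of audio length; requesting a device with a larger `max_storage_buffer_binding_size` (where the adapter supports it) or splitting the bound buffer would be the usual workarounds.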

Update

Using a six-minute audio file with the tiny model produces the same issue.

Chunking is the next planned feature. Right now it clips audio to around the first 30 seconds for the encoder, but the decoder sequence length isn't limited so it will overflow if it doesn't detect the end by the 30 second mark.
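The chunking described above could be sketched as splitting the raw samples into fixed 30-second windows before each encoder pass. This is a minimal illustration, not this repo's actual API; the 16 kHz rate and 30 s window mirror Whisper's fixed input length, and the function name is hypothetical:

```rust
// Whisper operates on 16 kHz mono audio in fixed 30-second windows.
const SAMPLE_RATE: usize = 16_000;
const CHUNK_SECS: usize = 30;

/// Split `samples` into consecutive windows of at most 30 s each
/// (the final window may be shorter and would be zero-padded by the model).
fn chunk_audio(samples: &[f32]) -> Vec<&[f32]> {
    samples.chunks(SAMPLE_RATE * CHUNK_SECS).collect()
}

fn main() {
    // Six minutes of silence: 360 s of audio -> 12 chunks of 30 s.
    let samples = vec![0.0_f32; SAMPLE_RATE * 360];
    let chunks = chunk_audio(&samples);
    assert_eq!(chunks.len(), 12);
    assert_eq!(chunks[0].len(), SAMPLE_RATE * CHUNK_SECS);
}
```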

Rudimentary chunking is now implemented. Long audio files should now work, although there is some minor transcription inaccuracy around the chunk edges. I tried feeding the last few tokens from the previous chunk into Whisper to remedy the chunk edge issues, but then Whisper severely repeats itself and stops predicting the end of chunks, so I had to revert that change. Any ideas why Whisper is so finicky when exposed to tokens from the previous chunk?
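For reference, the attempted fix is analogous to OpenAI Whisper's condition-on-previous-text behavior: seed each chunk's decoder with the tail of the previous chunk's output, introduced by a `<|startofprev|>` token. A hypothetical sketch (the token id is an assumption from the multilingual Whisper vocabulary, and `decoder_prefix` is not this repo's API):

```rust
// Assumed id of <|startofprev|> in Whisper's multilingual vocabulary.
const SOT_PREV: u32 = 50_361;

/// Build a decoder prefix for a new chunk from up to `k` trailing
/// tokens of the previous chunk's output, prefixed with <|startofprev|>.
fn decoder_prefix(prev_tokens: &[u32], k: usize) -> Vec<u32> {
    let tail_start = prev_tokens.len().saturating_sub(k);
    let mut prefix = vec![SOT_PREV];
    prefix.extend_from_slice(&prev_tokens[tail_start..]);
    prefix
}

fn main() {
    let prev = [101, 102, 103, 104, 105];
    // Carry the last three tokens into the next chunk's prompt.
    assert_eq!(decoder_prefix(&prev, 3), vec![SOT_PREV, 103, 104, 105]);
}
```

One known pitfall with this scheme is exactly the repetition loop described above: if the prompt tokens are scored as part of the output distribution rather than masked out as context-only, the decoder tends to echo them.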

Mint

whisper is so finicky when exposed to tokens from the previous chunk

Sounds like Whisper hallucination; it happens in other implementations as well. I would have to dig into this one...