argmaxinc / WhisperKit

On-device Inference of Whisper Speech Recognition Models for Apple Silicon

Home Page: https://takeargmax.com/blog/whisperkit

Speculative decoding support with Eager streaming mode

atiorh opened this issue · comments

Eager streaming mode implies that every token is predicted at least twice. This redundancy is a great opportunity to design a speculative decoding technique that uses a fast draft model* to amortize the repeated predictions and accelerate the overall pipeline.

  • Draft: distil-large-v3, Oracle: large-v3. They share the same AudioEncoder; only the TextDecoders differ.
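
To make the idea concrete, here is a minimal sketch of greedy speculative decoding in Python. It is not WhisperKit code: `speculative_decode`, `greedy_decode`, `toy_draft`, and `toy_oracle` are hypothetical names, and the two "models" are toy functions standing in for the distil-large-v3 and large-v3 TextDecoders. Because the AudioEncoder is shared, the sketch assumes the audio features are computed once and only the text decoding differs. The draft proposes up to `k` tokens, the oracle checks them, and the longest agreeing prefix is kept, so the output matches what the oracle would produce on its own.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Token = int
# Hypothetical interface: given the tokens so far, return the greedy next token.
NextToken = Callable[[Sequence[Token]], Token]


def greedy_decode(next_token: NextToken, prompt: Sequence[Token],
                  max_new_tokens: int) -> List[Token]:
    """Plain greedy decoding, used as the reference output."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(next_token(tokens))
    return tokens


def speculative_decode(draft_next: NextToken, oracle_next: NextToken,
                       prompt: Sequence[Token], max_new_tokens: int,
                       k: int = 4, eos: Optional[Token] = None
                       ) -> Tuple[List[Token], int]:
    """Return (tokens, number of oracle verification rounds)."""
    tokens = list(prompt)
    target_len = len(tokens) + max_new_tokens
    oracle_rounds = 0
    while len(tokens) < target_len:
        # 1. The fast draft model proposes up to k tokens autoregressively.
        budget = min(k, target_len - len(tokens))
        proposal: List[Token] = []
        for _ in range(budget):
            proposal.append(draft_next(tokens + proposal))

        # 2. The oracle verifies the proposal. A real decoder would score all
        #    positions in one batched forward pass; this loop only simulates it.
        oracle_rounds += 1
        accepted: List[Token] = []
        for i in range(budget):
            expected = oracle_next(tokens + proposal[:i])
            if expected != proposal[i]:
                accepted.append(expected)  # first mismatch: take the oracle's token
                break
            accepted.append(proposal[i])
        else:
            # Every draft token matched, so the oracle contributes one extra token.
            if len(tokens) + len(accepted) < target_len:
                accepted.append(oracle_next(tokens + proposal))

        tokens.extend(accepted)
        if eos is not None and eos in accepted:
            # Drop anything accepted after the end-of-sequence token.
            cut = len(tokens) - len(accepted) + accepted.index(eos) + 1
            tokens = tokens[:cut]
            break
    return tokens, oracle_rounds


# Toy stand-ins: the draft agrees with the oracle except at every fifth position.
def toy_oracle(prefix: Sequence[Token]) -> Token:
    return (3 * len(prefix) + sum(prefix)) % 11


def toy_draft(prefix: Sequence[Token]) -> Token:
    token = toy_oracle(prefix)
    return (token + 1) % 11 if len(prefix) % 5 == 0 else token


if __name__ == "__main__":
    out, rounds = speculative_decode(toy_draft, toy_oracle, [1, 2], 20, k=4)
    print(out == greedy_decode(toy_oracle, [1, 2], 20), rounds)
```

The speed-up comes from the oracle running once per round instead of once per token; how large it is depends on how often the draft agrees with the oracle. In Eager streaming mode, tokens already predicted in an earlier pass could plausibly serve as part of the proposal, though that is a design question for this issue rather than something the sketch settles.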