Srijith-rkr / Whispering-LLaMA

EMNLP 23 - Integrating Whisper Encoder to LLaMA Decoder for Generative ASR Error Correction

What changes did you make to whisper_openAI?

rrscholarship opened this issue

Hi @Srijith-rkr, I saw you cloned whisper_openAI rather than installing it, and I wonder what changes you made to this library? Also, using Large-v2 leads to OOM on my machine (24 GB VRAM); any advice?

Are there pretrained weights for this repo?

Yes, there are weights for this repo, as listed at https://github.com/Srijith-rkr/Whispering-LLaMA/blob/main/README.md#model-weights

changes you made to this library?

At the time we worked on this project, there was no beam search algorithm or temperature-based decoding, so I guess the modification might be in that part. @Srijith-rkr any thoughts?

In the paper, we generate multiple hypotheses from the Whisper model to use as a prompt input to the LLM. We modified the beam search in the Whisper code to select the next token via temperature sampling, so as to generate multiple candidates that do not capture the utterance very well. We do this to model a weak acoustic model for the LLM to improve upon.
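
As an illustrative sketch of temperature-based next-token sampling (not the repo's actual modification; the function name, temperature value, and vocabulary size are hypothetical, and it assumes PyTorch is available):

import torch

def sample_next_token(logits, temperature=1.2):
    # A higher temperature flattens the distribution, so repeated sampling
    # yields more diverse (and often noisier) candidate tokens.
    probs = torch.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1)

# Sampling the same decoding step several times gives several different candidates.
logits = torch.randn(1, 50000)  # arbitrary vocabulary size, for illustration only
candidates = [sample_next_token(logits) for _ in range(5)]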

You can get the Whisper model weights with just
import whisper
model = whisper.load_model("mention size")
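
For example, assuming the openai-whisper package is installed, loading a checkpoint and transcribing a file looks roughly like this (the audio path is just a placeholder):

import whisper

model = whisper.load_model("tiny")      # or "large-v2" if you have the VRAM
result = model.transcribe("audio.wav")  # placeholder path to your audio file
print(result["text"])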

When we wrote the paper, we did not have an instruction-tuned model, so we used Alpaca weights. The weights of that model, converted to the lit-llama format (the repo our code is built on), are attached at https://huggingface.co/Srijith-rkr/Whispering-LLaMA
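
If it helps, one way to pull those converted weights locally is with huggingface_hub (the local_dir path below is just an example):

from huggingface_hub import snapshot_download

# Download the converted Alpaca / lit-llama checkpoint from the Hugging Face Hub.
snapshot_download(repo_id="Srijith-rkr/Whispering-LLaMA", local_dir="checkpoints/whispering-llama")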

We also shared our dataset here.

Regarding the OOM when using Large-v2:
We also have a baseline using Whisper Tiny in the paper, but I don't think you will be able to fine-tune LLaMA 7B even then. We used 2 A100s (80 GB VRAM).

You should be able to run LLaMA inference with 24 GB of VRAM using quantization.
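
For example, the lit-llama repository (which this code base builds on) documents 8-bit quantized inference along the lines of python generate.py --quantize llm.int8 --prompt "Hello, my name is"; please verify the exact flags against its README.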

I hope that helps. Feel free to reopen this issue if you have any questions.

Thank you for your reply, it's really helpful @Srijith-rkr! One last question: I'm still quite confused about how Alpaca works as the "ASR selector" prompt if there is no instruction-tuned model. Also, I did not find the "ASR selector" prompt in the code base; does that mean the Alpaca weights are already instruction-finetuned as the "ASR selector"?