BriansIDP / WhisperBiasing

train_large.sh with --useGPT

JiajunHe1025 opened this issue · comments

Hello, I added --useGPT to train_large.sh, but I get an error when training with Whisper large and GPT-2 together. (The same option seems to work fine in train.sh.)

Could you tell me how to solve this problem? Thanks!!!!!

Hi, thank you for your message. The --useGPT option is not supported with the Whisper large model. English-only Whisper models such as base.en and medium.en share their token set with GPT-2, whereas Whisper large does not. As a result, Whisper large produces token indices that fall outside the GPT-2 token embedding matrix, which causes this index overflow error.
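
To make the failure mode concrete, here is a minimal sketch of a check for token IDs that have no row in an embedding table. The helper name `out_of_range_ids` and the sample IDs are made up for illustration and are not part of this repository; the only fixed fact used is that the standard GPT-2 vocabulary has 50257 entries.

```python
# Number of rows in the standard GPT-2 token embedding matrix.
GPT2_VOCAB_SIZE = 50257


def out_of_range_ids(token_ids, num_embeddings=GPT2_VOCAB_SIZE):
    """Return the token IDs that cannot be looked up in an embedding
    table with `num_embeddings` rows (hypothetical helper)."""
    return [t for t in token_ids if t < 0 or t >= num_embeddings]


# Illustrative IDs only, not real tokenizer output.
sample_ids = [464, 2068, 50256, 50300]
print(out_of_range_ids(sample_ids))
```

Any ID this returns would trigger an out-of-range index error if passed to a GPT-2 embedding lookup.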

Thanks!!! So, could you tell me how the implementation of Whisper large + TCPGen + GPT-2, as mentioned in the paper, was done?

Hi. This was done by first generating an N-best list with Whisper large and then rescoring it with GPT-2. You simply treat each hypothesis as text and compute its language model score with GPT-2. You then add that score, scaled by an LM scaling factor (0.05 in my case), to the Whisper output score of each hypothesis and re-rank the hypotheses to find the new best one.
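
The rescoring step described above could look roughly like the sketch below. It is not the code used for the paper: `rescore_nbest` and `toy_lm_score` are hypothetical names, the scores are assumed to be log-probabilities (higher is better), and the toy scorer only stands in for GPT-2, where the score would typically be the sum of token log-probabilities of the hypothesis text.

```python
def rescore_nbest(hypotheses, lm_score, lm_weight=0.05):
    """Re-rank an N-best list.

    hypotheses: list of (text, asr_score) pairs, asr_score being the
                Whisper score of the hypothesis (assumed log-probability).
    lm_score:   callable mapping text to an LM log-probability.
    lm_weight:  LM scaling factor (0.05 is the value quoted in the thread).
    """
    rescored = [(text, asr + lm_weight * lm_score(text))
                for text, asr in hypotheses]
    # Highest combined score first.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


def toy_lm_score(text):
    """Stand-in for GPT-2 (assumption): a flat per-word penalty plus an
    extra penalty for one implausible word."""
    words = text.split()
    return -2.0 * len(words) - (10.0 if "whether" in words else 0.0)


# Made-up 2-best list for illustration.
nbest = [("the whether is nice", -4.0), ("the weather is nice", -4.2)]
ranked = rescore_nbest(nbest, toy_lm_score)
print(ranked[0][0])
```

In this toy example the second hypothesis overtakes the first once the LM score is added, which is the effect rescoring is meant to have.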

I see!!!!! Thank you very much!!!!!!!!!!!!!!!!!!!!!!