About fine-tuning
zhouyan19 opened this issue
ZhouYan commented
In the paper, Section 2.2, you say "We combine those pretrained modules and finetune the whole model for ST". Did you freeze the Wav2Vec 2.0 model during training? If not, I wonder whether that is because of the mix-up training strategy, so as to bridge the modality gap.
Qingkai Fang commented
Thanks for your question. We did not freeze the Wav2vec 2.0 module, so that the audio representations can be tuned during training.
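For reference, here is a minimal PyTorch-style sketch (not from the paper's actual codebase) illustrating the difference between keeping the wav2vec 2.0 encoder trainable, as described above, and freezing it; the `model.wav2vec_encoder` attribute name and the optimizer setup are assumptions for illustration only.

```python
import torch

# Hypothetical ST model that wraps a pretrained wav2vec 2.0 encoder.
# `model.wav2vec_encoder` is an assumed attribute name, used only for illustration.

def set_wav2vec_trainable(model: torch.nn.Module, trainable: bool) -> None:
    """Freeze or unfreeze the wav2vec 2.0 encoder parameters."""
    for param in model.wav2vec_encoder.parameters():
        param.requires_grad = trainable

# As stated above, the whole model is fine-tuned, so the encoder stays trainable:
# set_wav2vec_trainable(model, trainable=True)
#
# If one instead wanted to freeze the pretrained speech encoder:
# set_wav2vec_trainable(model, trainable=False)
#
# Only parameters with requires_grad=True would then receive gradient updates, e.g.:
# optimizer = torch.optim.Adam(
#     (p for p in model.parameters() if p.requires_grad), lr=1e-4
# )
```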
ZhouYan commented
> We did not freeze the Wav2vec 2.0 module, so that the audio representations can be tuned during training.
Thank you!