thevasudevgupta / gsoc-wav2vec2

GSoC'2021 | TensorFlow implementation of Wav2Vec2

Home Page: https://thevasudevgupta.github.io/gsoc-wav2vec2/assets/final_report

Port original fine-tuned checkpoint to TFHub

thevasudevgupta opened this issue · comments

Hello @sayakpaul @MorganR,

The 1st checkpoint is up here: https://tfhub.dev/vasudevgupta7/wav2vec2/1 🎉 🎉

Now I can transfer the 2nd checkpoint to TFHub. It's the converted checkpoint that Facebook fine-tuned on the LibriSpeech dataset (the TensorFlow equivalent of this). I think we can make changes to this notebook and link it to our 2nd checkpoint.

Please share your suggestions/comments on this.

Congratulations, @vasudevgupta7 🥳 Be sure to update this in the README of this repository and in other places you think are relevant.

Just a nit: why is the architecture field blank?

[Screenshot 2021-07-29 at 7 13 36 AM]

@MorganR is this a known bug?

> Now I can transfer the 2nd checkpoint to TFHub. It's the converted checkpoint that Facebook fine-tuned on the LibriSpeech dataset (the TensorFlow equivalent of this). I think we can make changes to this notebook and link it to our 2nd checkpoint.

Sounds good. However, help me understand this. I see TensorFlow model weights here:

[Screenshot 2021-07-29 at 7 16 58 AM]

Is it equivalent to what you used in your notebook? Also, when the LibriSpeech fine-tuning (the one you are working on) is complete, it should match the results you got in the notebook, right?

I am not sure about the blank architecture field. Am I missing something in the PR, or is it just a bug?

These TF weights (which I am planning to add to TFHub) are converted from pytorch_model.bin (in the above screenshot) using this script, and are used in many tests (see this). I have also used them in my notebook.

tf_model.h5 (in the above screenshot) is different from our checkpoint (HuggingFace also recently added a TF version of Wav2Vec2).

Yes, our fine-tuned checkpoint (once training is over) should ideally give the same results as this checkpoint, if trained exactly the same way Facebook did. Do you think it's a good idea to send the "converted fine-tuned" checkpoint to TFHub as well, since it came with the paper?
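As an aside, here is a minimal, hypothetical sketch of the kind of transformation such a PyTorch-to-TensorFlow conversion script has to perform (the function name and shapes are illustrative only; the real script handles every layer type, including convolutions):

```python
import numpy as np

# Hypothetical sketch, NOT the actual conversion script: PyTorch's
# nn.Linear stores its weight as (out_features, in_features), while a
# Keras Dense layer stores its kernel as (in_features, out_features),
# so every dense weight must be transposed while copying.

def torch_linear_to_tf_kernel(torch_weight: np.ndarray) -> np.ndarray:
    """Transpose a (out, in) PyTorch Linear weight into a (in, out) TF kernel."""
    return np.ascontiguousarray(torch_weight.T)

# Toy weight with wav2vec2-like sizes: 768 output features, 512 inputs.
torch_w = np.ones((768, 512))
tf_kernel = torch_linear_to_tf_kernel(torch_w)
assert tf_kernel.shape == (512, 768)
```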

> tf_model.h5 (in the above screenshot) is different from our checkpoint (HuggingFace also recently added a TF version of Wav2Vec2).

Could you elaborate a bit more on the aspects in which they are different? Maybe I am missing something.

> Do you think it's a good idea to send the "converted fine-tuned" checkpoint to TFHub as well, since it came with the paper?

I think if our fine-tuned checkpoints produce similar results on the evaluation set, then it's fine to export only those to the Hub.

> tf_model.h5 (in the above screenshot) is different from our checkpoint (HuggingFace also recently added a TF version of Wav2Vec2).

> Could you elaborate a bit more on the aspects in which they are different? Maybe I am missing something.

HuggingFace also added TF Wav2Vec2 to Transformers, so they too converted Wav2Vec2 from their PyTorch version to TF. Their converted model and mine are completely equivalent in terms of outputs, but my conversion script is very different from theirs. The layer and weight naming is therefore different, and their TF checkpoint doesn't work with my code.
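To illustrate the point above with a toy sketch: checkpoints are restored by variable name, so two output-equivalent ports with different naming schemes cannot load each other's weights without an explicit translation table. All names below are made up for illustration; they are not the real variable names in either codebase.

```python
# Hypothetical weights as saved by one port (names are illustrative).
huggingface_weights = {
    "tf_wav2vec2.encoder.layers.0.attention.q_proj.kernel": [0.1, 0.2],
}

# The names another port's code expects (again, illustrative only).
expected_names = {"wav2vec2/encoder/layer_0/attention/query/kernel"}

# A direct load fails: none of the saved names match the expected ones.
matching = expected_names & set(huggingface_weights)
assert not matching

# Making the checkpoint usable would require a name-translation table,
# one entry per variable:
name_map = {
    "tf_wav2vec2.encoder.layers.0.attention.q_proj.kernel":
        "wav2vec2/encoder/layer_0/attention/query/kernel",
}
remapped = {name_map[k]: v for k, v in huggingface_weights.items()}
assert set(remapped) == expected_names
```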

> Do you think it's a good idea to send the "converted fine-tuned" checkpoint to TFHub as well, since it came with the paper?

> I think if our fine-tuned checkpoints produce similar results on the evaluation set, then it's fine to export only those to the Hub.

Okay.

> HuggingFace also added TF Wav2Vec2 to Transformers, so they too converted Wav2Vec2 from their PyTorch version to TF. Their converted model and mine are completely equivalent in terms of outputs, but my conversion script is very different from theirs. The layer and weight naming is therefore different, and their TF checkpoint doesn't work with my code.

Thanks for the clarification.