qiufengyuyi / bert-of-theseus-tf

tensorflow version of bert-of-theseus

Is the gate the same at each layer?

zhuango opened this issue · comments

Hi, thanks for the TF implementation of the paper BERT-of-Theseus. I am reproducing the results of BERT-of-Theseus and I have a question about the gate implementation. From the logic here, it seems that the gates are the same at each successor layer for one minibatch, which means the layers of the model are ALL taken from either the original BERT or the successor. But the proposed BERT-of-Theseus states that only some of the original layers (not all, initially) are replaced with successor layers. Is this a bug, or my misunderstanding? Thank you!

Yes, the best implementation would dynamically apply a different gate probability to different data, but because of the constraints of tf.estimator it is really hard to do so. If you are using TF 2.0 or PyTorch, you can implement it that way.
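For reference, the paper's replacement scheme samples an independent Bernoulli gate per module on every forward pass, rather than one shared gate for the whole network. A minimal sketch of that idea in plain Python (the function and layer names here are illustrative, not from this repo):

```python
import random

def theseus_forward(hidden, predecessor_layers, successor_layers,
                    replace_prob, rng):
    """Module replacement in the style of BERT-of-Theseus.

    Each successor module independently replaces its corresponding
    predecessor module with probability `replace_prob`, sampled fresh
    for every module on every forward pass -- so a single pass mixes
    predecessor and successor layers instead of choosing one whole
    model with a single shared gate.
    """
    for pred, succ in zip(predecessor_layers, successor_layers):
        # Independent Bernoulli gate per module, not one gate for all.
        if rng.random() < replace_prob:
            hidden = succ(hidden)
        else:
            hidden = pred(hidden)
    return hidden
```

With `replace_prob = 0.0` every module comes from the predecessor, with `1.0` every module comes from the successor, and intermediate values mix the two per layer, which is the behavior the issue above describes.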

Sorry, I misunderstood your question. My bad!
You're right: I had uploaded the wrong code, which used the same gate for all layers. I have fixed the code and committed it to the master branch. @zhuango

Thanks for addressing my question. Closing this issue.