is "tmp_weight" in transformer_layer.py useless?
zherowolf opened this issue
zherowolf commented
Great work!
I have two questions:
- Is "tmp_weight" in transformer_layer.py unused? Can I delete it?
- In the paper, you say ω is fixed during training, while in the code I think it's trainable. Am I right?
Thanks.
Liyuan Liu commented
Thanks for asking :-)
- Yes, it's unused; you can delete it.
- I don't remember the paper saying ω is fixed (it says
each layer has the flexibility to adjust ω and depends more on its residual branch
); it would be very helpful if you could point me to the part that confuses you. In the current implementation, ω is trainable (I don't think there is any reason to make it untrainable; the computation overhead is marginal at most). But I did some experiments with ω fixed, and it leads to almost the same performance.
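For illustration, here is a minimal PyTorch sketch of a residual branch with a learnable scale ω that can be frozen with a single flag; `ScaledResidualLayer`, `train_omega`, and `sublayer` are hypothetical names for this sketch, not the actual code in transformer_layer.py:

```python
import torch
import torch.nn as nn

class ScaledResidualLayer(nn.Module):
    """Residual connection with a per-channel scale omega:
    out = LayerNorm(omega * x + f(x)).
    A sketch of the idea, not the repo's implementation."""

    def __init__(self, d_model: int, sublayer: nn.Module, train_omega: bool = True):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)
        # omega starts at 1 (plain residual); set train_omega=False to freeze it.
        self.omega = nn.Parameter(torch.ones(d_model), requires_grad=train_omega)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.norm(self.omega * x + self.sublayer(x))

# Freezing omega only flips requires_grad, so either way the
# extra computation is a single elementwise multiply per layer.
layer = ScaledResidualLayer(512, nn.Linear(512, 512), train_omega=False)
out = layer(torch.randn(2, 10, 512))
```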
zherowolf commented
Thanks for the quick reply, and great work.
I will report experiment results on my dataset later.
Liyuan Liu commented
Sure, no problem :-)