LiyuanLucasLiu / Transformer-Clinic

Understanding the Difficulty of Training Transformers

Home Page: https://arxiv.org/abs/2004.08249

is "tmp_weight" in transformer_layer.py useless?

zherowolf opened this issue · comments

Great work!
I have two questions:

  1. Is "tmp_weight" in transformer_layer.py useless? Can I delete it?
  2. In the paper, you said ω_i is fixed during training, while in the code I think it is trainable. Am I right?

Thanks.

Thanks for asking :-)

  1. Yes, it's useless; you can delete it.
  2. I don't remember the paper saying ω is fixed (each layer has the flexibility to adjust ω and rely more on its residual branch); it would be very helpful if you could point me to the part that confused you. In the current implementation, ω is trainable (I don't think there is any reason to make it untrainable; the computation overhead is marginal at most). That said, I did run some experiments with ω fixed, and it led to almost the same performance.
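For readers following along, here is a minimal PyTorch sketch of how a residual-scaling weight ω can be registered either as trainable or fixed. This is an illustrative assumption, not the repo's actual transformer_layer.py; the names `ScaledResidual`, `omega_init`, and `trainable` are hypothetical.

```python
# Minimal sketch (not the repo's transformer_layer.py): a per-dimension
# residual-scaling weight omega that can be trainable or fixed.
import torch
import torch.nn as nn

class ScaledResidual(nn.Module):
    """Computes omega * x + sublayer_out, with one omega entry per hidden dim."""

    def __init__(self, embed_dim: int, omega_init: float = 1.0, trainable: bool = True):
        super().__init__()
        omega = torch.full((embed_dim,), omega_init)
        # Registering omega as an nn.Parameter makes it trainable by default;
        # requires_grad=False fixes it, as in the "omega fixed" ablation.
        self.omega = nn.Parameter(omega, requires_grad=trainable)

    def forward(self, x: torch.Tensor, sublayer_out: torch.Tensor) -> torch.Tensor:
        # Scale the shortcut connection before adding the sublayer output.
        return self.omega * x + sublayer_out

# Usage: wrap the residual connection around an attention or FFN sublayer.
layer = ScaledResidual(embed_dim=512, trainable=True)
x = torch.randn(10, 512)
out = layer(x, torch.randn(10, 512))
```

Fixing ω simply means constructing it with `trainable=False`; the forward pass and the rest of training are unchanged, which is consistent with the marginal overhead mentioned above.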

Thanks for the quick reply and the great work.
I will report experiment results on my dataset later.

Sure, no problem :-)