Why are activation and dropout added after the classification layer?
MrInouye opened this issue
MrInouye commented
In the XLNet code provided by transformers, why are an activation and dropout applied after the classification layer?
https://github.com/huggingface/transformers/blob/v4.37.2/src/transformers/models/xlnet/modeling_xlnet.py
https://github.com/huggingface/transformers/blob/v4.37.2/src/transformers/modeling_utils.py