graykode / xlnet-Pytorch

Simple XLNet implementation with Pytorch Wrapper

Home Page: https://arxiv.org/pdf/1906.08237.pdf

Parameters initialized with torch.randn may not be a good choice

lddsdu opened this issue

It seems that parameters initialized with torch.randn, such as

self.q_proj_weight = nn.Parameter(torch.randn(self.d_model,

lead to low performance. I tried xavier_normal and kaiming_uniform instead, and both reach a much higher AUC and F1 score in my task.
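A minimal sketch of swapping the initializer, assuming PyTorch's torch.nn.init module (xavier_normal_ and kaiming_uniform_ are the functions matching the two schemes mentioned above). The helper reinit_randn_parameters and the ToyProjection module are hypothetical, written only for illustration, and the (d_model, n_head, d_head) weight shape is an assumption because the issue shows only the first argument of torch.randn. The point is the scale: torch.randn draws with standard deviation 1 regardless of layer size, while fan-based initializers shrink the scale as the layer grows.

```python
import math

import torch
import torch.nn as nn


def reinit_randn_parameters(module: nn.Module, scheme: str = "xavier_normal") -> None:
    """Re-initialize, in place, every parameter of `module` with at least 2 dims.

    Hypothetical helper for illustration; it is not part of xlnet-Pytorch.
    1-D parameters are skipped because the fan-based initializers in
    torch.nn.init need a tensor with at least 2 dimensions.
    """
    with torch.no_grad():
        for param in module.parameters():
            if param.dim() < 2:
                continue
            if scheme == "xavier_normal":
                nn.init.xavier_normal_(param)
            elif scheme == "kaiming_uniform":
                # a=sqrt(5) mirrors the default used by nn.Linear.reset_parameters.
                nn.init.kaiming_uniform_(param, a=math.sqrt(5))
            else:
                raise ValueError(f"unknown scheme: {scheme}")


class ToyProjection(nn.Module):
    """Hypothetical stand-in for a module whose weight is created with torch.randn."""

    def __init__(self, d_model: int = 64, n_head: int = 4, d_head: int = 16):
        super().__init__()
        # Assumed shape (d_model, n_head, d_head); std is 1 straight out of randn.
        self.q_proj_weight = nn.Parameter(torch.randn(d_model, n_head, d_head))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Project [batch, d_model] inputs to [batch, n_head, d_head].
        return torch.einsum("bi,ind->bnd", h, self.q_proj_weight)


torch.manual_seed(0)
model = ToyProjection()
print(f"randn std:          {model.q_proj_weight.std().item():.4f}")
reinit_randn_parameters(model, "xavier_normal")
print(f"xavier_normal_ std: {model.q_proj_weight.std().item():.4f}")
```

One caveat on the design: for a tensor with more than 2 dimensions, torch.nn.init computes fan-in as size(1) times the product of the trailing dimensions and fan-out as size(0) times that product. For an einsum-style weight whose input dimension is dim 0, that may not match the intended fans, so reshaping to 2-D before initializing, or computing the scale by hand, may be worth trying. Whether either scheme helps is task-dependent; the issue reports results from one task only.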