Possibly wrong initialization in LinearEmbeddings
oasidorshin opened this issue
It seems that in LinearEmbeddings the output size (the last, i.e. -1, dim) is used for the Kaiming-style bound rather than the input size, which is inconsistent with both PyTorch's nn.Linear initialization (https://pytorch.org/docs/stable/generated/torch.nn.Linear.html) and other similar layers (e.g. NLinear):
https://github.com/Yura52/rtdl/blob/b354b35d68f28b4f5bbebd2e6d5b1f6cfa91eed1/rtdl/nn/_embeddings.py#L275
(Just in case, I will mention this warning from the README: "Please note that the code in the "main" branch not released to PyPI is unstable and should not be used")
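For concreteness, here is a rough sketch of what I mean (my own paraphrase, not the actual rtdl code; the class name and shapes are assumptions based on how the layer is used):

```python
import math
import torch
import torch.nn as nn

# Rough paraphrase of a LinearEmbeddings-like layer (not the library's exact code):
# each of n_features scalar inputs gets its own weight/bias vectors of size d_embedding.
class LinearEmbeddingsSketch(nn.Module):
    def __init__(self, n_features: int, d_embedding: int) -> None:
        super().__init__()
        self.weight = nn.Parameter(torch.empty(n_features, d_embedding))
        self.bias = nn.Parameter(torch.empty(n_features, d_embedding))
        self.reset_parameters()

    def reset_parameters(self) -> None:
        # The point of this issue: the bound is computed from the last dimension,
        # i.e. the output size d_embedding, giving uniform(-1/sqrt(d_embedding), 1/sqrt(d_embedding)).
        bound = 1 / math.sqrt(self.weight.shape[-1])
        for parameter in (self.weight, self.bias):
            nn.init.uniform_(parameter, -bound, bound)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_features) -> (batch, n_features, d_embedding)
        return self.weight * x[..., None] + self.bias
```

By "input size" I mean the fan_in that nn.Linear uses for its default bound, which for a per-feature scalar input would be 1.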
I see the following aspects to this issue:
- Did you observe suboptimal performance on your tasks caused by the current initialization scheme? What initialization scheme fixed the issue for you?
- I am not sure if there is a direct analogy between LinearEmbeddings and torch.nn.Linear/NLinear. Overall, I should say that the current initialization scheme is not the result of systematic research, but rather a heuristic choice that, so far, has worked well enough in the papers.
@Yura52
Thank you for the feedback!
- No, with the current init I already get very good results. I haven't tried other schemes.
- I was just under the impression that every linear-like layer is initialized like nn.Linear, but on second thought it doesn't really make sense to do so: you would need to set input_dim to 1 for those layers, which would probably result in a too large initialization (see the sketch below).
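For reference, a tiny illustration of that concern with nn.Linear's default init (toy example; in_features=1 is chosen just to mimic a per-feature scalar input):

```python
import torch.nn as nn

# nn.Linear's default init draws weights from roughly uniform(-1/sqrt(in_features), 1/sqrt(in_features)).
layer = nn.Linear(in_features=1, out_features=8)
# With in_features=1 the bound is 1, so weights land roughly in (-1, 1),
# which is the "too large initialization" I was worried about above.
print(layer.weight.abs().max())
```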
I think that the current init works well, so I am closing the issue, but in the future the relationship between the current init and the Kaiming init in nn.Linear may be worth clarifying in the code comments.