yandex-research / rtdl

Research on Tabular Deep Learning: Papers & Packages

Possibly wrong initialization in LinearEmbeddings

oasidorshin opened this issue · comments

It seems that in LinearEmbeddings the output size (the last dimension) is used for the Kaiming initialization rather than the input size, which is inconsistent with both the PyTorch nn.Linear initialization (https://pytorch.org/docs/stable/generated/torch.nn.Linear.html) and other similar layers (e.g. NLinear):
https://github.com/Yura52/rtdl/blob/b354b35d68f28b4f5bbebd2e6d5b1f6cfa91eed1/rtdl/nn/_embeddings.py#L275
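
For illustration, here is a minimal sketch of the kind of per-feature linear embedding and initialization being discussed. The class name, parameter shapes, and init below are assumptions for demonstration only, not the actual rtdl source:

```python
import torch
import torch.nn as nn


class ToyLinearEmbeddings(nn.Module):
    """Per-feature linear embedding (illustrative, not the rtdl implementation).

    Maps x of shape (batch, n_features) to (batch, n_features, d_embedding)
    via x[..., j] * weight[j] + bias[j] for every feature j.
    """

    def __init__(self, n_features: int, d_embedding: int) -> None:
        super().__init__()
        self.weight = nn.Parameter(torch.empty(n_features, d_embedding))
        self.bias = nn.Parameter(torch.empty(n_features, d_embedding))
        self.reset_parameters()

    def reset_parameters(self) -> None:
        # The init under discussion: the uniform bound is derived from the
        # last (output/embedding) dimension, not from the input dimension.
        bound = self.weight.shape[-1] ** -0.5
        nn.init.uniform_(self.weight, -bound, bound)
        nn.init.uniform_(self.bias, -bound, bound)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_features) -> (batch, n_features, d_embedding)
        return x[..., None] * self.weight + self.bias
```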

(Just in case, I will mention this warning from the README: "Please note that the code in the "main" branch not released to PyPI is unstable and should not be used".)

I see the following aspects to this issue:

  • Did you observe suboptimal performance on your tasks caused by the current initialization scheme? If so, what initialization scheme fixed the issue for you?
  • I am not sure there is a direct analogy between LinearEmbeddings and torch.nn.Linear/NLinear. Overall, I should say that the current initialization scheme is not the result of systematic research, but rather a heuristic choice that, so far, has worked well enough in the papers.

@Yura52
Thank you for the feedback!

  • No, with the current init I already get very good results. I haven't tried other schemes.
  • I was just under the impression that every linear-like layer is initialized like nn.Linear, but on second thought it doesn't really make sense to do so: you would need to set the input dimension to 1 for those layers, which would probably result in too large an initialization scale (see the numeric sketch after this list).
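
A rough numeric comparison of the two bounds, assuming nn.Linear's default kaiming_uniform_ with a=sqrt(5) (which gives a bound of 1/sqrt(fan_in)) versus a bound based on the embedding dimension:

```python
import math

d_embedding = 64  # example embedding size

# nn.Linear-style Kaiming-uniform bound with fan_in = 1
# (each numerical feature is a scalar input):
fan_in = 1
gain = math.sqrt(2.0 / (1 + 5))                   # leaky_relu gain with a = sqrt(5)
nn_linear_bound = gain * math.sqrt(3.0 / fan_in)
print(f"nn.Linear-style bound (fan_in=1): {nn_linear_bound:.3f}")  # 1.000

# Bound derived from the embedding (output) dimension instead:
embedding_bound = 1 / math.sqrt(d_embedding)
print(f"d_embedding-based bound:          {embedding_bound:.3f}")  # 0.125
```

With fan_in fixed at 1, the nn.Linear-style bound stays at 1.0 regardless of d_embedding, which is the "too large" scale mentioned above.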

I think the current init works well, so I am closing the issue, but in the future the relationship between the current init and the Kaiming init in nn.Linear may be worth clarifying in code comments.