Lightning-AI / lit-llama

Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.

Adding a linear layer to LLaMA without any computation degrades the performance

YUCHEN005 opened this issue · comments

Hi authors, thank you for this nice repo for LLaMA tuning.

Currently I am using LLaMA + LoRA tuning for text correction, and it works well with the default LoRA setup.

Now I have added a linear layer to `CausalSelfAttention` in the LLaMA model. I only add the line `self.proj = nn.Linear(x, x, bias=False)` and mark it as trainable, without using it anywhere in `forward`. As a result, the performance drops a lot.
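For concreteness, here is a minimal sketch of the kind of change described above. It is not lit-llama's actual `CausalSelfAttention`; `ToyAttention` and its parameters are hypothetical stand-ins, with an extra `nn.Linear` that is created (and trainable) but never called in `forward`.

```python
import torch
import torch.nn as nn

class ToyAttention(nn.Module):
    """Toy attention-like block illustrating the reported modification."""

    def __init__(self, n_embd: int):
        super().__init__()
        self.c_attn = nn.Linear(n_embd, 3 * n_embd, bias=False)  # existing qkv projection
        self.c_proj = nn.Linear(n_embd, n_embd, bias=False)      # existing output projection
        # the extra layer from the issue: registered and trainable, but unused below
        self.proj = nn.Linear(n_embd, n_embd, bias=False)

    def forward(self, x):
        q, k, v = self.c_attn(x).chunk(3, dim=-1)
        y = torch.nn.functional.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.c_proj(y)  # note: self.proj is never applied
```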

I wonder why this happens, since the added layer does not affect forward propagation, and the model initialization should also be unrelated to this new layer (I follow the normal initialization from the original code).
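One way to sanity-check that claim is to confirm the unused layer never enters the computation graph: with the same input, the output should be unchanged and the extra layer should receive no gradient. This uses the hypothetical `ToyAttention` sketch above, not lit-llama code.

```python
import torch

torch.manual_seed(0)
block = ToyAttention(n_embd=8)
x = torch.randn(2, 4, 8)

out = block(x)
out.sum().backward()

# The unused layer never participated in forward, so it gets no gradient.
print(block.proj.weight.grad)  # -> None
```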

I am new to the Lightning toolkit; could you please help me analyze what the problem could be?