Lightning-AI / lit-llama

Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.

Adding a linear layer to LLaMA without any computation degrades the performance

YUCHEN005 opened this issue · comments

Hi authors, thank you for this nice repo for LLaMA tuning.

Currently I am using LLaMA + LoRA tuning for text correction, and it works well with the default LoRA setup.

Now I have added a linear layer to `CausalSelfAttention` in the LLaMA model. I only add the line `self.proj = nn.Linear(x, x, bias=False)` and mark it as trainable, without using it anywhere in `forward`. As a result, the performance drops a lot.
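For concreteness, here is a minimal sketch of the kind of change described above. It is not lit-llama's actual `CausalSelfAttention`; `ToyAttention` and its parameters are hypothetical stand-ins, with an extra `nn.Linear` that is created (and trainable) but never called in `forward`.

```python
import torch
import torch.nn as nn

class ToyAttention(nn.Module):
    """Toy attention-like block illustrating the reported modification."""

    def __init__(self, n_embd: int):
        super().__init__()
        self.c_attn = nn.Linear(n_embd, 3 * n_embd, bias=False)  # existing qkv projection
        self.c_proj = nn.Linear(n_embd, n_embd, bias=False)      # existing output projection
        # the extra layer from the issue: registered and trainable, but unused below
        self.proj = nn.Linear(n_embd, n_embd, bias=False)

    def forward(self, x):
        q, k, v = self.c_attn(x).chunk(3, dim=-1)
        y = torch.nn.functional.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.c_proj(y)  # note: self.proj is never applied
```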

I wonder why this happens, since the added layer does not affect forward propagation, and the model initialization should also be unrelated to this new layer (I follow the normal initialization from the original code).
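One way to sanity-check that claim is to confirm the unused layer never enters the computation graph: with the same input, the output should be unchanged and the extra layer should receive no gradient. This uses the hypothetical `ToyAttention` sketch above, not lit-llama code.

```python
import torch

torch.manual_seed(0)
block = ToyAttention(n_embd=8)
x = torch.randn(2, 4, 8)

out = block(x)
out.sum().backward()

# The unused layer never participated in forward, so it gets no gradient.
print(block.proj.weight.grad)  # -> None
```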

I am new to the Lightning toolkit; could you please help me analyze what the problem could be?